Releases: tensorzero/tensorzero
2026.3.4
Warning
Planned Deprecations
- The configuration for inference evaluations should be nested under the relevant functions moving forward [docs]. You can run evaluations by providing a function name and a list of evaluators. The legacy format will be removed in a future release.

  ```toml
  [functions.write_haiku.evaluators.exact_match]
  type = "exact_match"
  ```

- The legacy implementation of GEPA (`launch_optimization` with `GEPAConfig`) will be removed in a future release. Please use `t0.optimization.gepa.launch` instead. [docs]
Bug Fixes
- Fixed a UI bug where a custom gateway `base_path` was not handled correctly in certain routes. (thanks @wangfenjin!)
New Features
- Started including embedding requests in the Prometheus metrics `tensorzero_requests_total` and `tensorzero_inferences_total`.
- Added the configuration field `observability.batch_writes.write_queue_capacity` to enable backpressure for observability data in the gateway.
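  For example, a minimal configuration sketch (the field path comes from this release note; the capacity value is illustrative, not a recommended default):

  ```toml
  # tensorzero.toml (sketch): bound the observability write queue so the
  # gateway applies backpressure instead of buffering without limit.
  # The value below is illustrative only.
  [observability.batch_writes]
  write_queue_capacity = 10000
  ```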
& multiple under-the-hood and UI improvements (thanks @majiayu000)!
2026.3.3
Bug Fixes
- Fixed two edge cases affecting batch inference.
- Fixed a UI bug affecting "Try with..." with inputs that include base64 files.
- Removed assistant message prefill for JSON functions + Anthropic (deprecated by Anthropic).
New Features
- Added an implementation of GEPA (automated prompt engineering) based on durable workflows.
- Allow users to specify duplicate tool calls in `all_of` tool evaluators to evaluate parallel tool calling.
- Allow users to specify an expiration date for API keys in the UI. (thanks @eibrahim95)
- Allow users to specify `object_storage.endpoint = "env::MY_ENV_VAR"` in addition to static values. (thanks @Meredith2328)
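  As a sketch, the `env::` syntax can stand in for a hard-coded endpoint (the field name and prefix come from this release note; `MY_ENV_VAR` is the placeholder used above):

  ```toml
  # tensorzero.toml (sketch): resolve the object storage endpoint from an
  # environment variable at startup instead of hard-coding it.
  [object_storage]
  endpoint = "env::MY_ENV_VAR"
  ```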
& multiple under-the-hood and UI improvements (thanks @majiayu000)!
2026.3.2
Bug Fixes
- Fixed a UI issue that prevented certain pages from rendering when they depended on historical configuration.
New Features
- Added Postgres as an alternative observability backend to ClickHouse. Postgres is the simplest way to get started; we recommend ClickHouse if you're handling >100 RPS.
- Added the `openrouter::xxx` shorthand for embedding models.
- Added support for per-session API keys in the browser (instead of a global environment variable) when auth is enabled.
& multiple under-the-hood and UI improvements!
2026.3.1
Warning
Completed Deprecations
- Removed the deprecated `model_provider_name` filter for `extra_body` and `extra_headers`. Please use `model_name` and `provider_name` instead.
- Removed the legacy experimental `list_inferences` endpoint and method. Please use the new endpoint instead. [docs]
- Removed several long-deprecated types and methods from the TensorZero Python SDK.
Warning
Planned Deprecations
- The embedded gateway in the TensorZero Python SDK will be removed in a future release (2026.6+). `patch_openai_client` and `build_embedded` are deprecated. Please deploy a standalone TensorZero Gateway instead (usage: `base_url` for the OpenAI SDK; `build_http` for the TensorZero SDK).
- The variant configuration field `weight` will be removed in a future release (2026.6+). Please use the new experimentation configuration semantics. [docs]
Bug Fixes
- Fixed a compatibility bug with Valkey-based caching that only affected Redis.
New Features
- Added support for launching optimization workflows with `dataset_name` (instead of an inference query) in `launch_optimization_workflow`.
& multiple under-the-hood and UI improvements!
2026.3.0
Warning
Completed Deprecations
- The deprecated Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` was removed. Use `tensorzero_inference_latency_overhead_seconds` instead.
Warning
Planned Deprecations
- The configuration for experimentation (e.g. `static_weights`, `track_and_stop`) was simplified. The old notation will be removed in a future release. See Run adaptive A/B tests and Run static A/B tests for more information.
- The evaluator configuration field `cutoff` will be removed in a future release. Instead, provide `--cutoffs evaluator=value,...` in the CLI.
- The gateway route `/variant_sampling_probabilities` will be removed in a future release.
- The configuration field `postgres.enabled` will be removed in a future release. Instead, the gateway will consider whether the environment variable `TENSORZERO_POSTGRES_URL` is set.
New Features
- Add `regex` and `tool_use` evaluators. [docs]
- Add `experimental_launch_optimization_workflow` to the TensorZero Python SDK.
& multiple under-the-hood and UI improvements!
2026.2.2
Caution
Breaking Changes
- The `--config-file` globbing behavior has changed: single-level wildcards (`*`) no longer match files across directory boundaries. To match files across directory boundaries, use recursive wildcards (`**`). This aligns the behavior with standard glob semantics. For example: `--config-file *.toml` matches `tensorzero.toml`, but not `subdir/tensorzero.toml`; `--config-file **/*.toml` matches both `tensorzero.toml` and `subdir/tensorzero.toml`.
Warning
Completed Deprecations
- Removed deprecated legacy endpoints for dataset management. The functionality is fully covered by the new endpoints.
New Features
- Add cost tracking and cost-based rate limiting.
- Add namespaces: the ability to set up multiple granular experiments (A/B tests) for the same TensorZero function.
- Improve reasoning support for Anthropic (including adaptive thinking), Fireworks AI, SGLang, and Together AI.
- Allow users to whitelist automatic tool approvals for TensorZero Autopilot.
- Report provider errors when `include_raw_response` is enabled.
- Add `include_aggregated_response` to streaming inferences. When enabled, the final chunk includes an aggregated output `aggregated_response` that combines previous chunks.
- Allow users to kill ongoing evaluation runs from the UI.
- Allow custom gateway bind addresses with the environment variable `TENSORZERO_GATEWAY_BIND_ADDRESS`.
& multiple under-the-hood and UI improvements (thanks @Nfemz @greg80303)!
2026.2.1
Caution
Breaking Changes
- The default value for `cache_options.enabled` changed from `write_only` to `off`.
New Features
- Support reasoning models from Groq, Mistral, and vLLM.
- Support multi-turn reasoning with Gemini and OpenAI-compatible models.
- Support embedding models from Together AI.
- Add a configurable `total_ms` timeout to streaming inferences.
- Display charts with top-k evaluation results in the TensorZero Autopilot UI.
- Add "Ask Autopilot" buttons throughout the UI.
- Allow TensorZero Autopilot to edit your local configuration files.
- Return `thought` and `unknown` content blocks in the OpenAI-compatible endpoint (`tensorzero_extra_content`).
& multiple under-the-hood and UI improvements!
2026.2.0
Warning
Planned Deprecations
- Anthropic's structured output feature is out of beta, so the TensorZero configuration field `beta_structured_outputs` is now ignored and deprecated. It'll be removed in a future release.
Bug Fixes
- Fix a regression in the `aws_bedrock` provider that affected long-term bearer API keys.
- Fix a horizontal overflow issue for tool calls and results on the inference detail UI page.
New Features
- Add YOLO Mode for TensorZero Autopilot.
- Add interruption feature for TensorZero Autopilot sessions.
- Add summary to the TensorZero Autopilot session table in the UI.
& multiple under-the-hood and UI improvements (thanks @pratikbuilds)!
2026.1.8
Bug Fixes
- Fix a race condition in the TensorZero Autopilot UI that could disable the chat input.
- Increase timeouts for slow tool calls triggered by TensorZero Autopilot (e.g. evaluations).
& multiple under-the-hood and UI improvements!
2026.1.7
New Features
- [Preview] TensorZero Autopilot — an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. Learn more → Join the waitlist →
- Support multi-turn reasoning for xAI (`reasoning_content` only).
& multiple under-the-hood and UI improvements!
