LLM Fallback Chains: Production Patterns
Automatic fallback between LLM providers on failure
In production, model providers fail: rate limits, timeouts, regional outages, the occasional 500. A fallback chain keeps your app up by automatically retrying the request against an alternate model or provider when the primary fails. It's one of the most important reliability patterns for LLM apps.
The pattern
Define an ordered list of models. Try the first; on failure (error or timeout), fall through to the next. Combine with per-attempt timeouts and capped retries so a slow provider can't hang the request.
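```python
# Requires: pip install litellm
from litellm import completion

# Try gpt-4o first; on failure, fall through to the fallbacks in order.
resp = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this ticket."}],
    fallbacks=["claude-3-5-sonnet-latest", "gpt-4o-mini"],
    timeout=20,      # request timeout, in seconds
    num_retries=2,   # capped retries on failure
)
print(resp.choices[0].message.content)
```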
LiteLLM gives you provider-agnostic fallbacks for free behind one OpenAI-compatible call — compare gateways in LiteLLM vs Portkey.
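To make the mechanics concrete, here is a minimal hand-rolled sketch of the pattern. It is not how LiteLLM implements fallbacks. The names `TransientError`, `call_with_fallbacks`, and the two stand-in providers are hypothetical; a real provider function would wrap an SDK call and honor the timeout it is handed.

```python
from __future__ import annotations

from typing import Callable, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures (429, 5xx, timeout)."""


def call_with_fallbacks(
    providers: Sequence[tuple[str, Callable[[str, float], str]]],
    prompt: str,
    timeout_s: float = 20.0,
    max_retries: int = 2,
) -> tuple[str, str]:
    """Return (provider_name, reply) from the first provider that succeeds."""
    errors: list[str] = []
    for name, call in providers:
        # One initial attempt plus up to max_retries retries per provider.
        for attempt in range(1 + max_retries):
            try:
                return name, call(prompt, timeout_s)
            except TransientError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
        # This provider is exhausted; fall through to the next one.
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only.
def flaky_primary(prompt: str, timeout_s: float) -> str:
    raise TransientError("429 rate limited")


def healthy_backup(prompt: str, timeout_s: float) -> str:
    return f"summary of: {prompt}"


chain = [("primary", flaky_primary), ("backup", healthy_backup)]
used, reply = call_with_fallbacks(chain, "ticket 123")
print(used, reply)
```

Only `TransientError` is caught, so any other exception propagates immediately instead of burning retries on a request that cannot succeed.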
Design choices
Beyond fallback: load balancing
When you have multiple keys/regions, also load-balance across healthy endpoints to spread rate limits and reduce latency — the complementary pattern to fallback. Pair this with retries, circuit breakers, and observability for a robust stack.
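One way to sketch load balancing across healthy endpoints is a round-robin pool that skips endpoints for a cooldown period after a failure, which also acts as a very simple circuit breaker. The `EndpointPool` class, the region names, and the 30-second cooldown are all assumptions for illustration, not part of any library.

```python
import itertools
import time


class EndpointPool:
    """Round-robin over endpoints, skipping any that are cooling down."""

    def __init__(self, endpoints, cooldown_s=30.0, clock=time.monotonic):
        self._endpoints = list(endpoints)
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._unhealthy_until = {}  # endpoint -> time it may be tried again
        self._cursor = itertools.cycle(range(len(self._endpoints)))

    def mark_failed(self, endpoint):
        """Take an endpoint out of rotation for cooldown_s seconds."""
        self._unhealthy_until[endpoint] = self._clock() + self._cooldown_s

    def pick(self):
        """Return the next healthy endpoint, or raise if none are healthy."""
        now = self._clock()
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[next(self._cursor)]
            if self._unhealthy_until.get(endpoint, 0.0) <= now:
                return endpoint
        raise RuntimeError("no healthy endpoints")


pool = EndpointPool(["us-east", "eu-west", "ap-south"])
pool.mark_failed("eu-west")  # e.g. after a 429 or timeout
picks = [pool.pick() for _ in range(4)]
print(picks)
```

When `pick()` raises because every endpoint is cooling down, that is the point to hand off to the fallback chain rather than wait.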
FAQ
What should I retry on? Transient errors: 429, 5xx, timeouts. Never retry other 4xx errors (such as 400 or 401); the same request will fail the same way.
Won't fallbacks hide problems? Log every fallback event. A rising fallback rate is an early warning, not something to silently swallow.
Same prompt across models? Keep prompts portable, and test the fallback model so quality doesn't crater on failover.
Library or roll my own? LiteLLM and Portkey give you this out of the box; rolling your own is fine but reimplements the same logic.
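The retry and logging answers above can be sketched as two small helpers. The function names, the logger name, and the in-memory `Counter` are hypothetical; in a real deployment the counter would more likely be a metric in your observability stack.

```python
import logging
from collections import Counter

logger = logging.getLogger("llm.fallback")
fallback_counts = Counter()  # (from_model, to_model) -> number of failovers


def is_retryable(status_code=None, timed_out=False):
    """True for transient failures: timeouts, 429, and 5xx responses."""
    if timed_out:
        return True
    if status_code is None:
        return False
    return status_code == 429 or 500 <= status_code <= 599


def record_fallback(from_model, to_model, reason):
    """Count and log a failover so a rising rate is visible."""
    fallback_counts[(from_model, to_model)] += 1
    logger.warning("fallback %s -> %s (%s)", from_model, to_model, reason)


if is_retryable(status_code=429):
    record_fallback("gpt-4o", "gpt-4o-mini", "429 rate limited")
```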
Summary
A fallback chain is cheap insurance: an ordered list of models across providers, per-attempt timeouts, retries only on transient errors, and logging on every failover. Add load balancing across healthy endpoints and your LLM app survives the outages that will inevitably happen.
*Last updated: June 2026. Verify APIs against the LiteLLM docs.*