LiteLLM: One API for Every LLM
The moment you build with more than one LLM provider, the friction shows up: OpenAI, Anthropic, Gemini, Bedrock and Azure each have their own SDK, auth pattern, request shape, and error types. Wire up three of them and a meaningful chunk of your code is just adapter glue.
LiteLLM (by BerriAI, YC W23) removes that glue. It’s an open-source AI gateway that exposes a single, OpenAI-format interface to 100+ LLM providers — so you write your call once and swap models by changing a string. This post covers what it is, how it works, two concrete use cases, how it compares to the alternatives, what its performance really looks like, and the honest pros and cons.
What it is
LiteLLM ships in two forms that share the same core: a Python SDK you drop into your code, and an AI Gateway (Proxy Server) you deploy as a centralized service for a team or organization. Both expose the OpenAI request/response format across 100+ LLM providers.
The core idea is the adapter pattern applied to LLMs: you speak “OpenAI” to LiteLLM, and LiteLLM speaks each provider’s native dialect on your behalf.
Why it matters
The benefits stack up the moment you go beyond a single model:
- Unified API. One interface for 100+ models — no provider-specific SDK juggling, no rewriting calls per vendor.
- Drop-in OpenAI compatibility. If your code already speaks the OpenAI API, pointing it at LiteLLM is a base-URL change.
- Trivial model swaps and fallbacks. Change openai/gpt-4o to anthropic/claude-sonnet-4 in a string; configure automatic retry/fallback across deployments so an outage on one provider fails over to another.
- A production gateway, not just a library. The proxy adds virtual keys, per-team/project spend tracking and budgets, guardrails, caching, load balancing, and an admin dashboard out of the box.
- Observability built in. Callbacks to Langfuse, MLflow, Lunary and others, so every call is logged where you already look.
- Open source and auditable. You can read exactly how it handles keys and data, and self-host with no prompts leaving your network — often the only option for air-gapped or regulated setups (TrueFoundry review). Adopters include Stripe, Netflix and Google’s ADK.
How it works
The SDK is a translation layer: you call completion(model=..., messages=[...]), LiteLLM maps the OpenAI-shaped request to the target provider’s API, makes the call, and maps the response (and errors) back to the OpenAI shape. Same code, any model.
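To make the translation step concrete, here is a toy sketch of the adapter idea. It is not LiteLLM's code: the function names, the default `max_tokens`, and the partial stop-reason mapping are all hypothetical, and the target shapes only follow the general layout of an Anthropic-style messages API (system prompt as a top-level field, response content as a list of typed blocks). Error mapping is left out.

```python
from typing import Any

# Partial, assumed mapping from Anthropic-style stop reasons to OpenAI-style ones.
STOP_REASONS = {"end_turn": "stop", "max_tokens": "length"}


def to_anthropic_request(
    model: str, messages: list[dict[str, str]], max_tokens: int = 1024
) -> dict[str, Any]:
    """Map an OpenAI-shaped chat request to an Anthropic-style payload."""
    # The system prompt travels as a top-level field, not as a message.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    payload: dict[str, Any] = {
        "model": model.split("/", 1)[-1],  # drop the "anthropic/" routing prefix
        "max_tokens": max_tokens,  # hypothetical default
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system_parts:
        payload["system"] = "\n".join(system_parts)
    return payload


def from_anthropic_response(raw: dict[str, Any]) -> dict[str, Any]:
    """Map an Anthropic-style response back to the OpenAI shape."""
    text = "".join(b["text"] for b in raw["content"] if b.get("type") == "text")
    usage = raw.get("usage", {})
    prompt = usage.get("input_tokens", 0)
    completion = usage.get("output_tokens", 0)
    reason = raw.get("stop_reason")
    return {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": STOP_REASONS.get(reason, reason),
            }
        ],
        "usage": {
            "prompt_tokens": prompt,
            "completion_tokens": completion,
            "total_tokens": prompt + completion,
        },
    }
```

The caller only ever sees the OpenAI shape on both sides; everything provider-specific lives inside the two mapping functions, which is why swapping models can come down to changing a string.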
The proxy wraps that translation layer in a long-running service and adds the things teams need: authentication via virtual keys (per user/team/project), routing with retries and fallbacks, budgets and rate limits, guardrails, caching, and logging. In production it leans on Redis (caching, rate-limit counters) and Postgres (virtual keys, spend logs).
Getting started
As a library it’s one dependency and one function:
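import os

from litellm import completion

# LiteLLM reads provider credentials from the environment.
os.environ["ANTHROPIC_API_KEY"] = "..."

# The "provider/model" string selects the backend; the call shape stays the same.
resp = completion(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)  # response follows the OpenAI shape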
As a gateway, it’s a command and an OpenAI client pointed at it:
# In a shell: install the proxy extra and start the gateway
uv tool install 'litellm[proxy]'
litellm --model gpt-4o  # starts the proxy on :4000

# In Python: point the standard OpenAI client at the gateway
import openai
client = openai.OpenAI(api_key="sk-litellm-key", base_url="http://0.0.0.0:4000")
resp = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "Hi"}])
For real deployments use the -stable Docker images (load-tested before release) and verify their cosign signatures — more on why that matters below.
In practice
Two use cases cover most of why people reach for it.
1. Provider-agnostic app with fallback
You’re building an app and don’t want to be locked to one vendor — or one vendor’s uptime. With the SDK (or the proxy’s router) you define a primary and fallbacks: calls go to gpt-4o, and if OpenAI errors or rate-limits, LiteLLM automatically retries on claude-sonnet-4 or an Azure deployment. Your application code never changes; the routing config does. This is the cheapest insurance against a single provider having a bad day, and it makes “let’s A/B a cheaper model” a one-line experiment.
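The control flow behind that failover can be sketched in a few lines. This is an illustration of the pattern only, not how LiteLLM implements it: the providers are stand-in functions, and `complete_with_fallback`, `AllProvidersFailed`, and `retries_per_provider` are hypothetical names.

```python
from typing import Callable

Messages = list[dict[str, str]]
Provider = Callable[[Messages], str]


class AllProvidersFailed(RuntimeError):
    """Raised when the primary and every fallback have failed."""


def complete_with_fallback(
    messages: Messages,
    providers: list[tuple[str, Provider]],
    retries_per_provider: int = 1,
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, reply) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        # One initial attempt plus the configured number of retries.
        for attempt in range(retries_per_provider + 1):
            try:
                return name, call(messages)
            except Exception as exc:  # a real router would filter by error type
                errors.append(f"{name} (attempt {attempt + 1}): {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for the example.
def flaky_primary(messages: Messages) -> str:
    raise TimeoutError("rate limited")


def healthy_backup(messages: Messages) -> str:
    return "hello from backup"
```

Calling `complete_with_fallback(msgs, [("gpt-4o", flaky_primary), ("claude-sonnet-4", healthy_backup)])` exhausts the primary's attempts and then returns the backup's reply. The application only passes an ordered list, which is the property that keeps routing changes out of app code.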
2. Central AI gateway for a team
A platform team stands up the LiteLLM proxy as the single doorway to every LLM the company uses. Each team gets a virtual key instead of raw provider keys; the gateway tracks spend per key, enforces per-project budgets, applies guardrails, caches repeated calls, and logs everything to one place. Developers keep using the plain OpenAI client; the org gets centralized cost control, access management, and observability without touching app code. This is the “Gen AI enablement” pattern LiteLLM is explicitly built for.
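A minimal sketch of the bookkeeping behind virtual keys and budgets might look like the following. It is an in-memory toy with hypothetical names (`VirtualKey`, `SpendLedger`, `BudgetExceeded`); the real proxy persists keys and spend in Postgres, and its data model is not shown in this post.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a request would push a key past its budget."""


@dataclass
class VirtualKey:
    team: str
    budget_usd: float
    spent_usd: float = 0.0


class SpendLedger:
    """Tracks spend per virtual key and rejects calls that exceed the budget."""

    def __init__(self) -> None:
        self._keys: dict[str, VirtualKey] = {}

    def issue(self, key: str, team: str, budget_usd: float) -> None:
        # Teams receive this key instead of a raw provider key.
        self._keys[key] = VirtualKey(team=team, budget_usd=budget_usd)

    def charge(self, key: str, cost_usd: float) -> float:
        """Record a request's cost and return the remaining budget."""
        vk = self._keys[key]  # unknown keys raise KeyError
        if vk.spent_usd + cost_usd > vk.budget_usd:
            raise BudgetExceeded(f"{vk.team} would exceed ${vk.budget_usd:.2f}")
        vk.spent_usd += cost_usd
        return vk.budget_usd - vk.spent_usd
```

Because the check happens at the gateway, a rejected call never reaches a provider, and the provider credentials stay in one place.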
How it compares
The usual comparison is with OpenRouter (a hosted model marketplace) and managed gateways. The clean split: LiteLLM is the self-hosted, open-source option with maximum provider breadth and full control; OpenRouter and managed services trade some control for zero ops (Suhas Bhairav, DEV comparison).
| Dimension | LiteLLM (self-hosted) | OpenRouter / managed gateway |
|---|---|---|
| Hosting | you run it (SDK or proxy) | vendor-hosted |
| Data path | stays in your network | through the vendor |
| Provider breadth | 100+, you add keys | broad, vendor-managed |
| Control / auditability | full (open source) | limited |
| Ops burden | you own Redis + Postgres + upgrades | none |
| Best when | regulated, air-gapped, high volume | you want it working today, low volume |
On cost, third-party analysis ties the crossover to request volume: above roughly 50M requests/month, self-hosted LiteLLM’s TCO drops below managed vendors; below roughly 5M requests/month, managed is often cheaper once you price in the labor (TrueFoundry). Those figures leave the range in between open, so cost it out for your own setup.
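The crossover logic is simple break-even arithmetic: self-hosting carries a fixed monthly cost (infrastructure plus labor) but a lower marginal cost per request. The sketch below shows the calculation only. The function name and every number fed into it are hypothetical placeholders, not figures from TrueFoundry or LiteLLM.

```python
def breakeven_requests_per_month(
    fixed_selfhost_usd: float,
    managed_usd_per_million: float,
    selfhost_usd_per_million: float,
) -> float:
    """Monthly request volume at which self-hosted and managed costs are equal."""
    saving_per_million = managed_usd_per_million - selfhost_usd_per_million
    if saving_per_million <= 0:
        # Self-hosting never catches up if its marginal cost is not lower.
        raise ValueError("self-hosted marginal cost must be below managed")
    return fixed_selfhost_usd / saving_per_million * 1_000_000


# Placeholder inputs for illustration only; substitute your own estimates.
volume = breakeven_requests_per_month(
    fixed_selfhost_usd=10_000,
    managed_usd_per_million=500,
    selfhost_usd_per_million=100,
)
```

Below the returned volume the managed option is cheaper under these assumptions; above it, self-hosting wins. The result is very sensitive to the labor estimate folded into the fixed cost.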
Performance and benchmarks
Be careful here, because the numbers depend heavily on how it’s deployed. LiteLLM’s own benchmark advertises 8ms P95 added latency at 1,000 RPS (docs). Independent load testing tells a more cautious story: under heavy concurrency reviewers have seen P99 latency spike and, in one test, the proxy run out of memory and start failing requests around 1k RPS — attributed to Python’s GIL and to database logging slowing requests as log volume grows (TrueFoundry).
Both can be true: the vendor figure reflects a tuned setup, the critical figure a stressed one. The honest takeaway is that LiteLLM performs well when properly scaled (multiple workers, Redis, async logging) but is not a zero-config drop-in at very high throughput — capacity-plan and load-test for your traffic.
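When you load-test, compare tail percentiles rather than averages, since the disagreement above is about P95 and P99. A small sketch for summarizing collected latency samples follows; the function name is hypothetical and it assumes you have already gathered per-request latencies in milliseconds from your own test run.

```python
import statistics


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Summarize latency samples as P50, P95, and P99."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Run it against the same workload with and without the proxy in the path; the difference between the two summaries is the latency the gateway adds in your deployment.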
Tradeoffs
The honest cons:
- It’s real infrastructure. The proxy needs Redis and Postgres, migrations, backups, and upgrades. That’s operational weight you don’t have with a hosted gateway.
- Concurrency ceiling. Python’s GIL and synchronous DB logging can bottleneck a single instance under heavy load; high throughput means horizontal scaling and tuning.
- Supply-chain risk is real. A March 2026 compromise published malicious releases (v1.82.7 and v1.82.8); the mitigation is to pin versions, scan dependencies, verify the signed Docker images, and isolate the proxy (TrueFoundry). Worth knowing for any dependency, but called out because a gateway holds your keys.
- Abstraction leaks. A unified interface can’t perfectly expose every provider-specific feature; the newest or most exotic parameters sometimes lag.
- Some features are enterprise. SSO, advanced security and dedicated support sit behind the commercial license.
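For the supply-chain point, version pinning can be backed by a startup check that refuses to run on a known-bad release. The sketch below uses the two version numbers named above as its blocklist; the function names are hypothetical, and it complements dependency scanning and signature verification rather than replacing them.

```python
from importlib import metadata

# Releases named in this post as compromised.
BLOCKED_VERSIONS = frozenset({"1.82.7", "1.82.8"})


def is_blocked(version: str, blocked: frozenset[str] = BLOCKED_VERSIONS) -> bool:
    """Return True if the version string matches a blocklisted release."""
    return version.strip().lstrip("v") in blocked


def check_installed(package: str = "litellm") -> None:
    """Raise if the installed package version is on the blocklist."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return  # nothing installed, nothing to check
    if is_blocked(version):
        raise RuntimeError(f"{package} {version} is a blocklisted release")
```

Calling `check_installed()` at process start fails fast if a bad release slipped past the lockfile.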
Takeaway
LiteLLM is the default answer to “I need to call several LLMs without writing several integrations.” As a library it makes your code provider-agnostic with fallbacks for almost free; as a self-hosted gateway it gives a team one auditable control point for keys, spend, and guardrails. Reach for it when you value control, breadth, and keeping data in your network — and you have (or are) the platform muscle to run it. If you’d rather not operate Redis, Postgres and a scaled proxy, a hosted gateway is the lower-effort path until your volume makes self-hosting pay off.