AI Development

Put a Gateway in Front of Your LLMs

Routing every team's LLM calls through one internal gateway instead of letting each app hit providers directly buys central auth, cost control, model flexibility, and an audit trail — at the price of a chokepoint you have to operate well.

May 14, 2026

AI Development
LLM
Architecture
Operations

Diagram of several internal apps routing through one central LLM gateway — auth, rate limits, audit logs, cost controls, and model routing — before reaching multiple model providers and a fallback.

The moment more than a couple of applications in an organization start calling large language models, the interesting question stops being “which model” and becomes “how does everyone reach models.” Let each app carry its own provider keys and call out directly and you get a sprawl of credentials, no cost visibility, and a dozen slightly different integrations to maintain. The pattern that keeps it sane is boring and old: put a gateway in front.

What a gateway actually is here

It’s a proxy that speaks a common, OpenAI-compatible API and forwards requests to whatever’s behind it — a hosted OpenAI deployment, a cloud provider’s models, embeddings endpoints, maybe a model-context endpoint for tools. Applications point at one base URL with one key style; the gateway decides where the request actually goes. Think of it as a reverse proxy, except the upstreams are model providers instead of web servers.

That single indirection is doing more work than it looks like.

What it buys you

Central auth and key management. One place issues, rotates, and revokes keys, instead of every app independently holding provider credentials you can’t easily account for.
Cost control and visibility. Per-team budgets and usage logging live in one spot, and provider-level savings — batch pricing, cached-input discounts — get applied centrally instead of rediscovered app by app.
Model flexibility behind a stable interface. Swapping or adding a model becomes a gateway config change, not a rewrite in every consumer. Apps that targeted “the gateway” don’t care that the model behind it changed.
Governance and audit. Every call flows through one chokepoint you can log, rate-limit, and inspect for how data is being handled. You can’t police traffic that never passes through you.

The costs, stated honestly

A gateway is not free architecture, and pretending otherwise is how you build one that everybody resents.

It’s a chokepoint. An outage or an over-tightened policy hits every consumer at once. The day you centralize access is the day the gateway becomes production infrastructure, whether you treat it that way or not.
Compatibility sprawl. It’s tempting to add a new endpoint shape for every team’s preferred SDK. Resist it. Support one API style well; maintaining a museum of half-compatible endpoints turns the gateway into the problem it was supposed to solve.

The failure mode that quietly kills it

Here’s the one I’d most want a past version of me to know. If the gateway only accepts traffic from certain networks — corporate egress, an allow-list of IPs — then workloads running somewhere else, like your cloud environments, get 403’d even when the network path is technically fine. The request leaves the cluster, reaches the gateway, and bounces because its source address wasn’t on the list.

What happens next is the real damage: the blocked teams, who have work to ship, start calling the model providers directly to get around the gateway. And every call that routes around the gateway is a call you don’t authenticate, don’t meter, don’t log, and don’t govern.

A gateway only governs the traffic that goes through it. The traffic that routes around it is the traffic you most needed to see.

So when you find teams bypassing the gateway, read it as a signal, not a discipline problem: the gateway isn’t meeting them where their code actually runs. The fix is to make the places that run real workloads able to reach it — not to send a sterner email. A gateway that’s reachable from the corporate office but not from the cluster everything deploys to is a gateway designed to be bypassed.

Centralizing keys only works if apps stop hardcoding them

One bonus the gateway promises is getting provider credentials out of individual codebases. It only delivers if apps actually reference their gateway key from the environment or a secrets store rather than pasting it into a config file or a notebook “just for now.” A central key authority with a dozen copies of the key committed across repos is the worst of both worlds. (More on that in keeping secrets out of git.)

So: build one, past a certain size

For one app, a gateway is overkill — call the provider and move on. Past a couple of consumers, the gateway earns its keep fast, but only if you operate it like the shared dependency it has become: reachable from where workloads run, monitored, and boring. Build it as infrastructure, not as a feature, and it disappears into the background the way good infrastructure should. If you’re standing one up and want to compare design notes, I’m easy to reach.