Build Your Agent's Tools Like Real Services

The tool servers you expose to an AI agent are production services, not scripts — they need real auth, 12-factor config, typed errors, rate limiting, and observability. The "AI" part is the least interesting thing about them, and treating them like toys is how they fail in production.

April 1, 2026

AI Development
MCP
Architecture
APIs

It’s tempting to treat the tool servers you hand an AI agent as throwaway glue — a quick script that wraps an API so the model can call it. Then I looked closely at how a well-built one was actually constructed, and it was a textbook production web service: real per-request authentication, environment-only configuration, typed errors, deliberate rate limiting, and full observability. The “it’s for an AI agent” part was almost incidental. That reframed how I think about these things: a tool server for an agent is a production service that happens to speak an agent protocol, and it should be built to the same standard as any other service in your stack.

The protocol is the only novel part

An agent tool server — an MCP server, in current terms — exposes a set of capabilities the model can invoke. Strip away the protocol and what’s left is exactly a normal API service: it receives requests, authenticates them, does work against some backend, and returns structured results. The agent-facing protocol is a thin layer on top. So the engineering that makes it good or bad is the same engineering that makes any service good or bad — and “it’s an AI tool” is no excuse to skip the parts you’d never skip on a customer-facing API.

An agent tool server is 5% novel protocol and 95% ordinary web service. Build the 95% like you mean it.

Authenticate every request, carry identity through

The well-built server I studied didn’t run as one shared identity. It validated a caller’s credential on every request and carried that identity through the call so actions executed as that caller, with their permissions, against the backend. That matters enormously for agent tools, because an agent acting on behalf of different users must not become a confused deputy that does everything as one all-powerful service account. Per-request auth and per-request identity aren’t optional niceties here — they’re the difference between a tool that respects your existing access controls and one that quietly bypasses them for anyone who can reach the agent. (It’s the same hazard as a shared service account becoming a single point of contention and a blind audit trail.)
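To make the confused-deputy point concrete, here is a minimal sketch of per-request authentication with the identity carried into the backend call. It is not the code of the server I studied. The token table, the `Caller` type, the scope names, and the `close_ticket` tool are all hypothetical stand-ins; a real server would verify a signed token or call an identity provider instead of looking up a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Identity resolved from the credential on a single request."""
    user_id: str
    scopes: frozenset[str]


class AuthError(Exception):
    """Raised when a request carries no valid credential."""


# Hypothetical stand-in for a real verifier (signed tokens, an identity
# provider, etc.). Maps bearer tokens to the identities they represent.
_TOKENS = {
    "token-alice": Caller("alice", frozenset({"tickets:read", "tickets:write"})),
    "token-bob": Caller("bob", frozenset({"tickets:read"})),
}


def authenticate(headers: dict[str, str]) -> Caller:
    """Resolve the caller from this request's Authorization header."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    caller = _TOKENS.get(token) if scheme == "Bearer" else None
    if caller is None:
        raise AuthError("missing or invalid credential")
    return caller


def close_ticket(caller: Caller, ticket_id: int) -> dict:
    """Hypothetical backend action, checked against the caller's own scopes."""
    if "tickets:write" not in caller.scopes:
        raise PermissionError(f"{caller.user_id} may not modify tickets")
    return {"ticket": ticket_id, "status": "closed", "closed_by": caller.user_id}


def handle_tool_call(headers: dict[str, str], tool: str, args: dict) -> dict:
    """Authenticate on every call; there is no shared fallback identity."""
    caller = authenticate(headers)
    if tool == "close_ticket":
        return close_ticket(caller, **args)
    raise ValueError(f"unknown tool: {tool}")
```

The design choice worth noticing is that `close_ticket` takes the `Caller` as an argument. The backend action cannot run without an identity, so there is no code path where the tool quietly acts as an all-powerful account.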

Config from the environment, secrets never in code

It was strictly 12-factor: all configuration came from environment variables, nothing was hardcoded, and there was a clean separation between code and the deployment’s settings. This is table stakes for a real service and somehow the first thing “quick AI tool” scripts abandon — endpoints and tokens baked into source, one build per environment, secrets one screenshot away from leaking. An agent tool server touches real systems with real credentials; it deserves the same config hygiene as anything else that does, so the same image runs in dev and prod and the secrets live outside the artifact.
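A sketch of what environment-only configuration can look like in practice. The variable names (`TOOL_BACKEND_URL`, `TOOL_BACKEND_TOKEN`, `TOOL_REQUEST_TIMEOUT_S`) and the `Settings` fields are my own illustrative assumptions, not the names used by any particular server.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    backend_url: str
    # repr=False keeps the secret out of logs if the settings object is printed.
    backend_token: str = field(repr=False)
    request_timeout_s: float = 10.0


def load_settings(env=os.environ) -> Settings:
    """Build settings from the environment and fail fast if any are missing."""
    required = ("TOOL_BACKEND_URL", "TOOL_BACKEND_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"missing required environment variables: {', '.join(missing)}"
        )
    return Settings(
        backend_url=env["TOOL_BACKEND_URL"],
        backend_token=env["TOOL_BACKEND_TOKEN"],
        request_timeout_s=float(env.get("TOOL_REQUEST_TIMEOUT_S", "10")),
    )
```

Calling `load_settings()` once at startup means a misconfigured deployment fails immediately with a clear message, rather than on the first tool call an agent happens to make.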

Typed errors, rate limits, and a backend that can say no

Three more production behaviors the good server had that toy versions skip:

Typed, structured errors — so the agent (and the humans debugging it) get a meaningful failure, not a raw stack trace or a vague 500. An agent handles “not found” differently from “rate limited”; it can only do that if you tell it which is which.
Rate limiting / pacing toward the backend — the server deliberately spaced its calls to the system behind it. An agent can generate requests faster and weirder than any human, so the tool layer is exactly where you protect the backend from a model in a retry loop.
Input validation at the boundary — the model’s tool calls are untrusted input like any other request, and they get validated before they touch anything real.
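The typed-error and validation items above can be sketched together. The error codes, the payload shape, and the `get_ticket` tool are illustrative assumptions on my part; the point is only that each failure carries a stable code the agent can branch on, and that arguments are checked before anything real is touched.

```python
class ToolError(Exception):
    """Base class: every failure carries a stable, machine-readable code."""
    code = "internal_error"
    retryable = False

    def to_payload(self) -> dict:
        return {
            "error": {
                "code": self.code,
                "message": str(self),
                "retryable": self.retryable,
            }
        }


class InvalidInput(ToolError):
    code = "invalid_input"


class NotFound(ToolError):
    code = "not_found"


class RateLimited(ToolError):
    code = "rate_limited"
    retryable = True


_TICKETS = {1: "printer on fire"}  # hypothetical backend data


def validate_ticket_id(args: dict) -> int:
    """Treat the model's arguments as untrusted input."""
    value = args.get("ticket_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
        raise InvalidInput("ticket_id must be a positive integer")
    return value


def get_ticket(args: dict) -> dict:
    ticket_id = validate_ticket_id(args)
    if ticket_id not in _TICKETS:
        raise NotFound(f"ticket {ticket_id} does not exist")
    return {"ticket_id": ticket_id, "title": _TICKETS[ticket_id]}


def call_tool(args: dict) -> dict:
    """Return either a result or a typed error payload, never a raw traceback."""
    try:
        return {"result": get_ticket(args)}
    except ToolError as err:
        return err.to_payload()
    except Exception:
        # A real server would log the traceback here; the agent only sees
        # a generic typed error.
        return ToolError("unexpected server error").to_payload()
```

With this shape, an agent that receives `rate_limited` with `retryable: true` can back off and retry, while `not_found` tells it to stop and try something else.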

None of this is AI-specific. All of it is what keeps a service from falling over — and an agent will find the missing guardrail faster than your users would.
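For the pacing guardrail, here is one simple way to space calls toward a backend: a minimum-interval pacer. This is my own sketch of the idea, not the mechanism the server I studied used; the `Pacer` class and its parameters are hypothetical, and a token bucket would be a reasonable alternative if short bursts should be allowed.

```python
import threading
import time


class Pacer:
    """Enforce a minimum interval between successive backend calls."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the pacer can be tested without waiting
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until this call's slot arrives; return the delay applied."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the next slot while holding the lock, then sleep outside
            # it so concurrent callers queue up at evenly spaced times.
            self._next_allowed = max(now, self._next_allowed) + self._min_interval_s
        if delay > 0:
            self._sleep(delay)
        return delay


def call_backend(pacer: Pacer, fn, *args):
    """Run a backend call only after the pacer has granted it a slot."""
    pacer.wait()
    return fn(*args)
```

A server would create one `Pacer` per backend (for example `Pacer(0.2)` for at most five calls per second) and route every outbound call through it, so a model stuck in a retry loop slows down at the tool layer instead of hammering the system behind it.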

Observability, because an agent will surprise you

Finally, it was instrumented like a real service: structured logs and metrics on request rates, latencies, and errors. This is non-negotiable for agent tools specifically, because agents use tools in unpredictable, emergent ways — you cannot guess in advance how the model will call them, so you need to be able to see how it actually did, after the fact. Without metrics and logs, a misbehaving agent is an invisible problem; with them, it’s a dashboard you can read. Treat the tool server’s observability as part of how you’ll understand the agent’s behavior, not just the server’s.
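As a sketch of that instrumentation, the decorator below emits one structured log line per tool call and counts outcomes. The `observed` decorator, the log field names, and the `search_docs` tool are hypothetical, and the in-process `Counter` is only a stand-in for whatever metrics client you already run.

```python
import json
import logging
import time
from collections import Counter
from functools import wraps

log = logging.getLogger("tool_server")
call_counts = Counter()  # in-process stand-in for a real metrics client


def observed(tool_name: str):
    """Record outcome and latency for every invocation of a tool."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                outcome = type(exc).__name__
                raise
            finally:
                duration_ms = (time.perf_counter() - start) * 1000
                call_counts[(tool_name, outcome)] += 1
                # One JSON object per call keeps the logs machine-queryable.
                log.info(json.dumps({
                    "event": "tool_call",
                    "tool": tool_name,
                    "outcome": outcome,
                    "duration_ms": round(duration_ms, 2),
                }))
        return wrapper
    return decorator


@observed("search_docs")
def search_docs(query: str) -> list[str]:
    """Hypothetical tool used only to demonstrate the decorator."""
    if not query:
        raise ValueError("query must not be empty")
    return [f"result for {query}"]
```

Because the counter is keyed by tool and outcome, a spike in one error type for one tool is visible directly, which is often the first sign that the agent has started calling something in a way nobody anticipated.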

The standard doesn’t drop because it’s “AI”

The throughline: don’t let “it’s an AI tool” lower the bar. The tool servers an agent calls reach into real systems with real credentials and get exercised harder and stranger than human clients ever would, which argues for more rigor, not less. That means per-request auth, environment config, typed errors, rate limiting, observability, and a gateway in front to centralize policy. It’s the same instinct as putting a gateway in front of your LLMs and governing AI tools like production access: the agent is new, but the engineering discipline is the one you already have. If you’re building MCP servers or agent tools and treating them as the real services they are, I’d like to compare notes.