Engineering Practice
Validate Config When You Load It, Not When You Use It
Configuration that's only resolved at the moment it's used fails at the worst possible time — mid-request, in production. Resolve and validate config at startup so a bad value fails fast and loud instead of late and quiet.
- Engineering Practice
- Configuration
- Reliability
- Operations
I spent a while proving out how a system pulled a secret from its environment instead of hard-coding it, and the behavior that stuck with me wasn’t the mechanism — it was when the value got resolved. The config reference sat there inert as a plain string until the exact moment some code path evaluated it. Which meant a misconfigured value wouldn’t announce itself at startup. It would wait, quietly, until the unlucky request that finally touched that path — and then fail there, in production, far from any clue about why. That’s the case for a simple rule: resolve and validate config when you load it, not when you use it.
Lazy config fails at the worst time
Lazy resolution is seductive because it’s simple: store the config reference as-is, resolve it on demand. Nothing to do at startup, no upfront cost. The hidden price is when the failure surfaces. If a value is only checked the moment it’s used, a bad value is silent until something uses it — and “something uses it” might be a rare code path, an edge-case request, or a feature nobody exercises until a customer does.
So the system boots clean, passes a smoke test, looks healthy, and then falls over later in a context with no good diagnostics. The error appears deep inside request handling, pointing at the symptom instead of the cause — “this operation failed” rather than “your configuration was wrong the whole time.” You’ve taken a problem that could have been caught before the system accepted any traffic and deferred it into the middle of live operation.
Startup is the cheapest place to fail
The alternative is to do the work eagerly: at load time, resolve every config value, check that it’s present and well-formed, and refuse to start if something’s wrong. A failure here is the best kind of failure — it happens before the system is serving anyone, it’s attributable directly to the config, and the error can say exactly what’s missing or malformed.
A config error at startup is a build problem. The same config error at request time is an incident. Same bug, wildly different bill, and the only difference is when you chose to look.
Failing fast at startup converts a deferred, hard-to-diagnose production failure into an immediate, obvious one. The system either comes up correct or doesn’t come up at all, which is a far better contract than “comes up, works mostly, and detonates on the path you forgot.”
Resolve once, at the boundary
A practical detail from that exploration: when config needs transforming to be usable — pulled from the environment, decoded, parsed, type-converted — do that transformation once, at load, and hold the resolved result. Resolving the same reference repeatedly at each use is both wasteful and a correctness hazard, because now the resolution logic runs in many places and can behave differently depending on where it’s hit.
Resolve at the boundary, validate there, and let the rest of the system consume an already-clean value. This is the config version of validating input at the edge instead of defensively re-checking it everywhere downstream. One resolution site, one validation site, one place to get it right — instead of the same fragile parsing scattered across every consumer, each a chance to diverge.
Watch the gotchas eager loading exposes
Eager resolution is the right default, but it surfaces a couple of things worth knowing. Some config genuinely can’t be fully validated at startup — a downstream service might not be reachable yet — so the honest move is to validate everything you can at load (presence, format, type) and fail fast on those, while treating the genuinely runtime-dependent checks as their own explicit health checks rather than silent lazy failures.
And eager resolution can over-eagerly choke on a value that’s technically optional or has a fallback, so “validate at load” has to mean “validate what’s required, tolerate what’s genuinely optional” — not “reject anything that isn’t perfectly populated.” The goal is to fail fast on real misconfiguration, not to become brittle about absent-but-allowed values. Done with that nuance, you catch the real errors early without inventing new ones.
The rule, and where it sits
The principle is small and pays off constantly: the moment you load config is the moment to validate it. Resolve references, check presence and shape, convert types, and refuse to start on anything required-but-broken — so the failure lands at startup where it’s cheap and obvious, not mid-request where it’s expensive and cryptic.
This rhymes with a few things I’ve written. Inherited defaults are decisions nobody made — validating at load forces those defaults into the open. A config language is still code — so it deserves the same fail-fast rigor as code. And while pulling a secret from the environment beats hard-coding it, remember that environment variables are not a vault; the point here is purely when you resolve them, not that the environment is a safe place to keep them. If you’ve got config that detonates late and you want to move the failure earlier, I’m easy to reach.