Trust the System of Record, Not the Ticket
A ticket can be confidently, specifically wrong — so before you act on the IPs, hostnames, or targets it hands you, verify them against the systems that actually know the truth, and triangulate across more than one.
- Systems Thinking
- Operations
- Debugging
- Infrastructure
I picked up a task that came with a list of specific targets — exact addresses for the machines I was supposed to act on. The list was precise, official, and wrong. Not vaguely wrong — confidently wrong, pointing at a completely different set of machines than the ones the task was actually about. If I’d trusted it and run the procedure, I’d have changed the wrong systems. The save was a habit worth making explicit: before acting on what a ticket asserts about infrastructure, verify it against the systems that actually know, and don’t trust a single source — triangulate.
A ticket is a claim, not a fact
It’s easy to treat a ticket as authoritative because it’s formal: it has a number, a reporter, fields, a process behind it. But a ticket is just someone’s claim about the world at the moment they wrote it, and claims about infrastructure start out wrong or go stale all the time. The specific addresses on a ticket are frequently a best guess that hardened into apparent fact simply by being typed into an official-looking box.
The precision is what makes it dangerous. A vague ticket (“fix the auth database”) keeps you appropriately cautious. A precise one (“apply this to these six exact addresses”) invites you to skip verification, because it looks like someone already did the work. Specificity reads as authority, and that’s exactly the trap. The more exact the wrong answer, the more confidently you’ll act on it.
A ticket tells you what someone believed. The infrastructure tells you what’s true. When they disagree, the infrastructure wins — every time.
Ask the systems that actually know
The fix is to verify the ticket’s claims against systems that can’t be out of date the way a ticket can, because they reflect current reality rather than past belief. For “which machines are these, really,” that meant going to the sources of truth directly:
- The live systems themselves. Connecting to a node and asking it what cluster it’s in and who its peers are gives you the closest thing to ground truth: a running machine reports its own current state, not someone’s recollection of it.
- The infrastructure inventory. The virtualization or cloud layer knows what actually exists, what it’s named, and what address it really has — independent of what any document claims.
- The canonical configuration in source control. The configs that actually define and deploy the systems are a third, independent witness to what’s supposed to be where.
Each of those is closer to reality than a hand-entered ticket field, because each reflects the system as it is, not as someone remembered it.
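To make the comparison concrete, here is a minimal sketch of checking a ticket's address list against one system of record. Everything in it is an assumption for illustration: `SourceReport` and `diff_against_ticket` are hypothetical names, the addresses come from reserved documentation ranges, and how you actually collect the addresses from a live node, an inventory, or a config repo depends entirely on your environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceReport:
    """What one system of record says the target addresses are (hypothetical type)."""
    name: str
    addresses: frozenset


def diff_against_ticket(ticket_addresses, report):
    """Compare the ticket's addresses with one source of record.

    Returns two sorted lists: addresses only the ticket mentions, and
    addresses only the source mentions.
    """
    ticket = frozenset(ticket_addresses)
    only_in_ticket = ticket - report.addresses
    only_in_source = report.addresses - ticket
    return sorted(only_in_ticket), sorted(only_in_source)


# Hypothetical data: the ticket and the inventory disagree on one address each.
ticket = ["192.0.2.10", "192.0.2.11"]
inventory = SourceReport("inventory", frozenset({"192.0.2.11", "198.51.100.5"}))

only_in_ticket, only_in_source = diff_against_ticket(ticket, inventory)
print(only_in_ticket)  # ['192.0.2.10']
print(only_in_source)  # ['198.51.100.5']
```

Two empty lists mean the ticket and that source agree. Anything else is a discrepancy to explain before running the procedure.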
Triangulate; don’t trust a single witness
The part that turns “check it” into something rigorous: I didn’t rely on one source to overturn the ticket. I checked three independent ones and confirmed they agreed. The live systems said one thing, the inventory agreed, and the canonical configs agreed — and that convergence is what made me confident the ticket was wrong, not just my hunch against its claim.
Triangulation matters because any single source can also mislead. A live system might be mislabeled; an inventory entry might be stale; a config might describe intent that was never applied. But when three independent sources that derive their information differently all tell the same story, that story is almost certainly true. One contradicting source is a question. Three agreeing sources is an answer. The confidence comes from the agreement, not from any one witness.
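The agreement rule can be sketched in a few lines. This is an illustration under stated assumptions, not a tool I actually ran: `triangulate` is a hypothetical helper, each source is reduced to a plain set of addresses, and the threshold of three sources simply mirrors the three witnesses described above.

```python
def triangulate(reports, minimum_sources=3):
    """Return the address set all sources agree on, or None.

    `reports` maps a source name to the set of addresses that source reports.
    None means there is no verdict yet: either too few sources were consulted
    or at least one source disagrees with the others.
    """
    if len(reports) < minimum_sources:
        return None
    distinct_answers = {frozenset(addresses) for addresses in reports.values()}
    if len(distinct_answers) != 1:
        return None
    return next(iter(distinct_answers))


# Hypothetical data: three independent sources telling the same story.
reports = {
    "live_nodes": {"198.51.100.5", "198.51.100.6"},
    "inventory": {"198.51.100.5", "198.51.100.6"},
    "source_control": {"198.51.100.5", "198.51.100.6"},
}
consensus = triangulate(reports)
print(sorted(consensus))  # ['198.51.100.5', '198.51.100.6']
```

Returning None rather than a majority vote is deliberate in this sketch: a disagreement between sources is a question to investigate, not something to outvote.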
Distinguish “the data is wrong” from “the ask is ambiguous”
A subtlety the verification surfaced: sometimes the ticket isn’t wrong so much as ambiguous, and the addresses are wrong because the underlying request was under-specified. In my case, the targets named one tier while the task plausibly meant another — and the right move wasn’t to silently substitute the targets I thought were correct. It was to surface the discrepancy, lay out what each interpretation implied, and get the intent clarified before touching anything.
That distinction matters because “the IPs are wrong, here are the right ones” and “the IPs are wrong because we haven’t actually agreed what this task is targeting” need different responses. Verifying against reality doesn’t just catch stale data; it exposes when the request itself is unsettled. Acting on a confidently wrong ticket is one failure mode; confidently “correcting” it based on your own assumption is another. When reality contradicts the ask, the move is to escalate the ambiguity, not to quietly pick a side.
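One way to keep yourself honest here is to make the outcome of verification an explicit decision with no "quietly substitute" branch. The sketch below is hypothetical: `Decision` and `decide` are names invented for illustration, and `consensus` stands for whatever address set the independent sources agreed on, or None if they did not agree.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"          # ticket matches what the sources agree on
    INVESTIGATE = "investigate"  # sources do not agree with each other yet
    ESCALATE = "escalate"        # sources agree, and they contradict the ticket


def decide(ticket_addresses, consensus):
    """Map a verification result to the next step.

    There is intentionally no outcome that swaps in the verified addresses
    and carries on: a contradiction goes back to the requester.
    """
    if consensus is None:
        return Decision.INVESTIGATE
    if frozenset(ticket_addresses) == frozenset(consensus):
        return Decision.PROCEED
    return Decision.ESCALATE


# Hypothetical data: the sources agree with each other but not with the ticket.
print(decide(["192.0.2.10"], {"198.51.100.5"}))  # Decision.ESCALATE
```

The escalation message is where you lay out what each interpretation would imply, so the requester can settle the intent.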
The cost asymmetry makes this a no-brainer
Why make a habit of this when most tickets are fine? Because the costs are wildly asymmetric. Verifying targets against the systems of record costs minutes. Acting on a wrong target — changing the wrong machines, running a destructive step against the wrong cluster — can cost an incident, or worse, and it’s the kind of mistake that’s brutal to walk back. A few minutes of “let me confirm these are really the right boxes” is the cheapest insurance in operations.
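The asymmetry can be put as rough arithmetic. Under a simplifying assumption that verification always catches a wrong target, verifying costs a fixed amount every time, while skipping it costs the price of a wrong-target incident multiplied by how often tickets are wrong. Verifying wins on average once the ticket error rate exceeds the ratio of the two costs. The function name and the numbers below are hypothetical, chosen only to show the shape of the calculation.

```python
def break_even_error_rate(verify_cost, wrong_target_cost):
    """Ticket error rate above which always verifying is cheaper on average.

    Both costs must be in the same unit (minutes, dollars, and so on).
    Assumes verification always catches a wrong target.
    """
    if verify_cost < 0 or wrong_target_cost <= 0:
        raise ValueError("verify_cost must be >= 0 and wrong_target_cost > 0")
    return verify_cost / wrong_target_cost


# Hypothetical numbers: 5 minutes to verify, 500 minutes to clean up a mistake.
print(break_even_error_rate(5, 500))  # 0.01
```

With those made-up figures, verification pays for itself if more than one ticket in a hundred names the wrong targets, and that is before counting costs that do not reduce to minutes.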
So before any procedure that names specific targets, I now check those targets against reality first — and against more than one source. It’s the operational twin of verifying a live value instead of trusting the file you wrote: don’t trust the assertion, trust the system’s own report of itself, and trust it most when independent sources agree. A ticket is a fine place to start. It’s a dangerous place to stop. If you’ve been saved (or burned) by checking the targets before acting, I’d like to hear the story.