Automation
"It Ran" Is Not "It Worked"
Some runtimes have no compiler to catch your mistakes, so you have to define what "it worked" means yourself — and a single signal like "it didn't crash" is the easiest one to fool yourself with.
- Engineering Practice
- Testing
- Automation
- Reliability
I spent a while working in an interpreted policy language — the kind a runtime reads and executes directly, with no compile step in front of it. There’s no build that turns red when you typo a branch name. The engine just loads your code and runs it, and a mistake doesn’t announce itself; it quietly does nothing, or does the wrong thing, while everything around it looks fine. Working in that environment forced me to get precise about a distinction that’s easy to gloss over everywhere else: “it ran” and “it worked” are not the same claim, and only one of them is worth anything.
Some languages fail late and softly
With a compiler, a whole class of mistakes can’t survive to runtime — the wrong name, the wrong shape, the unreachable branch get caught before anything executes. Take the compiler away and those mistakes ride straight through to execution. A mistyped condition doesn’t error; it just evaluates false forever and the branch you cared about never fires. The program “succeeds” by doing less than you think.
That’s the trap of late, soft failures: nothing crashes, so nothing warns you. The absence of an error is not evidence of success — it’s just the absence of an error.
A single signal is easy to fool
The tempting way to check “did it work” is to grab the one signal that’s easiest to read:
- The process exited zero.
- No exception showed up in the log.
- I got a response back.
Every one of those can be true while the thing you actually wanted to happen didn’t. The exit code reports that the program finished, not that it did its job. The response came back, but maybe from a default path that skipped your logic entirely. A single convenient signal is exactly the one most likely to give you a confident false positive.
Require two signals that come at it from different angles
What I landed on — and now reach for everywhere — is to never accept a single signal as a pass. The check has to satisfy two independent signals that both agree, and they have to come at the answer from genuinely different directions:
- The outcome: the system returned the result I expected.
- The evidence of execution: the system’s own trace shows the specific code path I intended actually ran.
Either alone lies. The right outcome can happen for the wrong reason — a default, a fallthrough, a cached answer. The right code path can run and still produce a result you didn’t want. Only when both agree do I call it a pass.
The right answer for the wrong reason is a bug that’s waiting for the inputs to change. Make the test prove why it passed, not just that it did.
A nice side effect: this makes a legitimate “no” legible too. When the expected outcome is a rejection and the trace shows the rejecting path ran, that’s a confirmed execution — a real pass of a negative test — not a failure of the check. You can only tell those apart if you’re reading both signals.
Build the fast loop so checking costs almost nothing
Two-signal verification only happens if it’s cheap. If confirming a change means a full restart and a manual poke every time, you’ll stop doing it under pressure — which is exactly when you most need it.
So the second half of this is a tight feedback loop: load the runtime once, then on every save hot-reload and re-fire the check, so a cycle is a second or two instead of half a minute. When the cost of verifying drops near zero, verifying stops being a chore you skip and becomes the natural rhythm of the work. That same instinct — make the safe path the easy path — is why I think automation needs a dry-run mode you build before the automation itself.
Watch for the environment faking a pass
The most unsettling false green I hit had nothing to do with my code. The packaging
step ran on macOS, whose tar quietly injects ._-prefixed AppleDouble sidecar
files into archives. The runtime dutifully tried to parse those junk files as input
and failed — but in a way that, at first glance, looked like a normal run. The tool
and the platform conspired to produce output that looked like a result.
That’s the deepest version of the lesson: it’s not just your logic that can fake a pass, it’s your tools and your environment. A check is only trustworthy if it would actually go red when the thing it’s guarding is broken — so every so often, break it on purpose and confirm it notices.
When there’s no compiler, your check is the spec
Strip away the compiler and “what does correct mean here” stops being something the toolchain answers for you. You have to answer it — explicitly, in the form of a check that demands two agreeing signals and is cheap enough to run constantly. Write that definition down and the runtime’s silence stops being scary, because you’re no longer trusting “it ran” to mean “it worked.”
It matters even more when the thing acting isn’t a script but an agent, which can be confidently wrong in ways you didn’t anticipate — the same proposal-then-verify discipline is the backbone of how I run the AI automation lab. If you’ve built a verification loop that survived contact with a runtime that fails softly, compare notes.