Engineering Practice

Build a Local Test Harness First

When a system fails late and softly and the real platform is slow to spin up, the highest-leverage thing you can build isn't the feature — it's a local harness that boots the real engine, fires a real request, and reads the real result in seconds.

March 17, 2026

Testing
Docker
Developer Experience

Illustration of a local Docker harness sending a real request through a real engine and returning feedback in seconds.

Some systems punish you with a slow, brutal feedback loop. You make a change, then wait for a heavyweight platform to redeploy, then fire a request through it, then dig through logs to see what happened — minutes per iteration, on a good day. When that’s the loop, the most valuable thing you can build is not the next feature. It’s a local harness that collapses the loop to seconds: boot the real engine in a container, fire a real request, read the real result. The harness pays for itself almost immediately and then keeps paying.

A slow loop with soft failures is a trap

The motivating case was a system with two nasty properties at once. It failed late and softly — mistakes didn’t surface until a live request happened to exercise them — and spinning up the full platform to test took ages. Put those together and you get the worst possible development experience: you can’t tell if a change is right without a full deployment, and a full deployment is expensive, so you iterate slowly on something that gives you no early warning. Every small change becomes a gamble you can’t cheaply check.

That combination is the signal to stop and build a harness. Not because tooling is fun (it is), but because the loop itself is the bottleneck. When the cost of checking dominates the cost of changing, you optimize the check or you stay slow forever.

When verifying a change costs more than making it, the feedback loop is the thing to fix first. Everything else is downstream of how fast you can tell whether you were right.

Boot the real engine, not a mock

The design choice that made the harness trustworthy: it runs the actual engine, in a container, not a simulation of it. Mocks drift from reality and lull you into confidence that evaporates in production. A container that boots the genuine runtime, loads your change, accepts a real request, and emits its real execution trace and response tells you the truth, locally, in seconds.

The shape is general:

Containerize the real engine so a clean instance is one command away.
Fire a real request through it — an actual protocol message, not a stubbed function call — so you exercise the real dispatch path.
Read the real output — the engine’s own trace plus its response — so success and failure are judged by what the engine actually did, not by a test double’s approximation.
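
The three steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the harness described in this post: it assumes the engine ships as a Docker image, speaks HTTP with JSON bodies, and writes its trace to the container's standard output. The image name `my-engine:dev`, the container name, port 8080, and the `/invoke` path are all hypothetical placeholders.

```python
import json
import subprocess
import urllib.request

IMAGE = "my-engine:dev"   # hypothetical image name
NAME = "engine-harness"   # hypothetical container name
PORT = 8080               # hypothetical port the engine listens on


def boot_command(image: str = IMAGE, name: str = NAME, port: int = PORT) -> list:
    """Docker CLI invocation that starts a clean, disposable engine instance."""
    return ["docker", "run", "--rm", "--detach", "--name", name,
            "--publish", f"{port}:{port}", image]


def fire_request(url: str, payload: dict, timeout: float = 5.0) -> tuple:
    """Send a real HTTP POST and return (status code, decoded JSON body)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def read_trace(name: str = NAME) -> str:
    """Return the engine's own output as captured by `docker logs`."""
    done = subprocess.run(["docker", "logs", name],
                          capture_output=True, text=True, check=True)
    return done.stdout + done.stderr


def run_once(payload: dict) -> tuple:
    """Boot, fire, read, and always tear the container down afterwards."""
    subprocess.run(boot_command(), check=True)
    try:
        # A real harness would poll for readiness here before firing.
        status, body = fire_request(f"http://localhost:{PORT}/invoke", payload)
        return status, body, read_trace()
    finally:
        subprocess.run(["docker", "rm", "--force", NAME], check=False)
```

Because `--rm` and the `finally` block discard the container, each run starts from a clean instance, and the trace comes from the engine itself rather than from anything the harness made up.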

You’re not testing a model of the system. You’re testing the system, just small and fast and local.

Demand two agreeing signals for “it worked”

A subtle thing the harness got right: it didn’t trust a single indicator. Success required two signals that agreed — the response came back as expected and the engine’s trace showed the code path you intended actually executed. Either alone can fool you. A correct-looking response can come from the wrong path; the right path can run and still produce an unexpected reply.

Requiring both closes the gap where a test passes for the wrong reason. That is the most dangerous kind of green, because it’s confidence you didn’t earn. It’s the same instinct as “it ran” is not “it worked”: a process exiting zero isn’t proof of correctness. Make the harness prove the right thing happened for the right reason, and a passing run actually means something.
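
One way to encode the two-signal rule is a small verdict type that only passes when both checks agree. This is an illustrative sketch: the names `Verdict` and `judge` are made up here, and it assumes the intended code path leaves a recognizable marker string in the trace, which may not hold for every engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    response_ok: bool   # signal 1: the reply matched the expectation
    path_ok: bool       # signal 2: the trace shows the intended path ran

    @property
    def passed(self) -> bool:
        # Both signals must agree; either one alone is not a pass.
        return self.response_ok and self.path_ok

    def explain(self) -> str:
        if self.passed:
            return "pass: expected response via the intended path"
        if self.response_ok:
            return "fail: response looked right, but the intended path never ran"
        if self.path_ok:
            return "fail: intended path ran, but the response was unexpected"
        return "fail: wrong response and the intended path never ran"


def judge(response: dict, trace: str, expected: dict, path_marker: str) -> Verdict:
    """Compare the reply to the expectation and look for the marker in the trace."""
    return Verdict(response_ok=(response == expected),
                   path_ok=(path_marker in trace))
```

Keeping the two booleans separate, instead of collapsing them into one, means a failure report says which signal disagreed. That is usually the first thing you want to know.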

The fast loop is the whole point

The feature that changed how the work felt was a watch mode that boots the engine once and then, on every save, hot-reloads the change and re-fires the request, with no reboot. That dropped the cycle from tens of seconds to a second or two. It sounds like a minor convenience, and it is anything but. The difference between a 25-second loop and a 2-second loop is the difference between “test occasionally, when I’m fairly sure” and “test constantly, as I think.”

A loop tight enough to run on every save changes your behavior: you experiment more, you catch mistakes the instant you make them, and you stop batching up risky changes to amortize the slow test. Cheap iteration isn’t just faster. It makes you work in a fundamentally better way, because the cost of being wrong for a moment drops to near zero. (Closely related: paying an expensive setup cost once and reusing it.)
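
A watch mode of this kind can be approximated with a plain polling loop over file modification times. The sketch below is an assumption about how such a loop might look, not the actual implementation: `boot` and `reload_and_fire` are hypothetical callbacks standing in for whatever starts the engine and whatever hot-reloads and re-sends the request.

```python
import os
import time
from typing import Callable, Dict, Iterable, List, Optional


def snapshot(paths: Iterable[str]) -> Dict[str, float]:
    """Map each existing path to its last-modified time."""
    return {p: os.stat(p).st_mtime for p in paths if os.path.exists(p)}


def changed(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Paths that were added, removed, or modified between two snapshots."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))


def watch(paths: List[str],
          boot: Callable[[], None],
          reload_and_fire: Callable[[List[str]], None],
          interval: float = 0.25,
          max_polls: Optional[int] = None) -> None:
    """Boot once, then re-fire whenever a watched file changes."""
    boot()  # the expensive step happens exactly once
    seen = snapshot(paths)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        polls += 1
        now = snapshot(paths)
        dirty = changed(seen, now)
        if dirty:
            reload_and_fire(dirty)  # hot-reload and re-send; no reboot
            seen = now
```

Polling is the simplest portable option. A file-system notification library would react faster, but at a quarter-second interval the polling cost is negligible next to the reboot it avoids.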

Then promote the harness to a CI gate

The harness had one more life. The same boot-and-fire check that powered local iteration became a gate in CI: a change that failed to compile, or whose request didn’t get the expected result, was blocked before it shipped. The earlier automated check only proved the thing started; the harness added “and it actually did the right thing when poked.”

That’s the elegant payoff. The investment you made to speed up your inner loop is the same investment that hardens your pipeline — local fast-feedback and CI enforcement are the same test, run in two places. Build the harness for your own sanity, and you get the quality gate for free.
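
Promoting the check to CI mostly comes down to turning its result into a process exit code that the pipeline can act on. A minimal sketch, assuming each harness step can be wrapped as a function returning `True` on success; the `Check` and `gate` names are hypothetical and belong to no particular CI system.

```python
import sys
from typing import Callable, List, NamedTuple


class Check(NamedTuple):
    name: str
    run: Callable[[], bool]   # returns True when the check passes


def gate(checks: List[Check]) -> int:
    """Run every check and return an exit code: 0 to pass, 1 to block."""
    failures = []
    for check in checks:
        try:
            ok = check.run()
        except Exception as exc:  # a crashing check blocks the change too
            ok = False
            print(f"{check.name}: error: {exc}", file=sys.stderr)
        print(f"{check.name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            failures.append(check.name)
    return 1 if failures else 0
```

A CI job would end with something like `sys.exit(gate([...]))`, passing the same compile and boot-and-fire checks that run locally, so the pipeline and the inner loop cannot drift apart.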

So when you meet a system with a slow loop and soft failures, resist the urge to just power through the next change. Build the harness first: real engine, real request, real trace, two agreeing signals, a loop of a second or two, and then promotion to a CI gate. It feels like a detour, but it’s the opposite: it’s the thing that makes every change after it faster and safer. If you’ve built a fast-feedback harness for a stubborn system, I’d love to hear how you did it.