Build Notes

Same Input, Different Result? Look Outside Your Code

A CI job that fails before it runs a line of your code, then passes on the very next run with nothing changed, is telling you the problem lives in the environment — and a retry is a mitigation, not a fix.

June 13, 2026

A pipeline job started failing, and the first instinct — always — is to stare at the diff. What did I change? What did I break? In this case the answer was nothing, and the proof was sitting in the job history: the same commit failed, then passed, then failed again, with no edits in between. That pattern is one of the most useful signals in debugging, and it points in the opposite direction from your code. When the same input produces different results, stop reading your diff and start looking at the environment.

Read where in the run it died

The job was dying in about twelve seconds. That number matters, because twelve seconds isn’t enough time to reach my actual logic — it died during executor setup, while pulling the container image the job runs inside. The registry rejected the pull with an authentication error and the whole job fell over before a single line of my script executed.

The phase a failure happens in is half the diagnosis. A failure during setup, image pull, or dependency resolution is an environment failure. It has nothing to say about whether your code is correct, because your code never ran. Before you debug what a job does, confirm it got far enough to do anything at all.
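One way to make the phase check mechanical is a small classifier. This is a minimal sketch, not any CI system's API: the phase names and the `JobFailure` record are hypothetical, and a real pipeline would map its own stage labels onto them.

```python
from dataclasses import dataclass

# Hypothetical phase names; real CI systems label their stages differently.
ENVIRONMENT_PHASES = {"executor_setup", "image_pull", "dependency_resolution"}


@dataclass(frozen=True)
class JobFailure:
    phase: str              # phase the job was in when it died
    elapsed_seconds: float  # how long the job ran before failing


def failure_layer(failure: JobFailure) -> str:
    """Return 'environment' if the job died before the script ran, else 'script'."""
    if failure.phase in ENVIRONMENT_PHASES:
        return "environment"
    # The job got far enough to run the script, so the diff is now a candidate.
    return "script"


print(failure_layer(JobFailure(phase="image_pull", elapsed_seconds=12.0)))  # environment
```

A result of "script" only says the job reached your code; it does not prove the code is at fault.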

Same input, different result, means look outside

The clincher was determinism, or the lack of it. I pulled the last several runs of the job and lined them up:

Run 1 — failed
Run 2 — failed
Run 3 — succeeded
Run 4 — failed

Same commit. Same configuration. Same image reference. A deterministic bug in my code would fail every time on identical inputs. A broken credential would fail every time too. Something that fails intermittently with everything held constant isn’t deterministic, and non-determinism almost always means the cause is outside the thing you control: a transient blip on a shared dependency. In this case, that was the image registry’s authentication or availability wobbling under load.

Determinism is a clue. If the same input gives different answers, the variable you’re missing is in the environment, not the diff.

That table of recent runs took thirty seconds to assemble and reframed the entire investigation. It’s the cheapest diagnostic there is, and I now build it reflexively before touching anything.
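The last-N-runs check can be sketched in a few lines. This assumes you can export run history as records; the `Run` fields and the sample values (`abc123`, `cfg-1`, `example/image:1.0`) are made up for illustration, and only the fail, fail, pass, fail pattern comes from the runs listed above.

```python
from collections import defaultdict
from typing import NamedTuple


class Run(NamedTuple):
    commit: str
    config_hash: str
    image_ref: str
    passed: bool


def classify_runs(runs: list[Run]) -> dict[tuple[str, str, str], str]:
    """Group runs by identical inputs and label each group by its outcomes."""
    outcomes: dict[tuple[str, str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.commit, run.config_hash, run.image_ref)].add(run.passed)

    verdicts = {}
    for inputs, seen in outcomes.items():
        if seen == {True, False}:
            verdicts[inputs] = "flaky"          # same input, different result
        elif seen == {False}:
            verdicts[inputs] = "always fails"   # deterministic: read the diff
        else:
            verdicts[inputs] = "always passes"
    return verdicts


# Hypothetical identifiers; the pass/fail sequence mirrors the four runs above.
recent = [Run("abc123", "cfg-1", "example/image:1.0", p)
          for p in (False, False, True, False)]
verdicts = classify_runs(recent)
print(verdicts[("abc123", "cfg-1", "example/image:1.0")])  # flaky
```

Grouping by the full input tuple matters: a mix of passes and failures across different commits says nothing, while a mix within one group is the signal.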

Retry is a mitigation, not a root cause fix

Because the failure was transient, the durable repo-side move was to let the job heal itself: retry automatically on infrastructure-class failures, so a single blip during setup doesn’t kill the whole pipeline. That’s a real improvement — it stops a flaky dependency from being fatal.

But it is a mitigation, and it’s important to be honest about that. Retrying papers over a transient failure; it does not fix the thing that’s transiently failing. The actual cleanup — the registry credentials, the availability of the shared mirror — lives upstream, owned by whoever runs that infrastructure. Conflating “I made the symptom survivable” with “I fixed the cause” is how a known-flaky system stays flaky forever, because everyone’s retry logic keeps quietly absorbing it.
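As a rough sketch of what "retry on infrastructure-class failures" can look like in code, the wrapper below retries only a marker exception and reports how many retries it used, so the flakiness stays visible rather than being absorbed silently. `InfraError` and `retry_on_infra` are hypothetical names, not part of any CI tool; most CI systems expose an equivalent setting in their job configuration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InfraError(Exception):
    """Hypothetical marker for infrastructure-class failures, e.g. an image pull."""


def retry_on_infra(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[T, int]:
    """Run step, retrying only on InfraError. Returns (result, retries_used)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return step(), attempt
        except InfraError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise AssertionError("unreachable")
```

Any other exception propagates on the first attempt, which keeps real script failures from being retried. A nonzero retry count is worth logging or reporting upstream, since it is evidence the underlying dependency is still flaky.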

Fixing the surface can expose what was hidden behind it

There’s a twist worth bracing for. Once I got image pulls working reliably — by pointing at a mirror that didn’t gate the pull behind authentication — a chain of other failures surfaced that I’d never seen before. They’d been there all along; they just never got a chance to run, because the pull failed first and masked everything downstream.

That’s normal, not a sign you made things worse. When you remove the first failure in a sequence, the second one finally gets to fail in front of you. Each gate you fix reveals the next one that was hiding in its shadow. Expect it, and don’t let the sudden appearance of new errors trick you into reverting the fix that exposed them.
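The masking effect is easy to reproduce with a toy pipeline that stops at the first failing gate. The gate names and the broken `install_deps` step below are invented for illustration; they are not the actual failures from this incident.

```python
from typing import Callable, Optional

Gate = tuple[str, Callable[[], bool]]


def first_failure(gates: list[Gate]) -> Optional[str]:
    """Run gates in order and return the name of the first that fails, or None."""
    for name, check in gates:
        if not check():
            return name  # later gates never run, so their failures stay hidden
    return None


def pipeline(pull_ok: bool) -> list[Gate]:
    return [
        ("image_pull", lambda: pull_ok),
        ("install_deps", lambda: False),  # broken all along in this made-up example
        ("tests", lambda: True),
    ]


print(first_failure(pipeline(pull_ok=False)))  # image_pull
print(first_failure(pipeline(pull_ok=True)))   # install_deps
```

Nothing about `install_deps` changed between the two calls; fixing the pull only let it run far enough to be seen.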

Prefer the dependency that can’t gate you

The deeper fix wasn’t retry; it was removing the failure mode. The images in question were public, and they were available from a mirror that served them anonymously, with no auth step to blip. Switching to it didn’t make the pull better at retrying; it removed the step that could reject the pull in the first place. Removing a failure mode beats getting good at recovering from it, the same way the most reliable automation has fewer single points of failure, not better handling of them.

The triage order I use now

This whole episode compressed into a checklist I run before I ever open my own code:

1. Where did it die? Setup, pull, and dependency phases are environment problems, not your logic.
2. Is it deterministic? Same input, different result, means look outside the diff. Build the last-N-runs table first.
3. If it’s flaky: mitigate so it’s survivable (retry on infra failures), then go remove or replace the flaky dependency, and say plainly which of those you’ve actually done.
4. If it’s deterministic: now it’s worth reading your diff.

Most of the time I lose to CI doesn’t come from a bug in what I shipped. It comes from mistaking an environment failure for a code failure and debugging the wrong layer. If you’ve got a flaky-pipeline war story or a triage habit that saves you here, trade notes with me.