Engineering Practice
Artifacts Should Know Where They Came From
A build artifact with no provenance is a liability. Stamp every published package with the commit, branch, build time, and pipeline that produced it, so any artifact can be traced back to the exact source that made it.
Here’s a question that’s uncomfortable in a lot of shops: pick any published artifact (a package in your registry, a built image, a release tarball) and ask “exactly which commit produced this, on which machine, and when?” If the answer is “uh, let me guess from the version number and the timestamp,” you have a traceability gap. An artifact that can’t point back to the source that made it is a liability the moment anything goes wrong with it. The fix is cheap and worth making a habit: stamp every artifact with its provenance at publish time.
A version number is not provenance
The instinct is to wave this away — “the version tells us.” It doesn’t, not really. A version number tells you the intended release. It doesn’t tell you which exact commit was checked out, whether the working tree was clean, which pipeline ran, which machine built it, or with what tool versions. Two artifacts can carry the same version and have been built from different commits, or by different people, or with a dirty tree. The version is a label someone chose; provenance is the facts about what actually happened.
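The difference between the label and the facts is easy to capture mechanically. Here is a minimal sketch, assuming the build runs inside a git checkout with the `git` CLI on the PATH; `run_git`, `describe_checkout`, and the `git` parameter are hypothetical names used only for illustration.

```python
import subprocess


def run_git(args, cwd="."):
    """Run a git command and return its stdout with surrounding whitespace removed."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def describe_checkout(version, cwd=".", git=run_git):
    """Pair the chosen version label with facts about the actual checkout."""
    commit = git(["rev-parse", "HEAD"], cwd)
    # `git status --porcelain` prints one line per modified or untracked file,
    # so empty output means the working tree is clean.
    dirty = git(["status", "--porcelain"], cwd) != ""
    return {"version": version, "commit": commit, "dirty": dirty}
```

Two builds that both claim the same version would return different `commit` or `dirty` values here, which is exactly the information the version string alone cannot carry.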
That gap stops being academic the instant something is wrong with a shipped artifact. “This build is misbehaving — what’s in it?” should be answerable in seconds, by reading the artifact’s own metadata, not reconstructed from circumstantial evidence and memory. When you can’t trace an artifact to its source, every incident involving it starts with an archaeology dig.
A version number says what you meant to ship. Provenance says what you actually shipped. Only one of those helps at 2am.
Stamp it at publish, automatically
The mechanism is simple: a step that runs right after a successful publish and attaches metadata to the artifact in the registry. The point is to capture what was true at build time, while you still have it, and bind it to the thing you produced. Roughly three families of fact are worth stamping:
- Source provenance. The commit hash (the headline — full and short), the branch or tag, the commit timestamp and author, the remote it came from. This is the thread back to the exact source.
- Build provenance. When it was built, on what host, by which user, with what tool and language versions. This is what lets you reproduce or rule out environment-specific weirdness.
- Pipeline provenance. The CI pipeline and job that produced it, with links. This connects the artifact to the automated run and its logs.
Add integrity checksums alongside and you can also prove the artifact wasn’t altered after publish. The CI system usually hands you most of these values for free as predefined variables — you’re mostly just collecting what’s already sitting there and pinning it to the output.
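As a rough sketch of that collection step, the function below gathers the three families plus a checksum into one record. It assumes GitLab-style predefined variable names (`CI_COMMIT_SHA`, `CI_PIPELINE_URL`, and so on); other CI systems expose similar values under different names. The record layout and the function names are hypothetical, and the Python version stands in for whatever tool versions matter to your build.

```python
import getpass
import hashlib
import os
import platform
import socket
from datetime import datetime, timezone


def sha256_of(path, chunk_size=65536):
    """Hash the artifact in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_user():
    """Return the build user, or "unknown" if the environment cannot say."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def collect_provenance(artifact_path, env=os.environ):
    """Collect source, build, and pipeline facts; missing variables become None."""
    return {
        "source": {
            "commit": env.get("CI_COMMIT_SHA"),
            "commit_short": env.get("CI_COMMIT_SHORT_SHA"),
            "ref": env.get("CI_COMMIT_REF_NAME"),
            "commit_timestamp": env.get("CI_COMMIT_TIMESTAMP"),
            "author": env.get("CI_COMMIT_AUTHOR"),
            "remote": env.get("CI_PROJECT_URL"),
        },
        "build": {
            "built_at": datetime.now(timezone.utc).isoformat(),
            "host": socket.gethostname(),
            "user": current_user(),
            "python": platform.python_version(),
        },
        "pipeline": {
            "pipeline_id": env.get("CI_PIPELINE_ID"),
            "pipeline_url": env.get("CI_PIPELINE_URL"),
            "job_id": env.get("CI_JOB_ID"),
            "job_url": env.get("CI_JOB_URL"),
        },
        "checksums": {"sha256": sha256_of(artifact_path)},
    }
```

How the record gets attached depends on the registry, so that part is left out; serializing it with `json.dumps` is a reasonable neutral form to hand to whatever metadata mechanism yours provides.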
Make it best-effort, not a gatekeeper
A design decision that matters more than it sounds: the provenance step should be best-effort by default. If stamping the metadata fails — a registry hiccup, a permissions blip — it should log loudly but not fail an otherwise-successful publish. The build succeeded; the artifact is good; don’t throw it away because the sticky-note step stumbled.
This is a small instance of a bigger principle: a secondary concern shouldn’t be able to break a primary one. Provenance is enormously valuable, but it’s supporting metadata, not the deliverable. Wiring it so a metadata failure blocks releases would mean a problem in the supporting step could halt shipping, which is the tail wagging the dog. Make it strict only where you’ve decided traceability is a hard compliance requirement, and make that an explicit, opt-in choice, not the accidental default.
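One way to express that default is a thin wrapper around whatever function does the stamping. This is a sketch under assumptions: `stamp_best_effort`, the `stamp` callable, and the `strict` flag are hypothetical names, and the real stamping call depends on your registry.

```python
import logging

logger = logging.getLogger("provenance")


def stamp_best_effort(stamp, artifact_id, metadata, strict=False):
    """Call stamp(artifact_id, metadata) and report whether it succeeded.

    By default a failure is logged with its traceback and swallowed, so the
    publish that already succeeded is not thrown away. With strict=True the
    failure is logged and then re-raised, for pipelines that have opted in
    to treating traceability as a hard requirement.
    """
    try:
        stamp(artifact_id, metadata)
        return True
    except Exception:
        logger.exception("provenance stamping failed for %s", artifact_id)
        if strict:
            raise
        return False
```

Keeping `strict=False` as the default means the blocking behavior has to be asked for by name, which makes it an explicit choice rather than an accident.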
Provenance pays off in the boring, important moments
Why bother, when most artifacts are fine? Because the value shows up exactly when things are not fine, and you can’t add it retroactively:
- Incident response. “Which commit is this build?” answered instantly from the artifact, which gives you exact endpoints for a diff or a bisect instead of guesses from memory.
- Supply-chain trust. Being able to prove an artifact came from a specific commit and pipeline, unaltered, is increasingly table stakes for anyone consuming what you ship.
- Debugging the weird ones. “It works from this build but not that one” gets a lot shorter when each build carries its full birth certificate.
- Auditing. “Show me where this came from” stops being a research project.
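On the consuming side, the check is small. The sketch below assumes a provenance record shaped like `{"source": {"commit": ...}, "checksums": {"sha256": ...}}`; that layout and the function name are hypothetical. It also assumes the record itself is stored somewhere you trust, since a checksum only detects changes to the artifact relative to the record.

```python
import hashlib
import hmac


def trace_artifact(path, provenance):
    """Return the recorded commit if the file matches its recorded checksum.

    Returns None when the checksum does not match, meaning the file is not
    the one the record describes.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    expected = provenance["checksums"]["sha256"]
    # compare_digest does a constant-time comparison of the two hex strings.
    if hmac.compare_digest(digest.hexdigest(), expected):
        return provenance["source"]["commit"]
    return None
```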
The cost is a small automated step you write once and reuse across every publish path. The payoff is that every artifact you produce can answer the most basic and most important question about itself — where did you come from? — without anyone having to guess. It’s the same impulse as decision records capturing the why: write down the context at the moment you have it, because you won’t be able to reconstruct it later. If you stamp provenance into your own pipelines and have opinions on what’s worth capturing, I’d love to hear them.