Pakkit.net

Engineering Practice

Tag Your Telemetry Where It's Born

Once log events from different environments pool together downstream, you can't reliably tell them apart again — so stamp the origin onto each event at the source, where the fact is still known for certain.

  • Engineering Practice
  • Observability
  • Logging
  • Operations

Here’s a failure mode that shows up the moment more than one environment ships logs to the same place: lab and non-production events land in the same pipeline as production, trip production alerts, and start paging people about problems that aren’t real. The obvious fix — “just filter out the lab stuff downstream” — runs into a wall almost immediately, because by the time the events have pooled together, there’s no reliable field on them that says where they came from. The real fix is upstream: stamp the origin onto every event at the source, while the source still knows the answer for certain.

This is one of those lessons that sounds trivial written down and costs a real afternoon when you learn it live.

You can’t re-derive origin after the fact

The reason downstream filtering is so tempting and so brittle: people reach for the hostname. “Lab hosts are named lab-something, so exclude those.” It works until it doesn’t — a host gets renamed, a new box doesn’t match the pattern, the naming convention was never as consistent as you believed — and now production alerts are either leaking lab noise or, worse, silently dropping real events that happened to match your exclusion.

Origin is a fact the source knows for free and a guess everywhere downstream. Capture it where it’s free.

The clean version is to attach an explicit attribute — env=lab, env=prod, whatever your taxonomy is — to each event at the point it’s collected, so the fact travels with the data instead of being reconstructed from circumstantial evidence later. Downstream then filters on a field that’s actually true, not on a proxy for it.
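To make that concrete, here is a minimal sketch of stamping at the source using Python's standard logging module. It assumes the environment name is handed to the process at deploy time; `DEPLOY_ENV`, `EnvStampFilter`, `JsonFormatter`, and `build_logger` are names I made up for illustration, not part of any particular agent or forwarder.

```python
import io
import json
import logging

# Assumption: the environment is known at deploy time and passed in explicitly.
# "lab" is only an example value.
DEPLOY_ENV = "lab"


class EnvStampFilter(logging.Filter):
    """Attach the environment to every record at the point of emission."""

    def __init__(self, env: str) -> None:
        super().__init__()
        self.env = env

    def filter(self, record: logging.LogRecord) -> bool:
        record.env = self.env  # the fact now travels with the event
        return True  # annotate only, never drop the record


class JsonFormatter(logging.Formatter):
    """Serialize the record, including the stamped env, as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "env": record.env,
            "level": record.levelname,
            "message": record.getMessage(),
        })


def build_logger(env: str, stream) -> logging.Logger:
    """Hypothetical helper: a logger whose every event carries env."""
    logger = logging.getLogger(f"app.{env}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(EnvStampFilter(env))
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


# Emit one event into an in-memory buffer and read it back.
buffer = io.StringIO()
build_logger(DEPLOY_ENV, buffer).info("disk check finished")
event = json.loads(buffer.getvalue())
```

The point of doing it in a filter is that no call site has to remember the field: anything logged through that handler comes out with `env` already on it.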

Know which kind of “tag” you’re using

Every logging stack I’ve touched overloads the word “tag,” and the distinction is the whole game:

  • A search-time label is applied when someone runs a query. It’s convenient, but it does not travel with the data and isn’t written onto the event. It’s evaluated against fields that already exist — so it inherits all the brittleness of the hostname approach, just hidden behind a nicer name.
  • An index-time field is written onto the event when it’s collected and parsed. It rides along in the forwarded stream, persists on disk, and is reliable to filter on forever after.

For “where did this come from,” you want the second kind. Picking the search-time version because it’s easier to set up is how teams end up with a label that looks like a solution and behaves like the original problem. Spend the small up-front cost to put a real, index-time field on the event.
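A toy comparison of the two kinds, as a sketch rather than any real product's query language. The `lab-` pattern, the host name `bench-07`, and both function names are hypothetical.

```python
import re

# Hypothetical naming rule that a search-time label might encode.
LAB_HOST_PATTERN = re.compile(r"^lab-")


def search_time_is_lab(event: dict) -> bool:
    """Label computed at query time from a proxy field (the hostname)."""
    return bool(LAB_HOST_PATTERN.match(event.get("host", "")))


def index_time_is_lab(event: dict) -> bool:
    """Check a field that was written onto the event at collection."""
    return event.get("env") == "lab"


# A lab machine whose name does not follow the convention.
renamed_lab_event = {"host": "bench-07", "env": "lab", "message": "test run failed"}

missed_by_label = not search_time_is_lab(renamed_lab_event)
caught_by_field = index_time_is_lab(renamed_lab_event)
```

The search-time check is only as good as the hostname convention behind it, so the renamed host slips through. The index-time check does not care what the box is called.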

Even better: give each environment its own stream

A field is good. A separate destination is cleaner still. If you can route each environment into its own index/stream/bucket on the downstream system, then separation stops being “remember to add env!=lab to every alert” and becomes “production alerts only ever look at production streams.” The filtering is structural — you can’t forget it, because the lab data simply isn’t in the place production queries look.

The belt-and-suspenders answer is to do both: stamp the env field and route to a per-environment stream. The field gives you flexible cross-environment queries when you want them; the separate stream gives you safe-by-default isolation when nobody remembers to filter. They’re cheap together and they cover each other’s gaps.
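Doing both might look roughly like this. The stream names are hypothetical, and sending events with a missing or unknown env to a separate "unclassified" stream (rather than to production) is my assumption about a sensible default, not something every pipeline does.

```python
from collections import defaultdict

# Hypothetical stream names; a real system would use its own index or bucket names.
STREAM_FOR_ENV = {"prod": "logs-prod", "lab": "logs-lab"}
# Assumption: events without a recognized env are kept apart from production.
UNCLASSIFIED_STREAM = "logs-unclassified"


def route(event: dict) -> str:
    """Pick a destination stream from the env field stamped at the source."""
    return STREAM_FOR_ENV.get(event.get("env"), UNCLASSIFIED_STREAM)


def deliver(events: list[dict]) -> dict[str, list[dict]]:
    """Group events by destination; the env field stays on each event."""
    streams: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        streams[route(event)].append(event)
    return streams


streams = deliver([
    {"env": "prod", "message": "payment failed"},
    {"env": "lab", "message": "payment failed"},
    {"message": "no env stamped"},
])

# A production alert only ever reads the production stream.
prod_alert_input = streams["logs-prod"]
```

Because the field is still on every event, a cross-environment query can read several streams and group by `env` whenever that is actually what you want.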

One honest caveat: tagging only affects new data. There’s no retroactive stamp on events already sitting downstream untagged. So plan the cutover — your alert logic has to tolerate a window where old events lack the field — rather than assuming the fix is instant.
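One way to write alert logic that tolerates that window, sketched under assumptions: the `CUTOVER` date is invented, events are assumed to carry a timezone-aware `timestamp`, and the three-way policy (alert, suppress, review) is my own choice for illustration.

```python
from datetime import datetime, timezone

# Assumption: the moment the env stamp went live across the fleet.
CUTOVER = datetime(2024, 1, 15, tzinfo=timezone.utc)


def classify(event: dict) -> str:
    """Decide what a production alert should do with one event."""
    env = event.get("env")
    if env == "prod":
        return "alert"
    if env is not None:
        return "suppress"  # explicitly tagged as some non-production env
    if event["timestamp"] < CUTOVER:
        return "alert"  # legacy untagged data: keep the old behaviour
    return "review"  # untagged after cutover: some source is not stamping


old_untagged = {"timestamp": datetime(2024, 1, 10, tzinfo=timezone.utc)}
new_untagged = {"timestamp": datetime(2024, 2, 1, tzinfo=timezone.utc)}
new_lab = {"timestamp": datetime(2024, 2, 1, tzinfo=timezone.utc), "env": "lab"}

decisions = [classify(old_untagged), classify(new_untagged), classify(new_lab)]
```

The reasoning behind the fallback: an untagged event from before the cutover is expected, so it is treated the way it always was. An untagged event from after the cutover means a source was missed, which is worth a look instead of a silent drop.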

The part that’s really about working with other teams

The most useful insight from this whole exercise wasn’t technical. The pipeline often crosses an ownership boundary: you own the source, another team owns the destination and the alerts firing on it. The wrong move is to dump the problem on them — “your alerts are noisy, go figure out which of our hosts are lab.” They can’t, reliably, for all the reasons above.

The right move splits the work along the ownership line:

  • You make your data separable — stamp the env field at your source, because that’s the part only you can do correctly.
  • They apply one well-defined filter — “exclude env=lab” — which is a single, durable query edit, not an ongoing investigation into your naming conventions.

That reframing turned a vague cross-team complaint into two small, clearly owned tasks. The general principle: when a pipeline crosses a boundary, do the part that’s cheap on your side because you have the context, and hand the other side the smallest, most precise thing you can. It’s the same respect-the-boundary thinking that makes any distributed system tractable.

Watch out for config that manages itself

A trap that cost me real time: on a lot of fleets, the agent/forwarder config is centrally managed, pushed from a config-management system. If you SSH in and edit the config on the host directly, your change works exactly until the next sync, when the central copy overwrites it and your carefully added field silently vanishes. The durable change has to be made at the source of truth the fleet pulls from, not on the box.
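A deliberately tiny model of that trap. It assumes a sync that replaces the host copy wholesale with the central one; real config-management tools differ in the details, and the config keys and the `collector.example.internal` address are hypothetical.

```python
def sync(host_config: dict, managed_config: dict) -> dict:
    """Model one sync run: the central copy replaces whatever is on the host."""
    return dict(managed_config)


managed = {"forward_to": "collector.example.internal"}
host = dict(managed)

# Editing on the box: works until the next sync.
host["env"] = "lab"
host_after_sync = sync(host, managed)

# Editing the source of truth: survives every sync.
managed["env"] = "lab"
host_after_durable_change = sync(host_after_sync, managed)
```

After the first sync the host-local `env` is gone without any error. Only the edit made to the managed copy is still there afterwards.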

This rhymes with a broader gotcha I’ve written about in “The last config drop-in wins”: layered and managed configuration resolves conflicts in ways that quietly discard your edit if you don’t know the rules. Before you change a config, find out who actually owns the final say: the live system, or something upstream that will reassert itself.

None of this is glamorous, and that’s the tell that it’s real operations work. Telemetry is only useful if you can trust what it says and slice it by where it came from, and both of those are decided at the source, not rescued downstream. Stamp the origin early, separate the streams if you can, and split the cross-team work along the lines of who knows what. If you’ve untangled your own lab-noise-in-prod-alerts mess, I’d like to hear how.