Pakkit.net

Engineering Practice

Build Automation That Doesn't Assume One Distro

Porting an automation suite from one Linux family to another is mostly mechanical until it isn't — the real work is the paths, package names, and service quirks you hardcoded without noticing, plus a few sharp edges that only appear on the new OS.

Illustration of one automation playbook branching into three Linux-family lanes that highlight differing package names, service managers, file paths, and firewall tools.

Automation written against one Linux distribution quietly assumes a hundred things about that distribution. You don’t notice the assumptions while they’re all true. You notice them all at once the day you run the same automation against a different family and watch it faceplant. Porting it is less “translate the code” and more “find every assumption I baked in without realizing I was making it.”

The assumptions hide in paths, packages, and services

Three categories cover most of the damage:

  • Paths. The same software puts its config in different places across families. Hardcode one layout and the other family’s tasks write to the void.
  • Package names. The thing you install isn’t named the same on dnf and apt. A runtime, a headless JRE, a tools package — all drift by name.
  • Service management. How a unit is enabled, started, and restarted differs, and the assumptions you made about init behavior travel with it.

The fix in principle is simple: derive these from the OS family instead of writing one family’s answer directly into a task. In practice, finding all the places you didn’t is the work.
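As a minimal sketch of "derive it from the OS family", here is the idea in plain POSIX shell rather than playbook syntax. The function names, the family labels (debian, redhat), and the package names are illustrative assumptions; check the package names against the releases you actually target.

```shell
#!/bin/sh
# Sketch: derive the OS family from an os-release file instead of hardcoding it.
# Family labels and package names below are examples, not an authoritative list.

os_family() {
    # $1: path to an os-release file (defaults to the system one).
    release_file="${1:-/etc/os-release}"
    # Source the file inside the command substitution (a subshell) so its
    # variables do not leak into the caller, then print ID and ID_LIKE.
    ids=$( . "$release_file" && printf '%s %s' "${ID:-}" "${ID_LIKE:-}" )
    case " $ids " in
        *" debian "*|*" ubuntu "*)            echo debian ;;
        *" rhel "*|*" fedora "*|*" centos "*) echo redhat ;;
        *)                                    echo unknown ;;
    esac
}

jre_package() {
    # $1: family label from os_family. One logical package, two names.
    case "$1" in
        debian) echo default-jre-headless ;;
        redhat) echo java-17-openjdk-headless ;;
        *) echo "unsupported OS family: $1" >&2; return 1 ;;
    esac
}
```

A task would then call something like `pkg=$(jre_package "$(os_family)")` and never mention a distro by name. The unknown-family branch fails loudly on purpose, so a new distro shows up as an error rather than as a silent wrong guess.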

Branch on the family, share everything else

The pattern that kept this maintainable rather than forking into two parallel universes: detect the OS family once, load family-specific variables (paths, package names) and family-specific task files (install via one package manager or the other), and keep the genuinely shared logic — config templating, the actual orchestration, the part that encodes what you’re trying to accomplish — in exactly one place.

The goal is to isolate only what truly differs. If you find yourself duplicating the intent of a step per distro, you’ve drawn the seam in the wrong spot. Distro differences are mechanical; the intent should be shared.
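The shape of that seam can be sketched in a few lines of shell. This is an illustration, not the playbook itself: the service name exampled, the config paths, and the DRY_RUN switch are all hypothetical. Only load_family knows about distros; deploy states the intent once.

```shell
#!/bin/sh
# Sketch of the seam: family-specific values live in load_family,
# the shared intent lives in deploy. All names and paths are hypothetical.

run() {
    # Print the command when DRY_RUN=1 (the default here), otherwise run it.
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

load_family() {
    # $1: family label. Sets the variables that differ, and nothing else.
    case "$1" in
        debian) PKG_INSTALL="apt-get install -y"; CONF_DIR=/etc/exampled ;;
        redhat) PKG_INSTALL="dnf install -y";     CONF_DIR=/etc/sysconfig/exampled ;;
        *) echo "unsupported OS family: $1" >&2; return 1 ;;
    esac
}

deploy() {
    # Shared logic: the same three steps for every family.
    load_family "$1" || return 1
    # PKG_INSTALL is left unquoted on purpose so it splits into command + flags.
    run $PKG_INSTALL exampled
    run install -m 0644 exampled.conf "$CONF_DIR/exampled.conf"
    run systemctl enable --now exampled
}
```

Running `deploy debian` and `deploy redhat` in dry-run mode prints two command lists that differ only in the package manager and the config directory. If adding a family ever forces a change inside deploy, that is a hint the seam is in the wrong place.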

The sharp edges only show up on the new OS

The mechanical stuff is the easy 80%. The interesting 20% is the failures that only exist on the OS you just added. Three that cost me real time, all transferable:

  • A newer base image removed something a vendored tool depended on. A newer OS shipped a newer language runtime that dropped a standard-library module, and a tool bundled with the database still imported that module, so it crashed on import. (Concretely: Python 3.12 removed asyncore, and a bundled client still reached for it.) The lesson: newer images carry removals, not just additions, and vendored tools rot against them.
  • Commenting out a line turned a script into a syntax error. I “just commented out” a single statement in a shell config, but it was the only statement in an if/elif branch, so the branch became empty and the shell choked with an unhelpful “unexpected fi” error. The lesson: don’t comment out the lone statement in a conditional; replace it with a : (colon) no-op so the branch still has a body.
  • A tool’s human-readable output changed shape between OSes. A cluster command printed addresses in one format on one OS and a different format on another, and a script that grepped that output quietly broke. The lesson: never parse human-formatted output for a machine decision; query a stable, machine-oriented interface instead.
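The second and third lessons are easy to reproduce in shell. This is a sketch that assumes a POSIX sh on PATH. The cluster command from the story is not named here, so `systemctl show` stands in as an example of a KEY=VALUE interface meant for scripts, and the service name exampled is hypothetical.

```shell
#!/bin/sh
# Part 1: an empty branch is a syntax error; a ":" no-op is not.

check_syntax() {
    # Parse the snippet with "sh -n" (parse only, nothing is executed).
    if printf '%s\n' "$1" | sh -n 2>/dev/null; then
        echo "ok"
    else
        echo "syntax error"
    fi
}

# The lone statement in the elif branch is commented out: the branch is empty.
broken='if [ "$MODE" = a ]; then
    echo a
elif [ "$MODE" = b ]; then
    # echo b
fi'

# Same edit, but ":" keeps a body in the branch.
fixed='if [ "$MODE" = a ]; then
    echo a
elif [ "$MODE" = b ]; then
    : # echo b
fi'

check_syntax "$broken"   # prints: syntax error
check_syntax "$fixed"    # prints: ok

# Part 2: read a KEY=VALUE line instead of grepping human-formatted status text.
# Usage: systemctl show exampled --property=ActiveState | active_state
active_state() {
    sed -n 's/^ActiveState=//p'
}
```

The first case is at least caught by a syntax check. The second is not: a grep against reformatted output parses fine and simply returns nothing, which is why it only surfaces when the script runs on the new OS.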

Porting code so it parses on the new OS is not porting it. Porting is when it does the right thing on the new OS, and you only learn that by running it.

Test on the new OS, for real, early

It’s tempting to call a port “done” when the syntax checks pass and the playbook runs without erroring. That’s not done; that’s compiled. Every sharp edge above stayed perfectly hidden until something actually executed against the new family with real state. Stand the new OS up and run the path end to end before you trust it.

Build the seam before you need it

Here’s the argument for doing this proactively rather than under duress: the cost of cross-distro support is small if you build the seam early — family-derived variables and task files from the start — and brutal if you retrofit it after hardcoding one distro’s answers into fifty tasks. Adding the second family is where you discover how many assumptions the first one let you get away with.

You don’t need to support every distro on day one. You need to keep the intent of your automation separate from one distro’s implementation details, so that when the second family inevitably shows up, it’s a config problem and not a rewrite. The same discipline shows up in the other operational papercuts I’ve written about, like sysctl drop-in ordering. If you’re staring down a port of your own, my inbox is open.