A Golden Image Has to Forget Who It Was
Cloning a configured machine into a reusable template only works if you first wipe its identity — machine-id, SSH host keys, hostname, and NIC pinning — because anything you leave behind gets duplicated onto every clone.
- Infrastructure
- Linux
- Provisioning
- Homelab
The fastest way to stamp out identical machines is to configure one carefully, turn it into a template, and clone the template. The trap is that a machine accumulates identity while you configure it, and a naive clone copies that identity too. So you end up with ten “different” machines that all think they’re the same host. The fix is a step people skip until it bites them: before a machine becomes a template, it has to forget who it was.
I learned this building reusable Linux templates for my homelab, but the principle is older than any of the tooling — it’s just the cloud-era version of the old Windows sysprep ritual, and it applies to every hypervisor and image format.
Identity is the stuff that’s supposed to be unique
A freshly installed OS quietly generates a handful of values that are meant to be unique per machine. Clone without clearing them and you’ve manufactured collisions:
- /etc/machine-id: the systemd identity a lot of other things key off (logging, the DHCP DUID, sometimes the D-Bus identity). Duplicate it and two boxes can request the same DHCP lease and fight over an address.
- SSH host keys: the keys that prove “this is the server you think it is.” Clone them and every machine presents the same fingerprint, which is both a man-in-the-middle warning waiting to happen and a guarantee of known_hosts chaos.
- Hostname: ten machines named after the template is not a network you want to debug.
- NIC pinning: udev rules and NetworkManager keyfiles that bind a config to the template’s MAC address. The clone gets a new MAC, the binding doesn’t match, and networking comes up wrong or not at all.
None of these throw an error when duplicated. They just produce confusing behavior somewhere downstream, far from the cause. That’s what makes the wipe easy to forget and expensive to skip.
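Because nothing fails loudly, it helps to have something that looks for leftover identity on purpose. The following is a minimal sketch, not the tooling used for these templates: `leftover_identity` is a hypothetical helper, and the paths and glob patterns are common defaults that I am assuming rather than a complete list for any particular distro. It takes a root directory so it can be pointed at a mounted image or a test tree instead of the live system.

```python
from pathlib import Path

# Common default locations; treat these as assumptions and adjust per distro.
HOST_KEY_GLOB = "etc/ssh/ssh_host_*_key*"
NIC_PINNING_GLOBS = (
    "etc/NetworkManager/system-connections/*",   # NetworkManager keyfiles
    "etc/sysconfig/network-scripts/ifcfg-e*",    # legacy interface configs
    "etc/udev/rules.d/70-persistent-net.rules",  # persistent-net udev rules
)


def leftover_identity(root: Path) -> list[str]:
    """Describe every per-machine value that is still present under root."""
    findings = []
    for name in ("etc/machine-id", "etc/hostname"):
        path = root / name
        # A missing or empty file counts as clean; only real content is reported.
        if path.is_file() and path.read_text().strip():
            findings.append(f"{name} is not empty")
    for key in sorted(root.glob(HOST_KEY_GLOB)):
        findings.append(f"SSH host key present: {key.relative_to(root)}")
    for pattern in NIC_PINNING_GLOBS:
        for path in sorted(root.glob(pattern)):
            findings.append(f"NIC pinning file present: {path.relative_to(root)}")
    return findings
```

An empty list means none of the four categories turned up under that root. It says nothing about identity stored elsewhere, such as application-level IDs.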
Split the slow part from the irreversible part
The mistake I made early was treating “prep a template” as one big script. It isn’t, because two very different kinds of work are hiding in there. One is slow, repeatable, and safe to re-run: patch the OS, prune old kernels, install the agent that reads cloud-init, set sensible defaults. The other is a one-shot, destructive wipe that ends with the machine powered off and its identity gone.
I keep them as two separate phases, run in order, with a verification step in between. Phase one modernizes and is reversible. Phase two wipes identity and is not. Putting a checkpoint between them is the whole point.
That checkpoint matters because phase two is a one-way door. Once you’ve blanked the machine-id and deleted the host keys, you can’t easily inspect “did the prep actually take?” — the thing you’d inspect is gone. So I run a read-only verification pass after phase one (are the services I expect enabled, did the cloud-init datasource get configured, are the old kernels gone?) and only then pull the trigger on the wipe.
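The shape of that gate can be sketched in a few lines. This is an illustration under assumptions, not the real prep scripts: `prepare_template`, `failed_checks`, and the idea of passing each phase in as a callable are all hypothetical names and structure chosen to show the ordering. The only point it makes is that the wipe is unreachable unless every read-only check passes.

```python
from typing import Callable

Check = Callable[[], bool]  # a read-only check; it must not change the machine


def failed_checks(checks: dict[str, Check]) -> list[str]:
    """Run every check and return the names of the ones that did not pass."""
    return [name for name, check in checks.items() if not check()]


def prepare_template(
    modernize: Callable[[], None],
    checks: dict[str, Check],
    wipe: Callable[[], None],
) -> None:
    """Run phase one, verify, and only then run the irreversible phase two."""
    modernize()  # phase one: slow, repeatable, safe to re-run
    failures = failed_checks(checks)  # checkpoint: inspect while there is still something to inspect
    if failures:
        raise RuntimeError("refusing to wipe; failed checks: " + ", ".join(failures))
    wipe()  # phase two: one-way door
```

A failed check raises before `wipe` is called, so the machine is left in its phase-one state and the run can simply be repeated once the problem is fixed.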
The wipe, concretely
The destructive phase is unglamorous but worth listing, because every line is “remove a thing that should be unique or stale”:
- Stop logging, then truncate the logs and login records so the template doesn’t ship someone’s old session history.
- Remove the SSH host keys so each clone regenerates its own on first boot.
- Blank /etc/hostname and /etc/machine-id (blank, not deleted: an empty machine-id is the documented signal for “regenerate me next boot”).
- Run cloud-init clean so the next boot looks like a true first boot.
- Strip the NIC pinning (NetworkManager keyfiles, legacy interface configs, and the persistent-net udev rules) so the clone’s new MAC comes up clean.
- Clear package caches and shell history, then power off.
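The file-level part of the list above can be sketched as follows. This is a minimal illustration, not the actual wipe script: `wipe_identity` is a hypothetical name, the glob patterns are assumed common defaults, and the steps that are commands rather than file edits (stopping logging, running cloud-init clean, clearing package caches and shell history, powering off) are deliberately left out. It operates on a root directory so it can be tried against a scratch tree before it ever touches a real machine.

```python
from pathlib import Path

# Assumed common defaults, not a complete list; adjust per distro.
REMOVE_GLOBS = (
    "etc/ssh/ssh_host_*_key*",                   # SSH host keys, private and .pub
    "etc/NetworkManager/system-connections/*",   # NetworkManager keyfiles
    "etc/sysconfig/network-scripts/ifcfg-e*",    # legacy interface configs
    "etc/udev/rules.d/70-persistent-net.rules",  # persistent-net udev rules
)
BLANK_FILES = ("etc/hostname", "etc/machine-id")


def wipe_identity(root: Path) -> list[str]:
    """Wipe per-machine files under root and return a log of what was touched."""
    touched = []
    for pattern in REMOVE_GLOBS:
        for path in sorted(root.glob(pattern)):
            if path.is_file() or path.is_symlink():
                path.unlink()
                touched.append(f"removed {path.relative_to(root)}")
    for name in BLANK_FILES:
        # Blank, not deleted: the file stays in place with zero length.
        (root / name).write_text("")
        touched.append(f"blanked {name}")
    for path in sorted(root.glob("var/log/**/*")):
        if path.is_file() and not path.is_symlink():
            path.write_bytes(b"")  # truncate logs and login records in place
            touched.append(f"truncated {path.relative_to(root)}")
    return touched
```

Returning the list of touched paths gives the run something to print or archive, which is useful precisely because the wipe cannot be inspected after the fact.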
There’s a distro wrinkle worth knowing: on some systems the SSH daemon regenerates host keys on startup automatically, so deleting them is enough; on others you have to install a small first-boot hook to do it. Same goal, different plumbing. If you template more than one distro, that difference is the bug you’ll hit, so handle it explicitly rather than assuming the behavior carries over. I wrote up the broader version of that habit in making automation work across distros.
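For the systems that need a first-boot hook, the logic is small. Here is a hedged sketch of what such a hook might do: `ensure_host_keys` is a hypothetical helper, and how it gets wired into boot (a systemd unit, a cloud-init step, something else) is left open. The one real command it relies on is `ssh-keygen -A`, which generates any missing host keys of the default types.

```python
import subprocess
from pathlib import Path


def ensure_host_keys(ssh_dir: Path = Path("/etc/ssh"), run=subprocess.run) -> bool:
    """Generate SSH host keys if none exist. Returns True if generation ran."""
    # Private host keys are named ssh_host_<type>_key; any match means keys exist.
    if any(ssh_dir.glob("ssh_host_*_key")):
        return False
    # ssh-keygen -A writes under /etc/ssh by default, so a non-default
    # ssh_dir is only meaningful here for testing the detection logic.
    run(["ssh-keygen", "-A"], check=True)
    return True
```

The `run` parameter exists so the command can be replaced with a stub in tests. On a distro where the SSH daemon already regenerates keys at startup, the hook finds keys present and does nothing.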
Do not boot it again
The single most important operational rule comes after the wipe: do not power the template on again before converting it. Booting regenerates exactly the identity you just spent a script erasing — fresh machine-id, fresh host keys, a real hostname — and now your “clean” template is dirty again, silently. Convert it to a template straight from the powered-off state. If you need to change something, clone it, fix the clone, and re-template from there.
This is the part that feels paranoid until the one time you boot a template “just to check something” and spend an afternoon wondering why every new clone has the same SSH fingerprint.
Let the clone configure itself on first boot
A wiped template is a blank that needs filling in, and the clean way to fill it is at first boot rather than by hand. A cloud-init datasource (or your hypervisor’s customization step) hands the clone its real hostname, network config, and users the first time it starts. The template provides the software; the deploy provides the identity. That split is what makes the whole thing repeatable — and it’s the same boundary I leaned on when I had to deploy with only the access I had.
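To make "the deploy provides the identity" concrete, here is a minimal sketch that renders the two files a NoCloud-style cloud-init seed expects. It assumes the NoCloud datasource specifically, and `render_seed` and its parameters are hypothetical. It only covers a hostname and SSH keys for the default user; network config and additional users are omitted.

```python
import json


def render_seed(hostname: str, instance_id: str, ssh_keys: list[str]) -> dict[str, str]:
    """Return the text of NoCloud-style meta-data and user-data for one clone."""
    # json.dumps output is also a valid YAML double-quoted scalar,
    # which sidesteps quoting problems with spaces and punctuation.
    meta_data = (
        f"instance-id: {json.dumps(instance_id)}\n"
        f"local-hostname: {json.dumps(hostname)}\n"
    )
    lines = ["#cloud-config", f"hostname: {json.dumps(hostname)}"]
    if ssh_keys:
        lines.append("ssh_authorized_keys:")
        lines += [f"  - {json.dumps(key)}" for key in ssh_keys]
    return {"meta-data": meta_data, "user-data": "\n".join(lines) + "\n"}
```

Each clone gets its own pair of files, typically delivered on a small seed volume labeled cidata, while the template itself never changes. A distinct instance-id per clone matters because cloud-init uses it to decide whether it is looking at a new instance.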
The takeaway is small and saves hours: a template is only reusable if it’s anonymous. Build it carefully, verify it, then make it forget its name before you clone it. If you run your own fleet and have a sysprep gotcha I didn’t hit yet, I’d like to hear it — and the homelab messy-parts notes collect more of these the-hard-way lessons.