Pakkit.net

Build Notes

Keeping Plaintext Out of Git History With Clean/Smudge Filters

Git can transparently encrypt files on commit and decrypt them on checkout using clean/smudge filters — a genuinely useful trick for shared lab secrets, with one security trade-off you have to understand before you trust it.

Sometimes you have a file that needs to live in a repo — a lab config, a test fixture, a throwaway cert — and it carries a credential you’d rather not commit in the clear. The right default is “don’t put secrets in git at all.” But for the narrow case of low-value, shared lab secrets a team has to version together, git has a genuinely clever mechanism: clean/smudge filters that keep plaintext in your working tree and ciphertext in history, transparently. It’s worth understanding both because it’s useful and because its sharp edge teaches you something about crypto trade-offs.

I’ll lead with the caveat, because it’s the whole point: this is for shared lab secrets, not real ones. For anything that matters, use a secrets manager. What follows is a tool with a deliberately bounded blast radius.

How clean/smudge filters work

Git lets you register a filter that rewrites a file’s content as it moves in and out of the object store:

  • clean runs when you stage a file: it reads the working-tree content on stdin and writes what-gets-committed on stdout. Point it at an encrypt step and the blob in git is ciphertext.
  • smudge runs on checkout: it reads the committed blob and writes the working-tree content. Point it at a decrypt step and your working copy is plaintext.

Three pieces wire it together: a .gitattributes entry assigning the filter to specific paths (secrets/*.conf filter=secret-crypt), a bit of local git config mapping the filter name to the clean/smudge commands, and the script that does the crypto. Day to day you forget it exists — git add encrypts, git checkout decrypts, and the plaintext never enters history.
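To make the stdin/stdout contract concrete, here is a minimal sketch of what the script’s skeleton might look like. The names (run_filter, fake_encrypt, fake_decrypt) are hypothetical, and the two transforms are placeholders that only reverse the bytes so the skeleton runs; they are not encryption.

```python
import sys
from typing import BinaryIO, Callable, Dict

Transform = Callable[[bytes], bytes]


def fake_encrypt(data: bytes) -> bytes:
    """Placeholder for the real encrypt step. NOT encryption."""
    return b"ENC:" + data[::-1]


def fake_decrypt(data: bytes) -> bytes:
    """Inverse of fake_encrypt, standing in for the real decrypt step."""
    return data[len(b"ENC:"):][::-1]


# "clean" is what git runs on stage, "smudge" is what it runs on checkout.
TRANSFORMS: Dict[str, Transform] = {"clean": fake_encrypt, "smudge": fake_decrypt}


def run_filter(mode: str, stdin: BinaryIO, stdout: BinaryIO) -> int:
    """Read the whole file from stdin, transform it, write it to stdout."""
    transform = TRANSFORMS.get(mode)
    if transform is None:
        print(f"unknown mode: {mode!r}", file=sys.stderr)
        return 2  # non-zero exit tells git the filter failed
    stdout.write(transform(stdin.read()))
    return 0


def main(argv: list) -> int:
    """Entry point: the mode arrives as the first command-line argument."""
    mode = argv[1] if len(argv) > 1 else ""
    return run_filter(mode, sys.stdin.buffer, sys.stdout.buffer)
```

A real script would end by calling sys.exit(main(sys.argv)), and git would invoke it as "script clean" or "script smudge" according to the local config.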

The trick that makes it usable: a deterministic salt

The naive version breaks immediately, and why it breaks is the interesting part. A normal encryption call uses a random salt, so the same plaintext encrypts to different ciphertext every time. Wire that into a clean filter and git sees a “change” on every single stage even when the file content is identical — your history fills with diffs that aren’t real diffs. Unusable.

The fix is to derive the salt deterministically from the content itself (an HMAC of the plaintext under the key, truncated). Same plaintext plus same key produces the same ciphertext, so git sees no spurious change. That one move is what turns the idea from “technically works” into “actually livable.”
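Here is a rough sketch of that derivation using only the Python standard library. The 8-byte salt length, the PBKDF2 parameters, and the choice to reuse one secret for both the HMAC and the key derivation are my assumptions for illustration; the cipher step itself is left out.

```python
import hashlib
import hmac
import os
from typing import Tuple

SALT_LEN = 8  # assumption: salt length chosen for illustration


def deterministic_salt(key: bytes, plaintext: bytes) -> bytes:
    """Salt = HMAC-SHA256 of the plaintext under the key, truncated."""
    return hmac.new(key, plaintext, hashlib.sha256).digest()[:SALT_LEN]


def random_salt() -> bytes:
    """What a normal encryption call does: fresh randomness every time."""
    return os.urandom(SALT_LEN)


def derive_key_iv(passphrase: bytes, salt: bytes) -> Tuple[bytes, bytes]:
    """Stretch passphrase + salt into a 32-byte key and a 16-byte IV.

    The iteration count is illustrative, not a recommendation.
    """
    material = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000, dklen=48)
    return material[:32], material[32:]
```

With deterministic_salt, staging the same content twice yields the same salt, so the same key and IV, so the same ciphertext from a deterministic cipher. With random_salt, every stage would produce a different blob, which is the spurious-diff problem described above.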

And it is also the security trade-off, which you must say out loud:

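A deterministic salt means the same plaintext under the same key always yields the same ciphertext. Anyone who can read the repository can tell, without decrypting anything, when a file returns to an earlier value. Confirming a guessed plaintext by re-encrypting it and comparing against your committed ciphertext still takes the key, because the salt is a keyed HMAC, but the scheme no longer hides repetition the way randomized encryption does. That’s acceptable for shared lab credentials. It is not acceptable for anything you actually need to keep secret.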

This is the honest core of the whole technique: you bought clean diffs with a real bit of confidentiality. Knowing exactly what you traded is the difference between using a tool and misusing it.

The fail-safes that keep it from corrupting your tree

A transparent filter is scary precisely because it’s invisible, so the design has to be defensive:

  • Both directions detect already-processed input. Hand the clean filter something already encrypted and it passes through unchanged; hand the smudge filter plaintext (no encryption header) and it passes through unchanged. Idempotence keeps a stray double-run from mangling files.
  • A fresh clone degrades safely. Before someone sets up the key, checkout just leaves ciphertext in place with a hint, rather than failing or emitting garbage.
  • Readable diffs without decrypting history. Git’s textconv setting, configured on a diff driver, lets git diff show the decrypted text even though the stored blob is ciphertext, so review still works.
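The first two fail-safes can be sketched as a pair of wrappers around the cipher. Everything named here is hypothetical: the HEADER marker, the function names, and toy_xor, which is a stand-in so the example runs and is not encryption. Raising when clean has no key is my assumption; the point is only that it should not emit plaintext quietly.

```python
import sys
from typing import Callable, Optional

# Hypothetical marker that the clean step prepends to every encrypted blob.
HEADER = b"SECRETCRYPT1\n"

Cipher = Callable[[bytes, bytes], bytes]  # (key, data) -> data


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher so the sketch runs. NOT encryption; swap in a real one."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def safe_clean(data: bytes, key: Optional[bytes], encrypt: Cipher) -> bytes:
    """Encrypt on stage, unless the input already carries the header."""
    if data.startswith(HEADER):
        return data  # already encrypted: pass through unchanged
    if key is None:
        # Assumption: fail loudly rather than let plaintext through.
        raise RuntimeError("no key configured; refusing to stage plaintext")
    return HEADER + encrypt(key, data)


def safe_smudge(data: bytes, key: Optional[bytes], decrypt: Cipher) -> bytes:
    """Decrypt on checkout, unless the input is plaintext or the key is missing."""
    if not data.startswith(HEADER):
        return data  # no header means plaintext: pass through unchanged
    if key is None:
        print("hint: file is encrypted; set up the key to decrypt it", file=sys.stderr)
        return data  # fresh clone: leave ciphertext in place
    return decrypt(key, data[len(HEADER):])
```

Running either function twice on its own output returns the same bytes, which is the idempotence the list above asks for. One limitation of this simple check: a plaintext file that happens to start with the marker would be mistaken for ciphertext.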

The one config flag that prevents disaster

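Here’s the setting that matters most, and the reason to be careful: by default, if a clone doesn’t have the filter installed, git treats the filter as a no-op and will happily commit the file in plaintext, silently defeating the entire scheme. Setting filter.secret-crypt.required to true flips that into a loud failure: a filter command that is missing or exits non-zero aborts the operation instead of passing the content through. One limit to know: required is itself local git config, so it only protects clones where the setup step has actually run. A machine that was never configured still gets the silent pass-through, so the flag has to be part of the same per-clone setup as the filter commands. If you take one thing from this: mark the filter required, or you’ve built a security control with an off-by-default failure mode.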
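For reference, a sketch of the per-clone setup as a small Python helper that builds the .gitattributes line and the git config commands. The config keys (filter.<name>.clean, filter.<name>.smudge, filter.<name>.required, diff.<name>.textconv) are git’s own; the script path and its clean/smudge/textconv subcommands are hypothetical.

```python
from typing import List


def gitattributes_lines(pattern: str, name: str) -> List[str]:
    """Line for .gitattributes: route matching paths through the filter and diff driver."""
    return [f"{pattern} filter={name} diff={name}"]


def git_config_commands(name: str, script: str) -> List[List[str]]:
    """argv lists to run once in each clone. The script subcommands are hypothetical."""
    return [
        ["git", "config", f"filter.{name}.clean", f"{script} clean"],
        ["git", "config", f"filter.{name}.smudge", f"{script} smudge"],
        # The important one: a missing or failing filter becomes an error.
        ["git", "config", f"filter.{name}.required", "true"],
        # git appends the path of a file holding the blob when it runs textconv.
        ["git", "config", f"diff.{name}.textconv", f"{script} textconv"],
    ]


def render(commands: List[List[str]]) -> str:
    """Human-readable form, for printing in setup docs or a dry run."""
    return "\n".join(" ".join(cmd[:3]) + f" '{cmd[3]}'" for cmd in commands)
```

Each inner list can be handed to subprocess.run(cmd, check=True) from a setup script. Because these settings live in local config rather than in the repository, every clone has to run them.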

When to reach for this, and when not to

Use it when: a team needs to version sensitive-ish files together, the secrets are low-value and rotatable (lab creds, test fixtures), and the alternative is people emailing configs around or — worse — committing them raw. It’s strictly better than plaintext-in-history for that case, and the transparency means people actually use it instead of working around it.

Don’t use it when the secret is real. Production credentials, signing keys, customer data — those belong in a proper secrets manager with access control, rotation, and an audit trail, none of which a git filter gives you. And even with the filter, rotating the passphrase doesn’t rewrite old history, so a leaked key means rotating the underlying credentials too. This is the mechanism-level companion to my blunter rule in keeping secrets out of git: the best secret in a repo is no secret, and this trick is for the cases where “no secret” genuinely isn’t an option.

It’s a sharp little tool. Sharp tools are great when you know which end is which — and dangerous when you reach for them out of convenience for a job they’re wrong for. If you’ve wired up transparent encryption in a repo and have a war story, I’d enjoy hearing it.