Pakkit.net
Security

Certificates Should Renew Themselves

A TLS certificate you renew by hand is an outage with a calendar invite — the durable fix is to treat certificates as a declarative, auto-renewing lifecycle with the issuing identity kept in a secret store, not a runbook.

  • Security
  • PKI
  • Automation
  • TLS

Almost every “the site is down” story I’ve heard that turned out to be an expired certificate has the same shape: someone requested a cert a year ago, did it correctly, wrote a runbook, and then the calendar reminder got lost in a job change or an inbox. The certificate did exactly what certificates do — it expired — and a human was the renewal mechanism, which means the renewal didn’t happen. The fix isn’t a better reminder. It’s removing the human from the renewal path entirely.

I think about certificates the way I think about backups: the manual version isn’t a process, it’s a future incident with a date attached. Here’s how I’ve moved cert work from “remember to do it” to “it happens on its own.”

Manual renewal is a process designed to be forgotten

A yearly task is the worst possible cadence for a human. It’s too rare to build a habit around and too consequential to drop, which is precisely the combination that gets dropped. And the failure is loud and public — TLS errors in every browser, clients refusing to connect — so the one time it slips, everyone sees it.

A renewal that depends on a person remembering once a year isn’t a control. It’s a countdown.

The manual runbook still matters as documentation of what correct looks like — which CA, which key parameters, which hostnames belong on the cert. But documentation of a process and execution of a process are different things, and you only get reliability when the execution is automated.

Treat the certificate as a lifecycle, not a file

The mental shift that makes everything else click: a certificate isn’t an artifact you fetch, it’s a lifecycle you declare. The states are request, issue, install, and — the one people skip — renew-before-expiry-and-rotate. Once you frame it that way, the automation writes itself, because every state has a clear owner that isn’t your memory.

The two pieces that make the lifecycle real:

  • Renew before expiry, automatically. The system should re-issue well ahead of the deadline (a renewBefore window) so there’s slack to catch a failure, not a cliff at midnight. Short-lived certs that rotate often are safer than long-lived ones precisely because rotation becomes routine instead of a once-a-year fire drill you’re rusty at.
  • Rotate the key, not just the dates. Reusing the same private key across renewals quietly defeats half the point. A real lifecycle issues a fresh key each time, so a key compromise has a bounded blast radius.
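To make the renewal window concrete, here is a minimal sketch of the check that decides whether a certificate is due. The function name `needs_renewal` and the 90-day and 30-day figures are my own illustrative choices, not values any particular CA mandates. Key rotation isn't shown here; the assumption is that whatever performs the re-issue generates a new key pair each time rather than reusing the old one.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_renewal(not_after: datetime, renew_before: timedelta,
                  now: Optional[datetime] = None) -> bool:
    """Return True once the certificate has entered its renewal window."""
    now = now or datetime.now(timezone.utc)
    # The window opens renew_before ahead of expiry, not at expiry itself.
    return now >= not_after - renew_before


# Illustrative numbers: a 90-day certificate with a 30-day renewal window.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
not_after = issued + timedelta(days=90)
window = timedelta(days=30)

print(needs_renewal(not_after, window, now=issued + timedelta(days=45)))  # False
print(needs_renewal(not_after, window, now=issued + timedelta(days=61)))  # True
```

With these numbers the window opens on day 60, which leaves 30 days of retries before anything actually expires.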

Two clean automation patterns

You don’t need anything exotic. Depending on where the workload lives, one of two patterns covers most of it:

  • For orchestrated/containerized workloads: an operator like cert-manager plus an issuer pointed at your CA. Each service declares a Certificate object with its hostnames and a renewBefore, and the operator handles issuance and renewal with no human in the loop. The cert lands in a secret the workload mounts. This is the strongest option when you have it available — the renewal is genuinely hands-off.
  • For plain VMs and bare services: an idempotent request-and-retrieve script that authenticates, checks whether a valid cert with enough lifetime left already exists, requests one if not, then retrieves, installs, and reloads the service. Run it on a schedule much shorter than the renewBefore window (daily, say), so a failed run gets several retries before expiry, and it converges to “everything has a current cert” every time it runs. Idempotent is the key word: running it again should be a no-op, not a second cert.
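For the operator pattern, this is roughly what a declared certificate looks like. It's a sketch that builds a cert-manager Certificate object as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The function name, the `-tls` secret naming, the durations, and the hostnames are assumptions for illustration; the field names are cert-manager's `cert-manager.io/v1` Certificate spec.

```python
import json
from typing import Dict, List


def certificate_manifest(name: str, namespace: str, dns_names: List[str],
                         issuer_name: str) -> Dict:
    """Build a cert-manager Certificate object as a plain dict."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Secret the workload mounts; the "-tls" suffix is my convention.
            "secretName": f"{name}-tls",
            "dnsNames": dns_names,
            "duration": "2160h",      # 90 days (illustrative)
            "renewBefore": "720h",    # start renewing 30 days before expiry
            # Ask for a fresh private key on every renewal.
            "privateKey": {"rotationPolicy": "Always"},
            "issuerRef": {"name": issuer_name, "kind": "ClusterIssuer"},
        },
    }


manifest = certificate_manifest(
    name="orders-api",                         # hypothetical service
    namespace="orders",
    dns_names=["orders.example.internal"],
    issuer_name="corporate-ca",                # hypothetical issuer
)
print(json.dumps(manifest, indent=2))
```

The output can be piped to `kubectl apply -f -`. Everything a human used to remember (hostnames, lifetime, renewal window, key rotation) is now a field in a reviewed file.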

Both are the same idea — declare the desired state, let something converge toward it on a timer — just at different layers of the stack.
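For the VM pattern, the converge loop can be sketched like this. It is deliberately skeletal: `read_installed`, `request_cert`, `install`, and `reload_service` are hypothetical hooks you would wire to your own CA client and service manager, and the in-memory store below only stands in for the filesystem so the idempotency is visible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class InstalledCert:
    hostname: str
    not_after: datetime


def converge(hostname: str,
             renew_before: timedelta,
             read_installed: Callable[[str], Optional[InstalledCert]],
             request_cert: Callable[[str], InstalledCert],
             install: Callable[[InstalledCert], None],
             reload_service: Callable[[], None],
             now: Optional[datetime] = None) -> bool:
    """Make sure hostname has a current cert. Returns True if it changed anything."""
    now = now or datetime.now(timezone.utc)
    current = read_installed(hostname)
    if current is not None and now < current.not_after - renew_before:
        return False  # already converged: no request, no reload
    fresh = request_cert(hostname)
    install(fresh)
    reload_service()
    return True


# In-memory stand-ins so the sketch runs without a CA.
store: Dict[str, InstalledCert] = {}
requests_made: List[str] = []


def fake_request(hostname: str) -> InstalledCert:
    requests_made.append(hostname)
    return InstalledCert(hostname, datetime.now(timezone.utc) + timedelta(days=90))


def fake_install(cert: InstalledCert) -> None:
    store[cert.hostname] = cert


args = ("app.example.internal", timedelta(days=30),
        store.get, fake_request, fake_install, lambda: None)
first = converge(*args)
second = converge(*args)
print(first, second, len(requests_made))  # True False 1
```

The second run is the important one: it finds a cert with plenty of lifetime left and does nothing, which is what makes it safe to schedule as often as you like.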

The automation’s identity is the thing to protect

Here’s where certificate automation meets the rest of security architecture. To issue certs unattended, your automation needs an identity (a service account) with rights at your CA. That credential is now the keys to your TLS kingdom, and where it lives matters more than any other decision here:

  • It goes in a secret store, pulled at run time — never committed to git, never baked into an image, never sitting in a config file in the repo. This is the same line I hold in keeping secrets out of git: the convenient place to put a credential is almost always the wrong one.
  • It gets the narrowest scope that works — issue and renew within one policy zone, not blanket admin over the whole CA. Least privilege applies to robots most of all, because a robot with broad access runs at machine speed when it’s wrong.

If you automate issuance but leave the issuing credential lying around, you haven’t reduced risk — you’ve concentrated it into one stealable thing.
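Here is a small sketch of the "pulled at run time" half. It assumes the secret store (or its agent) exposes the credential as a file and that the path arrives in an environment variable. `CA_CREDENTIAL_FILE` and `MissingCredential` are names I made up for the example. The part that matters is that there is no fallback: if the secret isn't there, the script stops instead of reaching for a default.

```python
import os
from pathlib import Path
from typing import Mapping


class MissingCredential(RuntimeError):
    """Raised when the issuing credential cannot be loaded at run time."""


def load_ca_credential(env: Mapping[str, str] = os.environ) -> str:
    # The repo and the image only ever contain the *name* of the variable,
    # never the credential itself.
    path = env.get("CA_CREDENTIAL_FILE")
    if not path:
        raise MissingCredential("CA_CREDENTIAL_FILE is not set; refusing to guess")
    secret = Path(path).read_text(encoding="utf-8").strip()
    if not secret:
        raise MissingCredential(f"{path} is empty")
    return secret
```

Scope is the other half, and it lives on the CA side rather than in code: the account behind this credential should only be able to issue and renew within its own policy zone.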

Don’t let automation erase ownership

A subtle failure of cert automation is that it can decouple a certificate from who’s responsible for it. When a human requested certs, there was at least a name attached. Full automation can happily mint certs that nobody can later map back to an owner or an application — which is its own kind of mess at audit time.

So I keep an ownership tag flowing through the automation — an application ID or owner reference carried as a field on the request, all the way into the CA’s record. It costs nothing and it means “what is this cert, and who owns it?” still has an answer when the cert was issued by a script at 3am. Automating a task shouldn’t mean automating away its accountability.
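One way to enforce that is to make the ownership fields mandatory at the point where the request is built. This is a sketch only: the payload shape (`subject`, `sans`, `metadata`) is hypothetical, and a real CA platform will have its own names for custom fields.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CertRequest:
    common_name: str
    dns_names: Tuple[str, ...]
    app_id: str   # which application this cert belongs to
    owner: str    # team or person accountable for it

    def __post_init__(self) -> None:
        # Refuse to build a request that nobody can be held accountable for.
        for field_name in ("app_id", "owner"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} is required on every certificate request")


def to_payload(req: CertRequest) -> Dict:
    """Shape the request for the CA, carrying the ownership tag along."""
    return {
        "subject": req.common_name,
        "sans": list(req.dns_names),
        "metadata": {"app_id": req.app_id, "owner": req.owner},
    }


req = CertRequest("orders.example.internal", ("orders.example.internal",),
                  app_id="APP-1234", owner="team-orders")
print(to_payload(req)["metadata"])
```

Because the check lives in the request type, a script can't mint an anonymous cert by accident; it has to be given an owner or it fails before it ever talks to the CA.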

A couple of gotchas worth pre-empting

Two things that bite when you build this against a real enterprise CA platform:

  • Platforms accrete API generations. A long-lived product often has an older API and a newer documented one living side by side, with different auth schemes. Build on the current, documented surface and don’t mix the two — straddling API generations is how you get auth that works in one call and fails in the next.
  • Don’t disable certificate verification to “make it work.” It’s tempting, when the automation that manages trust hits a TLS error, to turn off verification and move on. Resist it. Supply the proper CA bundle instead. An automation that issues certs while ignoring cert validity is writing a very ironic incident report.
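For the verification point, the correct version is barely more code than the shortcut. A minimal sketch using only the Python standard library, where `build_tls_context` and `open_ca_api` are my own helper names and the bundle path would be wherever your internal CA chain is installed:

```python
import ssl
import urllib.request
from typing import Optional


def build_tls_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Verifying TLS context; trusts ca_bundle if given, else the system store."""
    # create_default_context turns on certificate and hostname verification.
    # Passing cafile adds trust for your CA without switching either off.
    return ssl.create_default_context(cafile=ca_bundle)


def open_ca_api(url: str, ca_bundle: Optional[str] = None, timeout: float = 10.0):
    """Open url with verification on. Raises on an untrusted or mismatched cert."""
    return urllib.request.urlopen(url, context=build_tls_context(ca_bundle),
                                  timeout=timeout)


ctx = build_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

If the call fails with a verification error, the fix is to correct the bundle passed as `ca_bundle`, not to change the context.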

The whole point is to make the boring, correct thing happen without anyone thinking about it. Certificates are a solved problem mechanically; they only become an incident when a human is load-bearing in the loop. Take the human out, protect the identity that replaces them, and keep the ownership trail intact. It’s the same security-as-design instinct I wrote about in “security is architecture, not decoration.” If you’re automating your own PKI and want to compare approaches, my inbox is open.