Pakkit.net
← Back to blog

Engineering Practice

Your Hot Path Is Rebuilding the Same Thing Every Call

A decryption function on a database read path was secretly rebuilding all its lookup tables from scratch on every call — hoisting that one-time setup out of the hot path made it roughly fifteen times faster without touching the algorithm.

  • Engineering Practice
  • Performance
  • Optimization
  • Cryptography
Performance diagram contrasting a hot path that repeats expensive setup on every call with a version that initializes once and then runs only fast work.

The most satisfying optimizations don’t change what the code does. They stop it from doing the same expensive setup over and over. I had a function sitting on a database read path that was running about seventeen times slower than a raw read, and the fix wasn’t a cleverer algorithm or a rewrite in a faster language. It was noticing that the function rebuilt all of its static scaffolding, from cold, on every single call.

The shape of the waste

The function decrypted a value on its way out of the database. Before it could do any actual decryption, it had to construct a pile of constant lookup tables and a derived key schedule — the kind of setup that’s identical no matter what input you feed it. And because the function couldn’t hold any state between invocations, it rebuilt that entire scaffolding every time it ran. Call it a million times and you build the same tables a million times.

When I profiled it, the “work” — the actual decryption — was cheap. The setup it insisted on redoing dominated the runtime. The function wasn’t slow because decryption is slow. It was slow because it kept re-deriving constants.

The fix is hoisting, not genius

Precompute the constant tables once, bake them in as static data, and the per-call setup cost drops to zero. Same algorithm, same bytes out, roughly fifteen times faster on a single thread. No new dependency, no exotic trick — just stop recomputing the part that never changes.

The one bit of care worth copying: I generated the precomputed constants by running the original’s own setup code and capturing what it produced, rather than re-deriving them by hand. That guaranteed the fast path’s tables were byte-for-byte what the slow path would have built, which matters enormously for the next section.

Why this kind of waste hides

Per-call setup cost is nearly invisible where most people look. A unit test calls the function once — fast. A code review reads it doing reasonable-looking work — fine. It only bares its teeth on the hot path, under load, at scale, where “fast enough once” becomes “the bottleneck a million times over.” A profiler will point you at the hot function, but the instinct that actually cracks it is a question: what in here is the same every time, and why am I computing it again?

The step people skip: prove it’s identical

A performance rewrite of correctness-critical code — crypto, parsers, anything touching money — is only safe if you can prove the fast version behaves exactly like the slow one. So before this went anywhere near real data, I ran it against a few hundred equivalence and fuzz cases: length sweeps across the awkward boundaries, production-shaped inputs, and a batch of seeded random garbage. Zero mismatches. As a bonus, the rewrite even fixed a latent crash the original hit on certain short inputs.

A faster function that’s subtly wrong is worse than a slow one — the slow one at least gave you correct answers.

Skipping the equivalence check is how a tidy 15× speedup becomes a corruption bug you find in production three weeks later.

The pattern is everywhere

Once you’ve got the question in your pocket, you see this everywhere a hot function does “setup, then work” without keeping state between calls:

  • compiling a regular expression on every call instead of once;
  • re-parsing the same config on every request;
  • opening a fresh connection per query instead of reusing a pooled one;
  • reallocating the same buffers inside a tight loop.

All the same shape: constant work, needlessly repeated.

Know when you’ve got the 90%

One honest footnote. After hoisting the tables, I tried a dozen further micro-optimizations — and they all landed within measurement noise of each other. The precompute was the entire win; the rest was rounding. It’s easy to keep polishing a hot path long after you’ve captured the real gain, so it’s worth measuring deliberately, the way I argue for treating a benchmark rig as part of the experiment, and stopping when the curve flattens.

The fastest code is the code that doesn’t run twice. Find the constant work on your hot path and do it once. If you’ve got a function that’s mysteriously hot and you suspect it’s re-deriving something it shouldn’t, I’d love to hear about it.