Pakkit.net

AI Development

Small Slices Beat Big Bang AI

AI agents earn their keep when work is cut into reviewable, testable slices. Big-bang prompts make chaos; small slices make momentum.

  • AI
  • Workflow
  • Pull Requests
  • Validation
  • Process
Illustration of a large chaotic AI changeset being cut into small reviewable pull-request slices, each with a diff and a passing check, moving through a review gate to deploy.

There are two ways to put an AI agent to work. You can hand it the whole feature and hope, or you can hand it one small, well-defined slice and check the result. The first feels faster for about ten minutes. The second is the one that actually ships. This post is less about why small scope helps an agent reason — I’ve written about keeping the architecture intact elsewhere — and more about the boring mechanics: how the work is cut, how it gets reviewed, and how a pull request stays something a human can actually trust.

Why the big-bang prompt fails

A giant prompt feels productive because a wall of code falls out of it. But that wall arrives all at once, undifferentiated, and now it’s your problem. You didn’t save the work — you moved it from “writing” to “verifying,” and verification is the expensive half.

Big-bang generation fails in specific, repeatable ways:

  • The diff is unreviewable. Approving a 900-line change that touches nine files isn’t a review, it’s a leap of faith with a green checkmark on it.
  • The blast radius is everything. When one decision in the middle is wrong, it’s tangled into forty others, and you can’t cleanly back any of them out.
  • The context doesn’t fit. Ask for too much and the agent quietly drops constraints, because it’s juggling more than its context window can hold, the same way an overloaded human drops details.

A huge generated change isn’t progress you’ve banked. It’s a debt you haven’t read yet.

Why small files matter

Slice size and file size are the same conversation from two directions. If the work is small but it lands in a 600-line module, the change is still buried. So the unit of work and the unit of code both stay small on purpose.

Small files give you leverage that has nothing to do with AI and everything to do with being able to see:

  • A bad change is obvious when the file does one thing, because the wrong line has nowhere to hide.
  • The agent gets a tighter target — “this component, these props” beats “the frontend, somewhere.”
  • The next slice starts from a clean boundary instead of from a knot you have to re-understand first.

This is just good architecture, and the agent benefits from it for the same reason the human does: less surface area, fewer places to be wrong.
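
One way to keep file size visible is a small check that flags files over a line budget. This is a sketch under assumptions, not the tooling behind this site: the 200-line limit, the `SourceFile` shape, and the function names are all hypothetical, and the right budget depends on the codebase.

```typescript
// Hypothetical budget: the post gives no specific limit, so tune this.
const MAX_LINES_PER_FILE = 200;

// Illustrative shape: a path plus the file's text, however it was loaded.
interface SourceFile {
  path: string;
  contents: string;
}

// Count lines the way an editor would: a trailing newline adds no extra line.
function countLines(contents: string): number {
  if (contents.length === 0) return 0;
  const pieces = contents.split("\n");
  const endsWithNewline = contents.charAt(contents.length - 1) === "\n";
  return endsWithNewline ? pieces.length - 1 : pieces.length;
}

// Return the files whose line count exceeds the budget, largest first.
function findOversizedFiles(
  files: SourceFile[],
  maxLines: number = MAX_LINES_PER_FILE
): { path: string; lines: number }[] {
  return files
    .map((file) => ({ path: file.path, lines: countLines(file.contents) }))
    .filter((entry) => entry.lines > maxLines)
    .sort((a, b) => b.lines - a.lines);
}
```

A check like this only surfaces candidates for splitting. Whether a long file really does more than one thing is still a judgment call.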

Every slice needs acceptance criteria

The difference between a slice and a vague request is that a slice tells you when it’s done. Before any code gets written — by me or by an agent — the slice gets a short, honest definition of done. Not a wish. A spec.

A usable acceptance criterion is something you could hand to a stranger and they’d know whether the work passed: this route renders, this type-checks, this content shows up in the RSS feed, this edge case returns the right error. If “done” is a feeling, the slice isn’t ready to start. Writing the criteria first also catches the slices that were secretly two slices wearing a trench coat — if the definition of done has an “and” in it that smells like a second feature, split it.
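
The "definition of done" can be written down as data before any code exists. The sketch below is one possible shape, with hypothetical names (`Slice`, `reviewSlice`); the "and" check is a crude heuristic that only prompts a human to look, since plenty of legitimate criteria contain the word.

```typescript
// A slice: one goal plus criteria a stranger could check.
// All names here are illustrative, not taken from the site's codebase.
interface Slice {
  goal: string;
  acceptanceCriteria: string[];
}

interface SliceReview {
  ready: boolean;
  warnings: string[];
}

// Flag slices with no definition of done, and criteria that join two
// outcomes with "and", which may be a second slice in disguise.
function reviewSlice(slice: Slice): SliceReview {
  const warnings: string[] = [];
  if (slice.acceptanceCriteria.length === 0) {
    warnings.push("No acceptance criteria: 'done' is still a feeling.");
  }
  for (const criterion of slice.acceptanceCriteria) {
    if (/\band\b/i.test(criterion)) {
      warnings.push(`Possible second slice hiding in: "${criterion}"`);
    }
  }
  return { ready: warnings.length === 0, warnings };
}
```

For example, a slice with the criteria "Route renders" and "Type-check passes" comes back ready, while "Route renders and search index updates" produces a warning to consider splitting.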

Validate before you polish

The most expensive mistake in agent-assisted work is polishing something that was never correct. Generated code is plausible by default — it compiles in your head, it reads fine, and it’s wrong in a way only running it reveals. So the order of operations matters: prove it works, then make it nice.

The loop I actually run, in this order:

  1. Type-check and build. Cheapest signal, run it first, every time.
  2. Run the thing and look at the real output — not the agent’s description of the output.
  3. Check it against the acceptance criteria, point by point.
  4. Only then worry about naming, structure, and the small flourishes.

Polishing first inverts the cost: you sink effort into prose and tidy structure around a core that’s about to be rewritten. Validate first and the polish lands on something that’s going to survive.
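
The ordering in that loop can be sketched as a runner that stops at the first failing step, so polish is never reached while something earlier is still broken. The `Step` shape and the placeholder checks are assumptions for illustration; real steps would call the project's own type-checker, build, and output inspection.

```typescript
interface Step {
  name: string;
  run: () => boolean; // true means the step passed
}

interface LoopResult {
  passed: string[];
  failedAt: string | null; // null when every step passed
}

// Run steps in order and stop at the first failure, so later steps
// never run against a core that has not been proven yet.
function runValidationLoop(steps: Step[]): LoopResult {
  const passed: string[] = [];
  for (const step of steps) {
    if (!step.run()) {
      return { passed, failedAt: step.name };
    }
    passed.push(step.name);
  }
  return { passed, failedAt: null };
}

// Placeholder checks standing in for real commands (hypothetical results).
const exampleSteps: Step[] = [
  { name: "type-check and build", run: () => true },
  { name: "run and inspect real output", run: () => true },
  { name: "check acceptance criteria", run: () => false },
  { name: "polish", run: () => true },
];

const outcome = runValidationLoop(exampleSteps);
console.log(outcome.failedAt); // prints "check acceptance criteria"
```

Because the runner returns early, the "polish" step in this example never executes, which is the point of putting it last.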

Slices keep humans in control

The quiet risk in agent workflows isn’t a dramatic failure — it’s the slow swap where the agent starts making the calls and you start rubber-stamping them. Slice-based work is the structural defense against that, because every slice is a checkpoint where a human has to actually decide.

It also maps cleanly onto how software is already reviewed. One slice is one pull request: small enough to hold in your head, scoped to a single intent, with a description that says what it proves and a diff a reviewer can read in a sitting. That’s not AI hygiene. It’s just PR hygiene, and it happens to be exactly what keeps an agent honest. Resist the impulse to batch five slices into one “while I’m in here” mega-PR; the batching is where reviewability goes to die.

Keep the pull requests small and the humans stay in charge by default, not by heroics.
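
"Small" can be made checkable with a size budget on the diff. The sketch below summarizes text in the format printed by `git diff --numstat` (one tab-separated row per file: added, deleted, path). The budget numbers and the names `DiffBudget` and `summarizeNumstat` are hypothetical; pick limits that match what a reviewer can actually read in one sitting.

```typescript
interface DiffBudget {
  maxFiles: number;
  maxChangedLines: number;
}

// Hypothetical limits, not numbers taken from the post.
const DEFAULT_BUDGET: DiffBudget = { maxFiles: 5, maxChangedLines: 200 };

interface DiffSummary {
  files: number;
  changedLines: number;
  withinBudget: boolean;
}

// Summarize `git diff --numstat` text. Binary files show "-" for both
// counts; they are counted as files with zero changed lines.
function summarizeNumstat(
  numstat: string,
  budget: DiffBudget = DEFAULT_BUDGET
): DiffSummary {
  let files = 0;
  let changedLines = 0;
  for (const row of numstat.split("\n")) {
    const parts = row.split("\t");
    if (parts.length < 3) continue; // blank or non-numstat row
    const added = parseInt(parts[0] ?? "", 10);
    const deleted = parseInt(parts[1] ?? "", 10);
    files += 1;
    changedLines += (isNaN(added) ? 0 : added) + (isNaN(deleted) ? 0 : deleted);
  }
  return {
    files,
    changedLines,
    withinBudget:
      files <= budget.maxFiles && changedLines <= budget.maxChangedLines,
  };
}
```

A budget like this is a tripwire, not a verdict: a large mechanical rename can be fine, and a 40-line change can still hide two intents. It mainly makes the mega-PR visible before a reviewer opens it.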

How this shows up on pakkit.net

This site is built this way, not described this way. Each change is a numbered slice: one scoped goal, validated in an isolated container with a type-check and a build, verified in the rendered output, then committed and opened as a small PR that does exactly one thing. Adding the three posts you’re reading was three slices’ worth of thinking and one tight diff — content only, schema unchanged, RSS still valid — precisely so the review could be quick and the blast radius could be nothing.
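
A claim like "content only, schema unchanged" can also be checked against the list of changed paths. The sketch below is an illustration, not the site's actual tooling: the function name and the example paths are hypothetical, and the real directory layout is not described in this post.

```typescript
// Return the changed paths that fall outside the allowed prefixes.
// An empty result means the change stayed inside its declared scope.
function pathsOutsideScope(
  changedPaths: string[],
  allowedPrefixes: string[]
): string[] {
  return changedPaths.filter(
    (path) => !allowedPrefixes.some((prefix) => path.indexOf(prefix) === 0)
  );
}

// Hypothetical paths for illustration only.
const strays = pathsOutsideScope(
  ["src/content/blog/small-slices.md", "src/content/config.ts"],
  ["src/content/blog/"]
);
console.log(strays); // prints [ "src/content/config.ts" ]
```

If the list comes back non-empty, the slice has touched something its description did not promise, which is worth catching before the PR is opened.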

That’s the whole trick, and it isn’t glamorous: the agent supplies the speed, the slices supply the control, and the pull request is where the two meet. If you want the architecture-level version of this argument, it’s in AI-Assisted Development Without Losing the Architecture; if you’re trying to make agents move fast without your codebase turning into a haunted house, I’m happy to compare notes.