Monorepo or Polyrepo? Pick Both
The monorepo-versus-polyrepo debate is a false binary — a thin git superproject that pins a set of independent repos as submodules gives you independent lifecycles and one coherent checkout at the same time.
- Engineering Practice
- Git
- Architecture
- Source Control
When a team consolidates a pile of scattered repos into something coherent, the debate is always framed as a binary: monorepo or polyrepo, pick a side. I went through this myself while deciding how to lay out a family of related components. I dutifully built the pros-and-cons table, and then realized the table itself was lying to me. The best columns from each side aren’t mutually exclusive. A thin git superproject that pins each component as a submodule lets you take the wins from both, and the dichotomy mostly dissolves.
The table that frames it as a choice
The standard comparison looks clean, and every row reads as a conflict. Monorepo: one place to clone, one shared pipeline, trivial to develop and test components together; but awkward versioning, a giant shared history, and pressure toward lockstep releases. Polyrepo: independent versioning, independent release cadence, clean per-repo ownership and permissions; but duplicated CI, scattered discovery, and inter-component dependencies resolved only through published versions.
Read it as “which set of tradeoffs do you hate least” and you’ll pick a side and inherit its whole downside column. But look at which tradeoffs actually conflict and which just happen to be listed opposite each other, and a third option appears. The pain points aren’t symmetric the way the table implies.
“Monorepo or polyrepo” assumes the wins are mutually exclusive. The most useful question is which wins you can keep at the same time.
The hybrid: a superproject of submodules
The structure that took the best of both: a thin top-level repo — a superproject — that aggregates each component as a git submodule. Every component stays its own standalone repo with its own history, merge requests, permissions, pipeline, and version. The superproject doesn’t contain their code; it pins a known-good commit of each and provides one coordinated place to clone, build, and ship the whole family.
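Concretely, the superproject’s tree stores each component as a gitlink: an entry with mode 160000 that records a commit hash instead of file contents. The sketch below illustrates how little the superproject actually holds; it is not a prescribed tool. It assumes Python 3.9+, a `git` binary on the PATH, and plain path names (git quotes unusual names unless `-z` is used), and the function names are hypothetical. It reads the pins from `git ls-tree -r HEAD`, which prints one `<mode> <type> <object><TAB><path>` line per entry.

```python
import subprocess

GITLINK_MODE = "160000"  # tree-entry mode git uses for a submodule pointer


def parse_pins(ls_tree_output: str) -> dict[str, str]:
    """Map each submodule path to the commit the superproject pins."""
    pins: dict[str, str] = {}
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        # Each line is "<mode> <type> <object>\t<path>".
        meta, _, path = line.partition("\t")
        mode, obj_type, obj = meta.split()
        if mode == GITLINK_MODE and obj_type == "commit":
            pins[path] = obj
    return pins


def read_pins(superproject_dir: str, rev: str = "HEAD") -> dict[str, str]:
    """Run git ls-tree in the superproject and return its pinned commits."""
    result = subprocess.run(
        ["git", "-C", superproject_dir, "ls-tree", "-r", rev],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pins(result.stdout)
```

On a superproject checkout, `read_pins(".")` yields only path-to-commit pairs; none of the components’ source files live in the superproject’s own tree.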
That single move grabs a column from each side:
- The polyrepo win — independent lifecycles. Releasing one component never forces a release of the others. Each owns its own version, CI, ownership rules, and cadence. The repos stay small and focused.
- The monorepo win — one source of truth. One recursive clone gets the whole family at a consistent, known-good set of versions. One orchestrator can build or publish them all. Discovery and onboarding stay simple: there’s a front door.
- It dodges the monorepo tax. No awkward whole-tree tagging, no giant monolithic history, no pressure to release everything together. The big, test-heavy components keep their existing repos and pipelines untouched.
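The “one orchestrator” point can be as small as a loop over the components the superproject declares. A minimal sketch, assuming the component paths come from `git config --file .gitmodules --get-regexp '\.path$'`, which prints one `submodule.<name>.path <path>` line per component, and assuming submodule names contain no spaces. The per-component step (`make -C <path> build`) is a hypothetical placeholder for whatever each component’s own build or publish entry point is.

```python
import subprocess
from collections.abc import Sequence

# Hypothetical per-component step; "{path}" is replaced with each component's path.
DEFAULT_STEP = ("make", "-C", "{path}", "build")


def component_paths(get_regexp_output: str) -> list[str]:
    """Extract component paths from `git config --get-regexp` output."""
    paths = []
    for line in get_regexp_output.splitlines():
        if not line.strip():
            continue
        # "submodule.<name>.path <path>": split once on the first space.
        _key, _, path = line.partition(" ")
        paths.append(path)
    return paths


def build_plan(paths: Sequence[str], step: Sequence[str] = DEFAULT_STEP) -> list[list[str]]:
    """Expand the step template once per component, in declaration order."""
    return [[arg.replace("{path}", path) for arg in step] for path in paths]


def run_plan(plan: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(list(argv), check=True)  # raises on the first failing component
```

Each component keeps its own build entry point; the orchestrator only knows the list of paths, which is exactly what the superproject already records.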
You don’t pick a side. You pick a topology where the sides cooperate.
The superproject pins; it doesn’t merge
The thing that makes this work, and the thing you have to internalize, is that the superproject’s job is to pin, not to contain. Its entire diff, most of the time, is “component X moved from commit A to commit B.” It records the coherent “these exact versions go together” snapshot, and that snapshot is reproducible: check out the superproject at a given commit, update its submodules recursively, and you get precisely that set, provided the pinned commits are still available in the component repos.
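Because the superproject only records pins, any change to it can be summarized entirely in those terms. A minimal sketch, assuming pins are held as a `{path: commit}` dict (for example, read from `git ls-tree` at two superproject revisions); the function name and message wording are hypothetical.

```python
def describe_pin_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Summarize how the pinned set moved between two superproject snapshots."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        if before == after:
            continue  # component not touched between the two snapshots
        if before is None:
            changes.append(f"{path}: added at {after[:7]}")
        elif after is None:
            changes.append(f"{path}: removed (was {before[:7]})")
        else:
            changes.append(f"{path}: moved from {before[:7]} to {after[:7]}")
    return changes
```

Every line this can emit is a pointer movement. There is no “file contents changed” case, because the superproject has no component contents to change.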
That’s a genuinely different mental model from a monorepo, where everything moves together in one history. Here the components move independently and the superproject periodically captures a consistent cross-section of them. The reproducible snapshot is the deliverable. (It does demand the submodule discipline — pin deliberately, bump pointers in their own commits — but that discipline is cheap once the model clicks.)
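The “bump pointers in their own commits” discipline is easy to check mechanically. A minimal sketch of a hypothetical policy check, assuming you already have the paths a commit changes (for instance from `git diff-tree --no-commit-id --name-only -r <commit>`) and the set of submodule paths; the three labels are illustrative, not a git concept.

```python
def classify_commit(changed: list[str], submodule_paths: set[str]) -> str:
    """Label a superproject commit by what it touches.

    "pointer-bump": only submodule pointers changed (the disciplined case).
    "mixed": pointers moved alongside other files; worth splitting.
    "other": no submodule pointer changed at all.
    """
    touched = set(changed)
    pointers = touched & submodule_paths
    if not pointers:
        return "other"
    if pointers == touched:
        return "pointer-bump"
    return "mixed"
```

A superproject pipeline could, for example, fail on `"mixed"` so that every pointer move stays reviewable as its own commit.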
Consumers never see the seams
The detail that sold me: the people who use the components are unaffected by any of this. They install each component individually from wherever artifacts are published; they never need the superproject at all. The superproject is an internal organizing and release convenience for the people who build the family, not a thing consumers depend on.
That’s an important property. The repository topology is a build-and-release concern, and it shouldn’t leak into how the world consumes your output. If choosing “monorepo” or “polyrepo” changes what your consumers have to do, you’ve let an internal decision become an external contract. The hybrid keeps the seam where it belongs — inside — and presents the same clean published artifacts either way.
Question the binary, then define the pattern
The meta-lesson outlasts git. When you’re handed a forced choice between two named options, it’s worth asking what each option is actually good at and whether you can compose those goods rather than choosing between them. “Monorepo vs polyrepo” is a particularly clean example, but the move — refuse the binary, find the structure that captures the wins from both — applies all over architecture.
What makes the hybrid real rather than a clever idea is writing down the pattern so it’s repeatable: how a new component gets created and added, the naming and layout conventions, how versioning and publishing work, where CI lives. A “best of both” structure that only one person understands is just a liability with good PR. Document the scaffolding, and the third option becomes a paved road instead of a one-off. If you’ve escaped the monorepo-versus-polyrepo argument with a structure of your own, I’d love to compare notes.
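Part of that documentation can be a small script for the “how a new component gets added” step, so the naming and layout conventions are enforced rather than remembered. A minimal sketch, assuming a hypothetical convention of lowercase, hyphenated names under a `components/` directory; `git submodule add <url> <path>` is the real git command being wrapped, while the convention, constants, and function name are illustrative.

```python
import re

# Hypothetical convention: lowercase words joined by single hyphens.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*")
COMPONENTS_DIR = "components"  # hypothetical layout root


def add_component_command(name: str, url: str) -> list[str]:
    """Return the git command that adds a component under the agreed layout."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"component name {name!r} breaks the naming convention")
    return ["git", "submodule", "add", url, f"{COMPONENTS_DIR}/{name}"]
```

The returned argv can be passed to `subprocess.run(..., check=True)` from the superproject root.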