Pakkit.net

Systems Thinking

The Schema Is the Real Documentation

When you inherit or integrate with an unfamiliar system, the data model tells you the truth the wiki won't — what it actually stores, what's really sensitive, and where it's secretly mid-migration.

  • Systems Thinking
  • Databases
  • Architecture
  • Engineering Practice

When I have to understand a system I didn’t build — to integrate with it, debug it, or take it over — I’ve learned not to start with the documentation. I start with the data model. The schema, the entities, the namespaces, what’s a primary key, what’s indexed, what’s encrypted. Docs describe what a system was supposed to be on the day someone wrote them. The data model describes what it actually is right now, because it’s the thing the running code can’t lie about.

Code can drift from docs; it can’t drift from its schema

Documentation rots. It’s written once, at one moment, by someone with one mental model, and then the system moves on without it. Comments go stale, wikis describe a refactor that got reverted, the architecture diagram shows a service that was deleted last quarter.

The data model can’t afford that luxury. Every read and write in production goes through it. If a table exists and the code queries it, that table is real and load-bearing, no matter what the docs say. If a column is a partition key, that choice shapes every access pattern whether or not anyone documented why. The schema is the one description of the system that’s continuously validated by the system actually running. That makes it the most honest source you have.

Docs tell you the intent. The schema tells you the behavior. When they disagree, the schema wins.
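To make the point about keys shaping access patterns concrete, here is a minimal sketch using SQLite from the Python standard library. SQLite has no partition keys, so a primary key stands in for one here, and the `sessions` table and its columns are hypothetical names chosen for illustration, not the schema of any system described in this post. The sketch asks the engine how it would run two queries: one that filters on the key and one that does not.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the detail text is the useful part.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical table: session_id is the key, user_id is an ordinary column.
conn.execute(
    "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, user_id TEXT, started_at TEXT)"
)

# Filtering on the key lets the engine look the row up through the key's index.
by_key = query_plan(conn, "SELECT * FROM sessions WHERE session_id = ?", ("s1",))
# Filtering on an unindexed column forces a walk over the whole table.
by_other = query_plan(conn, "SELECT * FROM sessions WHERE user_id = ?", ("u1",))

print(by_key)
print(by_other)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should be a search through the key and the second a full scan. Nobody had to document that difference; it follows from the schema.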

Reading the model surfaces the things nobody tells you

The first time I really leaned on this, I was figuring out how a platform I had to integrate with persisted its data. An afternoon spent reading its connection setup, its entities, and how it carved up its namespaces taught me more than any handoff doc, and most of it was stuff nobody would have thought to write down:

  • What it really stores. The entity list — access points, sites, users, sessions, lifecycle records — was the domain model. You could read the business off the table names better than off the marketing.
  • What’s actually sensitive. The login path ran personal fields through an encrypt step before writing them, so those columns held ciphertext, not plaintext. That one detail exposed the real threat model and marked the decryption key as a load-bearing secret, something no overview page mentioned.
  • What’s secretly mid-migration. Two generations of persistence coexisted: an older store still in the live path for one component, a newer relational one for the rest. The schema made the half-finished migration obvious in a way the “current architecture” doc absolutely did not.

None of that came from documentation. It came from reading what the system stores and how.

The same name can hide two completely different things

The sharpest lesson from that dig was a trap I now look for everywhere. The same import name — a shared package the code pulled in to talk to “the database” — resolved to two entirely different implementations depending on which component used it. In one, it meant one database technology; in another, the same name meant a totally different database. Identical import, opposite realities.

If I’d trusted the name, I’d have built a completely wrong picture of how data flowed. Only following it down to what it actually connected to revealed the split. It’s the data-layer version of a familiar trap, where two things that share a name turn out not to share a file: a label is a hope, the implementation is the truth, and the only way to know which you’ve got is to follow the reference to its real target.
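As a rough illustration of following a name to its real target, here is a small Python sketch. Everything in it is hypothetical: `KeyValueClient`, `RelationalClient`, and the two components are stand-ins I made up, not the packages from the system above. The only real mechanism it relies on is that every Python class records where it was defined in `__module__` and `__qualname__`, regardless of the name it is reached through.

```python
import types


class KeyValueClient:
    """Hypothetical client for one database technology."""


class RelationalClient:
    """Hypothetical client for a different database technology."""


# Two hypothetical components that both expose "the database" under the same name.
component_a = types.SimpleNamespace(Database=KeyValueClient)
component_b = types.SimpleNamespace(Database=RelationalClient)


def real_target(obj):
    """Return (module, class name) for the class actually behind a label."""
    cls = obj if isinstance(obj, type) else type(obj)
    return (cls.__module__, cls.__qualname__)


for label, component in (("component_a", component_a), ("component_b", component_b)):
    module, name = real_target(component.Database)
    print(f"{label}.Database -> {module}.{name}")
```

Both components say `Database`, and the function reports two different classes behind that label. In a real codebase, `inspect.getsourcefile` from the standard library can add the defining file path for classes that were loaded from source.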

A migration that keeps the interface and swaps the guts

The most elegant thing I found was also the easiest to miss: the team had changed the implementation of the data layer while keeping the name of the interface constant. Consuming code kept importing the same thing; underneath, the storage engine had been swapped. The callers didn’t have to change, which is exactly the property you want in a migration — but it also meant you could not tell, from the calling code alone, what was actually backing it.

That’s a genuinely good design move: migrating without touching the callers is how you change a system without a flag day, the single cutover where everything has to switch at once. But it has a cost for whoever comes next: the seam is invisible from the outside. The only way to see it is to read the data layer and ask what the stable name resolves to today. Stable interfaces are great for callers and quietly hostile to archaeologists.
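A minimal sketch of that pattern, under my own assumptions: the `Store` interface, the two backends, and the `open_store` switch are hypothetical names, with a dictionary standing in for the older store and in-memory SQLite standing in for the newer relational one. The caller, `record_login`, is identical in both cases, which is the point.

```python
import sqlite3
from typing import Optional, Protocol


class Store(Protocol):
    """The stable interface that callers depend on (hypothetical)."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class DictStore:
    """Stand-in for the older store."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class SqliteStore:
    """Stand-in for the newer relational store."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def put(self, key: str, value: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )


def open_store(use_new_backend: bool) -> Store:
    """The one place where the stable name is bound to a real backend."""
    return SqliteStore() if use_new_backend else DictStore()


def record_login(store: Store, user: str) -> Optional[str]:
    """Caller code: unchanged no matter which backend sits underneath."""
    store.put(user, "logged-in")
    return store.get(user)


old_store, new_store = open_store(False), open_store(True)
print(record_login(old_store, "alice"), type(old_store).__name__)
print(record_login(new_store, "alice"), type(new_store).__name__)
```

Nothing in `record_login` reveals which engine it is talking to. Only the factory, or a look at the concrete type at runtime, shows where the seam is.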

How I read a system’s data model now

When I sit down with an unfamiliar system, the order of operations is roughly:

  • Find the connection config first. What does it actually connect to, and how many distinct stores are there? (Often more than the docs admit.)
  • Read the entities/tables as the domain model. The names and relationships are the business logic, compressed.
  • Note the keys and indexes. Partition and primary keys dictate the access patterns — they tell you how the system is meant to be queried, and where it will hurt if you query it wrong.
  • Look for what’s encrypted or transformed on write. That’s where the real sensitivity and the real secrets live, and it’s almost never in the overview.
  • Chase shared names to their real targets. Don’t assume one name means one thing across a codebase.
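The entity and key steps in the list above can be partly automated. Here is a minimal sketch that does so for SQLite, using its real `sqlite_master` catalog and the `PRAGMA table_info` and `PRAGMA index_list` statements. Other databases expose the same information through different catalogs, and the `sites` and `users` tables, including the `email_ciphertext` column, are hypothetical examples rather than the schema of any system described here.

```python
import sqlite3


def describe_schema(conn):
    """Summarize each user table: its columns, primary key columns, and indexes."""
    summary = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
        # index_list rows: (seq, name, unique, origin, partial)
        indexes = conn.execute(f"PRAGMA index_list('{table}')").fetchall()
        summary[table] = {
            "columns": [col[1] for col in columns],
            "primary_key": [col[1] for col in columns if col[5] > 0],
            "indexes": [idx[1] for idx in indexes],
        }
    return summary


conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.executescript(
    """
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        site_id INTEGER REFERENCES sites(id),
        email_ciphertext BLOB
    );
    CREATE INDEX idx_users_site ON users(site_id);
    """
)

schema = describe_schema(conn)
for table, info in schema.items():
    print(table, info)
```

A summary like this is only a starting point. A column name such as `email_ciphertext` hints at a transform on write, but confirming it still means reading the write path.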

Then, and only then, I go read the documentation — as a set of claims to verify against the model, not as the source of truth. Half the time the doc is a decent map; the other half it’s a museum piece, and the schema is the only thing telling me where the system actually is. If you’ve inherited a system and learned its real shape by reading its tables, I’d love to hear what it taught you.