AI Development

You Probably Don't Need a Vector Database

The reflex for "let an AI answer questions over my stuff" is to reach for embeddings, a vector database, and a RAG pipeline. For a lot of cases there's a simpler architecture — have the model compile your sources into a structured, interlinked set of notes once, and let it read those.

April 18, 2026

The default architecture for “let an AI answer questions over my documents” has calcified into a reflex: chunk everything, generate embeddings, stand up a vector database, and run retrieval-augmented generation on every query. It’s a lot of moving parts, and for many real use cases it’s more than you need. There’s a leaner pattern — sometimes called an “LLM wiki,” popularized by Andrej Karpathy — that skips the vector store entirely: have a capable model compile your raw sources, once, into a structured and interlinked set of plain notes, and then let queries read those already-distilled notes. Before you provision infrastructure, it’s worth asking whether disciplined files and a good agent get you there instead.

RAG re-derives understanding on every query

The thing that nags me about the standard RAG pipeline is that it does the comprehension work over and over. Every query embeds, retrieves the nearest raw chunks, and asks the model to make sense of them fresh — re-reading and re-interpreting the same source material every single time, with no memory that it ever figured this out before. It’s recompute-on-read as a way of life. For a knowledge base that’s relatively stable, that’s a strange amount of repeated work to rebuild the same understanding on every question.
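To make the repeated work concrete, here is a minimal sketch of that per-query loop. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `interpret` is a hypothetical placeholder for the LLM call, and the chunks are made up. The counter only tracks how often raw chunks get handed over for fresh interpretation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRag:
    def __init__(self, chunks):
        self.chunks = chunks
        self.vectors = [embed(c) for c in chunks]  # indexing happens once
        self.interpretations = 0  # times raw chunks were re-read by the "model"

    def retrieve(self, query, k=2):
        """Return the k chunks most similar to the query."""
        qv = embed(query)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(qv, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]

    def interpret(self, query, retrieved):
        """Hypothetical placeholder for the LLM call that makes sense of raw chunks."""
        self.interpretations += 1
        return f"answer to {query!r} from {len(retrieved)} raw chunks"

    def ask(self, query, k=2):
        return self.interpret(query, self.retrieve(query, k))


rag = ToyRag([
    "Invoices are due within 30 days.",
    "Refunds are processed in five business days.",
    "Support is open on weekdays.",
])
for question in ["When are invoices due?", "When are invoices due?", "How long do refunds take?"]:
    rag.ask(question)

print(rag.interpretations)  # 3: even the repeated question is interpreted from scratch
```

Nothing learned while answering the first question is kept for the second, which is the part the compiled approach changes.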

RAG rediscovers your material on every query. Compiling it once means the understanding accumulates instead of evaporating.

Compile once, read many

The LLM-wiki pattern inverts that. Instead of retrieving raw chunks at query time, you spend the comprehension up front: the model reads your curated sources and compiles them into a structured wiki of summaries, concept pages, and entity pages, all cross-linked. Later questions read those distilled, organized notes rather than the raw material. Knowledge compounds instead of being rediscovered. The expensive sense-making happens once and is captured; queries get cheaper and better because they’re reading something already digested and connected, not a pile of nearest-neighbor fragments. (It’s the same instinct as compiling your notes instead of re-reading them, applied to how an AI system retrieves.)
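A rough sketch of the compile-once shape, under stated assumptions: `WikiCompiler` and its methods are hypothetical names, and `summarize` is a trivial stand-in (it keeps the first sentence) for the model call that would actually distill a source. A real query would still involve a model reading the compiled page; the point here is only that the distillation step runs once per source, not once per question.

```python
class WikiCompiler:
    """Illustrative only: compiles raw sources into pages once, then serves reads."""

    def __init__(self):
        self.model_calls = 0  # counts distillation calls, not reads
        self.pages = {}

    def summarize(self, name, text):
        """Hypothetical stand-in for an LLM call that distills one source."""
        self.model_calls += 1
        first_sentence = text.split(".")[0].strip()
        return f"# {name}\n\n{first_sentence}.\n\nSource: {name}"

    def compile(self, sources):
        """Spend the comprehension up front: one distillation per source."""
        for name, text in sources.items():
            self.pages[name] = self.summarize(name, text)

    def read(self, name):
        """Query-time access reads the already-compiled page."""
        return self.pages[name]


sources = {
    "billing": "Invoices are due within 30 days. Late fees apply after that.",
    "refunds": "Refunds are processed in five business days. Contact support first.",
}

wiki = WikiCompiler()
wiki.compile(sources)
for _ in range(3):
    wiki.read("billing")  # repeated reads do not trigger new distillation

print(wiki.model_calls)  # 2: one per source, regardless of how many reads follow
```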

The whole stack is files and links

What makes this genuinely appealing is what it doesn’t require: no embedding model, no vector database, no retrieval pipeline to operate, tune, and pay for. The substrate is just plain markdown files with ordinary links between them, kept current by an agent. Navigation happens through the links and a few index pages, the way you’d move through a well-organized wiki — not through similarity search. That’s dramatically less infrastructure to stand up and maintain, and it has a side benefit vector stores lack: the compiled notes are human-readable. You can open them, audit them, edit them, and trust them, because they’re just text — not opaque vectors you have to reverse-engineer.
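Link-based navigation is simple enough to sketch in a few lines. This is an illustration, not a prescribed implementation: the wiki is an in-memory dict standing in for markdown files on disk, the page names are invented, and the regex only handles plain inline links of the form `[text](page.md)`.

```python
import re
from collections import deque

# Matches inline markdown links whose target ends in .md, e.g. [Billing](billing.md)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)\)")


def outgoing_links(markdown):
    """Return link targets in the order they appear on the page."""
    return LINK.findall(markdown)


def reachable_pages(wiki, start="index.md"):
    """Breadth-first walk over links, starting from an index page."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in outgoing_links(wiki.get(page, "")):
            if target in wiki and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical wiki contents; in practice these would be files in a directory.
wiki = {
    "index.md": "Start here: [Billing](billing.md) and [Support](support.md).",
    "billing.md": "Invoices and payments. See also [Refunds](refunds.md).",
    "support.md": "Support is open on weekdays.",
    "refunds.md": "Refund policy. Back to [index](index.md).",
    "orphan.md": "Nothing links here.",
}

print(reachable_pages(wiki))
```

A walk like this also doubles as a cheap health check: any page that never shows up in the result is one the agent cannot reach from the index.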

Why the simpler architecture is often enough

For a bounded, relatively stable body of knowledge — your own notes, a team’s docs, a product’s reference material — the compiled-wiki approach covers a surprising amount of ground, with less to break:

Traceability — each compiled page can cite its sources, so answers are auditable rather than emerging from a similarity-search black box.
Quality — a deliberately structured, interlinked set of notes is often a better context to answer from than the top-k chunks a vector search happens to surface.
Maintainability — updating knowledge is editing files and re-compiling, not operating a retrieval stack.
Portability — it’s plain text in version control; it goes anywhere and outlives any particular tool.
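The traceability point is easy to enforce mechanically when the notes are plain text. The sketch below assumes a convention that is mine, not part of the pattern as described: each compiled page carries a single `Sources:` line listing comma-separated source names. The function names and sample pages are hypothetical.

```python
def cited_sources(page_text):
    """Return the names listed on the page's 'Sources:' line, if any."""
    for line in page_text.splitlines():
        if line.lower().startswith("sources:"):
            listed = line.split(":", 1)[1]
            return [name.strip() for name in listed.split(",") if name.strip()]
    return []


def audit(wiki, known_sources):
    """Map each problematic page to a short description of what is wrong."""
    problems = {}
    for name, text in wiki.items():
        cited = cited_sources(text)
        if not cited:
            problems[name] = "no sources cited"
            continue
        missing = [s for s in cited if s not in known_sources]
        if missing:
            problems[name] = "unknown sources: " + ", ".join(missing)
    return problems


# Hypothetical compiled pages and raw source names.
pages = {
    "billing.md": "Invoices are due within 30 days.\nSources: contract.pdf, faq.md",
    "refunds.md": "Refunds take five business days.\nSources: old-policy.doc",
    "support.md": "Support is open on weekdays.",
}
raw_sources = {"contract.pdf", "faq.md"}

for page, problem in audit(pages, raw_sources).items():
    print(f"{page}: {problem}")
```

Because the whole check is string handling over files, it can run in the same place as the rest of your version-control checks.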

Where it doesn’t fit is the genuinely large-scale or fast-changing corpus (millions of documents, constant churn), where similarity search over embeddings earns its keep. The point isn’t “vector databases are bad.” It’s that they shouldn’t be the default, and reaching for one before you’ve tried the simpler thing is how you end up operating infrastructure your problem didn’t require.

Try the boring architecture first

So when the task is “answer questions over my stuff,” start by asking whether a compiled, interlinked set of notes plus a capable agent solves it, and only reach for embeddings and a vector store when you’ve actually outgrown that. The simplest architecture that works is the one you’ll still understand in six months, and for a lot of knowledge bases that’s plain files and good links, not a retrieval pipeline. The approach pairs well with building a knowledge system that compounds and giving your agent durable context to work from. If you’ve replaced a RAG stack with a compiled-notes approach (or hit the wall where you genuinely needed the vector store), I’d like to hear where the line was.

Sources

The “LLM wiki” / compile-don’t-retrieve pattern was popularized by Andrej Karpathy. Content was rephrased here; the idea of compiling sources into an interlinked wiki rather than running RAG over raw text is his.