Don't make an LLM re-derive answers from raw documents on every query — that's RAG, and it never gets smarter. Instead, have it compile those documents once into a persistent, cross-linked wiki, then read from the compiled layer. The wiki compounds: every ingest and every good answer is filed back, so the knowledge base gets richer instead of resetting each session.
A source drops in. The LLM reads it, extracts the key facts, and integrates them into the existing wiki — touching a summary page, several entity and concept pages, the index.md catalog, and the log.md trail. One source can legitimately touch 10–15 pages.
A human wiki rots because nobody updates the index, nobody fixes the dangling link, nobody notices page A now contradicts page C. An LLM never gets bored. That swap — clerical tax for tireless bookkeeping — is the whole pitch.
Karpathy's framing: "the tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping." LLMs don't forget to update a cross-reference. So hand them the clerical work and keep the part only a human can do.
The wiki is a durable, structured place an agent reads from and writes to across sessions, threads, and months: the unsolved memory problem of agentic AI, made tractable with plain files. Your own MEMORY.md + topic files are a per-agent LLM-wiki.
It is not a product or a library — it is a discipline. The cleverness lives in how you set up the layers and the rules, not in any code you install.
Think of it as a cache hierarchy. The curated wiki is L1: small, fast, always loaded into context. RAG over a giant corpus is L2: large, slower, and it occasionally misses.
RAG re-pays the synthesis cost on every query and never gets smarter. The wiki pays synthesis once at ingest and amortises it across all future reads. The tradeoff is freshness — a compiled fact is stale until re-ingested.
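The amortisation argument can be written as a rough break-even. The symbols are assumptions introduced for illustration, not measurements: $c_s$ is the synthesis cost RAG pays per query, $C_I$ is the one-time ingest cost of compiling the wiki, $c_r$ is the cost of one read from the compiled layer, and $n$ is the number of queries.

```latex
\begin{aligned}
C_{\text{RAG}}(n)  &= n\,c_s \\
C_{\text{wiki}}(n) &= C_I + n\,c_r \\
C_{\text{wiki}}(n) < C_{\text{RAG}}(n) &\iff n > \frac{C_I}{c_s - c_r} \qquad (c_s > c_r)
\end{aligned}
```

Under these assumptions the wiki wins once the query count passes the break-even point. The model ignores the re-ingest cost that the freshness tradeoff adds.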
Below a certain size you only need L1: direct file-reading is simpler, more reliable, and cheaper per query than any RAG pipeline. The smart move at scale is hybrid — curated wiki as L1, RAG as L2.
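A minimal sketch of the hybrid lookup, assuming the wiki fits in a dictionary of page title to body. The names `lookup` and `rag_search` are hypothetical, and the title-in-query match stands in for whatever the schema actually uses to pick a page.

```python
from typing import Callable, Optional


def lookup(
    query: str,
    wiki: dict[str, str],
    rag_search: Callable[[str], Optional[str]],
) -> tuple[str, str]:
    """Answer from the curated wiki first; fall back to retrieval on a miss."""
    q = query.lower()
    # L1: the compiled layer. Here a page "hits" if its title appears in the query.
    for title, body in wiki.items():
        if title.lower() in q:
            return ("L1", body)
    # L2: retrieval over the raw corpus (hypothetical callable supplied by the caller).
    hit = rag_search(query)
    if hit is not None:
        return ("L2", hit)
    return ("miss", "")


# Illustrative data only.
sample_wiki = {"Lint": "Lint finds orphans, dangling links and contradictions."}
fake_rag = lambda q: "raw passage about ingest" if "ingest" in q.lower() else None

tier_a, _ = lookup("What does lint check?", sample_wiki, fake_rag)
tier_b, _ = lookup("How does ingest work?", sample_wiki, fake_rag)
tier_c, _ = lookup("Unrelated question", sample_wiki, fake_rag)
```

The ordering is the point: the cheap, curated layer is consulted before the expensive one, as in any cache hierarchy.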
"Query Atlas before grepping" is literally "read the compiled wiki before re-deriving from raw." You are running this pattern every time you use the lab's knowledge base.
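The whole thing is three layers on disk: raw/ for immutable sources, wiki/ for the compiled pages, and the schema that governs both. Each layer has a clear owner and a clear rule about who is allowed to write to it.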
Notice the asymmetry of who writes what. You own raw/; the LLM owns wiki/; you co-author the schema. Provenance is a first-class concern — at scale you can't tell human-curated truth from LLM guess by looking, so every file carries provenance tags in YAML frontmatter.
In our library every article carries sources, authored_by, and date fields. That is the primary-vs-derived tagging the production write-ups had to retrofit; we had it from day one.
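A provenance check along these lines could look as follows. This is a sketch under stated assumptions: the parser handles only flat `key: value` frontmatter rather than full YAML, the helper names are hypothetical, and the sample page is invented for illustration.

```python
REQUIRED = ("sources", "authored_by", "date")


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse a flat key: value block between two '---' lines (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the page as having no frontmatter


def missing_provenance(text: str) -> list[str]:
    """Return the required provenance fields that are absent or empty."""
    meta = read_frontmatter(text)
    return [field for field in REQUIRED if not meta.get(field)]


# Illustrative page, not a real article.
SAMPLE_PAGE = """---
sources: raw/example-source.md
authored_by: llm
date: 2026-01-05
---
# Example page
"""

untagged = missing_provenance("# A page with no frontmatter")
tagged = missing_provenance(SAMPLE_PAGE)
```

A check like this fits naturally into a lint pass, so an untagged page is flagged before anyone has to guess whether it is human-curated or LLM-derived.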
Drop a source in and the LLM reads it, mints a summary, upserts every relevant entity and concept page, links them, and updates the catalog and trail.
The LLM reads the source and discusses the takeaways with you. Human-in-the-loop is by design — that conversation is the write gate, not friction.
It writes a summary, then upserts each entity and concept page — merging into what exists and adding [[cross-links]]. One source, 10–15 pages.
A one-line entry goes in index.md; a timestamped record goes in log.md. The map and the audit trail stay current automatically.
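The three steps can be sketched in memory, assuming the LLM's reading step has already produced a summary and a mapping of page title to extracted fact. The function name, the `summary-` page naming, and the use of `"index"` and `"log"` keys to stand for index.md and log.md are all illustrative assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def ingest(
    source_name: str,
    summary: str,
    entities: dict[str, str],
    wiki: dict[str, str],
    now: Optional[datetime] = None,
) -> list[str]:
    """Write a summary page, upsert entity pages, then update index and log."""
    now = now or datetime.now(timezone.utc)
    summary_page = f"summary-{source_name}"
    links = " ".join(f"[[{title}]]" for title in sorted(entities))
    wiki[summary_page] = f"{summary}\n\nSee: {links}"
    touched = [summary_page]

    # Upsert: merge into the existing page instead of overwriting it.
    for title, fact in sorted(entities.items()):
        existing = wiki.get(title, f"# {title}")
        wiki[title] = f"{existing}\n- {fact} (from [[{summary_page}]])"
        touched.append(title)

    # One catalog line per page, added only if it is not already listed.
    index = wiki.get("index", "")
    for page in touched:
        entry = f"- [[{page}]]"
        if entry not in index.splitlines():
            index += entry + "\n"
    wiki["index"] = index

    # Timestamped audit record.
    wiki["log"] = wiki.get("log", "") + (
        f"{now.isoformat()} ingest {source_name}: {len(touched)} pages\n"
    )
    return touched


demo_wiki: dict[str, str] = {}
facts = {"RAG": "re-derives per query", "Lint": "finds orphans"}
first = ingest("paper", "A short summary.", facts, demo_wiki)
second = ingest("paper", "A short summary.", facts, demo_wiki)
```

Running the same ingest twice appends to the entity pages and the log but leaves the index free of duplicate entries, which is the behaviour a catalog needs.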
Ingest fans a source into many pages. Query answers from the wiki and files the answer back. Lint is the bookkeeping that keeps the store from compounding into contradiction.
The crucial twist: a good answer is filed back into the wiki. Your questions become knowledge. The query path is a write path — that is how exploration compounds.
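A sketch of the file-back step, assuming a human approval flag acts as the write gate. The function name and the `answer-` page naming are hypothetical.

```python
from typing import Optional


def file_back(
    question: str, answer: str, wiki: dict[str, str], approved: bool
) -> Optional[str]:
    """File an answer into the wiki as a new page, but only if it was approved."""
    if not approved:
        return None  # the write gate: unapproved answers never reach the wiki
    title = "answer-" + "-".join(question.lower().rstrip("?").split())
    wiki[title] = f"# {question}\n\n{answer}"
    wiki["log"] = wiki.get("log", "") + f"query filed: [[{title}]]\n"
    return title


qa_wiki: dict[str, str] = {}
rejected = file_back("Why lint?", "It keeps errors from compounding.", qa_wiki, approved=False)
size_after_reject = len(qa_wiki)
accepted = file_back("Why lint?", "It keeps errors from compounding.", qa_wiki, approved=True)
```

Because the query path writes, the gate matters: a rejected answer leaves the store untouched.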
Lint is not a nicety. It hunts contradictions, stale claims newer sources have superseded, orphan pages with no inbound links, and dangling cross-references. It is the operation Karpathy's own early vault was missing — and the thing that keeps a compounding store from compounding into error.
Retrieval here can be as small as a bag-of-words similarity search. Each note is a bag-of-words vector, and so is the query; pages are ranked by cosine similarity, so the most relevant ones rise to the top.
Every page becomes a sparse term-frequency vector. Your query becomes one too. Similarity is the cosine of the angle between them, so shared words pull a page upward. Production vector search uses the same geometry, with learned dense embeddings in place of raw word counts.
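A minimal sketch of that ranking using only the standard library. The tokeniser (lowercase alphanumeric runs) and the function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Sparse term-frequency vector: token -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query: str, pages: dict[str, str]) -> list[str]:
    """Page titles sorted by similarity to the query, best first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(body)), title) for title, body in pages.items()]
    return [title for _, title in sorted(scored, key=lambda s: (-s[0], s[1]))]


notes = {
    "lint": "lint finds orphan pages and dangling links",
    "ingest": "ingest compiles a source into wiki pages",
}
ranking = rank("orphan links", notes)
```

Ties are broken alphabetically by title so the ordering is deterministic.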
Past ~200 pages the single index.md stops scaling and you need real retrieval — BM25 + vector + graph traversal fused by reciprocal-rank-fusion. At that point you're running a hybrid system, and "no RAG needed" is no longer true.
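Reciprocal rank fusion itself is small. The sketch below assumes each retriever returns an ordered list of page titles and uses the conventional constant k = 60. The three input lists are invented stand-ins for BM25, vector, and graph results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by title for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "a", "d"]
graph_hits = ["b", "c", "a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
```

Only ranks are combined, never raw scores, so retrievers with incomparable score scales can be fused without normalisation.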
Cross-links turn a pile of pages into a graph the LLM can traverse. Lint walks that graph and flags every orphan with no inbound link.
Newer sources can quietly contradict older pages. Lint surfaces the clash so a human resolves it by explicit supersession, not silent decay.
A page nothing points to is invisible to graph traversal — effectively lost. Orphan detection is exactly the lint pass Karpathy's first vault was missing.
A [[wikilink]] to a page that no longer exists is a dead end. Lint scans for them, plus stale claims and important concepts that lack their own page.
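The two structural checks, orphans and dangling links, can be sketched over an in-memory wiki. The function name, the `roots` parameter, and the choice to count links from the catalog page as inbound links are assumptions. Contradiction and staleness checks need an LLM or a human and are not covered here.

```python
import re

# Captures the target of [[Target]], [[Target|label]] or [[Target#section]].
LINK = re.compile(r"\[\[([^\]|#]+)")


def lint(wiki: dict[str, str], roots: tuple[str, ...] = ("index",)) -> dict[str, list]:
    """Report pages with no inbound links and links that point at missing pages."""
    outbound = {
        title: {target.strip() for target in LINK.findall(body)}
        for title, body in wiki.items()
    }
    dangling = sorted(
        (src, dst)
        for src, targets in outbound.items()
        for dst in targets
        if dst not in wiki
    )
    # Self-links do not count as inbound links.
    linked = {
        dst for src, targets in outbound.items() for dst in targets if dst != src
    }
    # Root pages such as the catalog are exempt from the orphan check.
    orphans = sorted(t for t in wiki if t not in linked and t not in roots)
    return {"orphans": orphans, "dangling": dangling}


vault = {
    "index": "[[RAG]]",
    "RAG": "see [[Lint]] and [[Missing]]",
    "Lint": "checks the graph",
    "Lost": "nobody links here",
}
report = lint(vault)
```

If the catalog lists every page, counting its links as inbound would hide every orphan; in that setup the catalog's links should be excluded before computing `linked`.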
This pattern is the substrate under our whole knowledge base — and it has hard edges worth naming before you build one.
| The claim | The honest reality | The discipline |
|---|---|---|
| "Just write a CLAUDE.md" | The schema is most of the work; a bad one produces a confidently wrong wiki. The gist gives the shape, not tested prompts or a conflict algorithm. | Treat the schema as the product. Iterate it like code. |
| "It auto-decays old facts" | v2's confidence scores and forgetting curves are contested: numeric confidence is false precision, and letting recorded errors decay means repeating old bugs. | Explicit supersession + git history beats automatic decay. |
| "Just drop files in" | Event-driven auto-ingest drifts — one production deploy needed 14 MCP servers + a post-compact hook to survive live operations. | Filter at ingest. Manual before automated. Git as audit trail. |
| "No RAG needed" | True only under ~200 pages. Above that, index.md stops scaling and you need real retrieval. | Go hybrid: wiki as L1, RAG as L2. No shortcut around the context limit. |
| "Compounding is all upside" | A wrong fact filed back during a query propagates. The mechanism that makes the wiki smarter makes a polluted one confidently wrong. | Lint frequency is a real operational parameter, not a footnote. |
Not a borrowed analogy. knowledge/research/ + ingested repos are our raw layer; the zoned markdown under knowledge/ is the wiki; ORG-CONVENTIONS.md + each zone's README.md + the auto-injected conventions are the schema.
The per-zone README.md files are Karpathy's index.md; the weekly audit-2026-Wxx.md files are his log.md. Cron re-ingests ~60 repos and emits weekly digests — log.md and a partial vault-wide lint, automated.
M12 — this educational library — and M13 — the deep-research loop that fans out reader models, synthesises, and writes a cross-linked article — are the ingest pipeline made concrete. The source article for this very lesson was produced by it.
We keep writes gated. Agents author pages, NAO curates — exactly the "manual before automated, git as audit trail" discipline the practitioners recommend. The clean three-layer diagram quietly grows a lot of plumbing in reality. Keep the human in the loop.
Compile, don't retrieve. Three layers: immutable raw, LLM-owned wiki, and the schema that governs it all. Three verbs: ingest, query, lint. On its own the wiki only works while it is small, so go hybrid at scale; keep writes gated, provenance tagged, and git as the audit trail. The knowledge base you just learned to build is the one this lesson lives in.