LLM Wiki: Compile Your Knowledge Instead of Retrieving It
In April 2026, Andrej Karpathy published a short markdown file — a GitHub gist called llm-wiki.md — describing a way of using LLMs to manage knowledge that quietly reframes the whole problem. It isn’t a product or a library. It’s a pattern: an “idea file” you paste into a coding agent, which then builds a version for your needs.
The pattern is called the LLM Wiki, and the one-line version is this: instead of having the model re-read your raw documents every time you ask a question, you have it build and maintain a persistent, structured wiki once — and keep it current forever. Some commentary has framed it as “the end of RAG as we knew it”. That’s too strong, as we’ll see. But the idea is genuinely useful, and worth understanding precisely. This post covers what it is, how it works, how it compares to RAG, and what the token economics actually look like.
What it is
An LLM Wiki is a directory of interlinked markdown pages that an LLM agent writes and maintains on your behalf, sitting between you and a collection of raw source documents. You curate the sources and ask the questions; the agent does all the summarizing, cross-referencing, filing, and bookkeeping. You read the wiki; the LLM writes it.
The mental model Karpathy gives is a software one: “Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.” You rarely touch the wiki directly. You drop in sources and ask questions, and the agent edits dozens of pages to keep the whole thing consistent. It’s a knowledge base that maintains itself.
Why it matters
The problem it solves is not retrieval but accumulation. Most LLM-plus-documents setups (RAG, file uploads, NotebookLM) are stateless: every question starts from scratch, the model finds some relevant fragments, answers, and forgets. Nothing is built up between queries. An LLM Wiki changes that in four ways:
- Knowledge compounds. Each source the agent ingests doesn’t just get indexed — it updates existing pages, strengthens cross-links, and flags contradictions. The wiki gets denser, not just bigger.
- The bookkeeping problem disappears. People abandon personal wikis because the maintenance — updating cross-references, keeping summaries current, reconciling new facts with old — grows faster than the value. An LLM doesn’t get bored and can touch 15 files in one pass. This is the actual unlock.
- Synthesis becomes cheap. Because connections are made at ingest time, a question that needs five sources stitched together is answered by reading a few already-connected pages, not by re-discovering the links every time.
- Your explorations persist. A good answer can be filed back into the wiki as a new page, so the things you figure out in conversation become permanent knowledge instead of vanishing into chat history.
How it works
The architecture has three layers, and the LLM only owns the middle one.
- Raw sources. Your curated documents. They are immutable: the LLM reads them but never edits them, so you can always recompile the wiki from scratch.
- The wiki. The LLM-owned layer of markdown pages: summaries, entity pages, concept pages, an index, and a log.
- The schema. A CLAUDE.md (or AGENTS.md) file that tells the agent how pages are structured and how to ingest, answer, and maintain. It is the config that keeps the agent disciplined rather than improvising.
Two housekeeping files keep it navigable at scale: an index.md (a catalog of every page with one-line summaries, read first to find relevant pages) and an append-only log.md (a timeline of ingests, queries, and audits). Karpathy notes this index-first approach works well at ~100 sources and a few hundred pages — without needing embedding-based RAG infrastructure at all.
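To make the two housekeeping files concrete, here is a minimal sketch of what they might contain. The line formats (Obsidian-style `[[wikilinks]]`, ISO dates, bracketed operation tags) and the function names are assumptions for illustration; the gist leaves the exact layout to your schema, and in practice the agent writes these files itself.

```python
from datetime import date


def render_index(pages: dict[str, str]) -> str:
    """Render index.md text: one bullet per page with its one-line summary.

    The "- [[page]]: summary" format is an assumption, not taken from the gist.
    """
    lines = ["# Index", ""]
    for name in sorted(pages):
        lines.append(f"- [[{name}]]: {pages[name]}")
    return "\n".join(lines) + "\n"


def log_entry(kind: str, detail: str, when: date) -> str:
    """Format one log.md line (hypothetical format)."""
    if kind not in {"ingest", "query", "lint"}:
        raise ValueError(f"unknown log kind: {kind}")
    return f"- {when.isoformat()} [{kind}] {detail}\n"


def append_log(log_text: str, entry: str) -> str:
    """Append-only: existing entries are never rewritten, new ones go last."""
    return log_text + entry


if __name__ == "__main__":
    print(render_index({"rag": "Retrieval-augmented generation",
                        "memex": "Bush's 1945 idea"}))
    print(append_log("", log_entry("ingest", "raw/essay.md", date.today())))
```

Keeping the index to one line per page is what lets the agent read it in full before deciding which pages to open.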
Getting started
You need three things: an agent (Claude Code, Codex, or similar), a folder, and optionally Obsidian to browse the result. The pattern is bootstrapped by pasting the gist into your agent and letting it interview you:
mkdir my-wiki && cd my-wiki
mkdir raw # immutable sources go here
# open your agent in this folder, paste karpathy/llm-wiki.md, and say:
# "Read this idea file and set up an LLM Wiki here. Ask me what it's
# about and what sources I'll feed it, then write me a CLAUDE.md schema."
The agent responds with clarifying questions (topic, kinds of sources, page types), and from your answers it writes the CLAUDE.md schema and initializes index.md and log.md. You’ve built the whole structure without writing code — the pattern working as intended. For a full worked walkthrough (ingesting two essays and watching the graph densify), see Urvil Joshi’s step-by-step write-up.
In practice
Day to day, the wiki runs on three operations.
Ingest. You drop a source into raw/ and tell the agent to process it. It reads the document, writes a summary page, updates the index and log, and revises related entity and concept pages — a single source can touch 10–15 pages. Crucially, on the second related source it doesn’t start fresh: it reads the existing wiki, detects the conceptual overlap, and adds a cross-link no human authored. The graph gets denser.
Query. You ask a question and the agent reads the synthesized pages (not the raw PDFs), following the links between them, and answers with citations. The answer can be a page, a comparison table, even a slide deck — and good answers get filed back into the wiki, so explorations compound just like sources do.
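The link-following part of a query can be pictured as a bounded walk over the page graph. The sketch below is only a model of that behavior, assuming pages are held in a dict and linked with Obsidian-style `[[wikilinks]]`; the function name and the hop limit are hypothetical, and a real agent picks its own starting pages and decides for itself how far to read.

```python
import re
from collections import deque

# Captures the target of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def reading_set(pages: dict[str, str], start: list[str], max_hops: int = 1) -> list[str]:
    """Breadth-first walk over wikilinks, at most max_hops away from start."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque((name, 0) for name in start if name in pages)
    while queue:
        name, hops = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        if hops == max_hops:
            continue  # hop budget spent: read this page, do not expand it
        for match in WIKILINK.finditer(pages[name]):
            target = match.group(1).strip()
            if target in pages and target not in seen:
                queue.append((target, hops + 1))
    return order


if __name__ == "__main__":
    wiki = {
        "rag": "See [[memex]] and [[chunking]].",
        "memex": "Links to [[bush]].",
        "chunking": "",
        "bush": "",
    }
    print(reading_set(wiki, ["rag"], max_hops=1))
```

The hop limit is the knob that keeps a query cheap: the agent reads a handful of connected pages rather than the whole wiki.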
Lint. Periodically you ask the agent to audit the wiki: find contradictions, stale claims, orphan pages with no inbound links, concepts mentioned but missing a page. This is the maintenance no human wants to do, and it’s why the wiki stays healthy as it grows. As the gist puts it, the idea is a working version of Vannevar Bush’s 1945 Memex — a personal, curated knowledge store with associative trails — whose one unsolved problem was who does the upkeep. The LLM does.
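Two of those lint checks are purely structural and can be sketched without a model: orphan pages, and concepts that are linked but have no page. The sketch below assumes Obsidian-style `[[wikilinks]]` and pages held in a dict keyed by page name; the function names and the decision to ignore links from index and log are assumptions. Contradictions and stale claims still need the agent to actually read the pages.

```python
import re

# Captures the target of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def outbound_links(text: str) -> set[str]:
    """Return the set of page names that a page links to."""
    return {m.group(1).strip() for m in WIKILINK.finditer(text)}


def lint(pages: dict[str, str],
         housekeeping: frozenset[str] = frozenset({"index", "log"})):
    """Return (orphans, missing) as sorted lists of page names.

    orphans: content pages with no inbound link from another content page.
    missing: link targets that have no page yet.
    Links from housekeeping pages are ignored, since the index links to
    every page and would otherwise hide all orphans.
    """
    inbound = {name: 0 for name in pages}
    missing: set[str] = set()
    for name, text in pages.items():
        if name in housekeeping:
            continue
        for target in outbound_links(text):
            if target == name:
                continue  # a self-link does not count as inbound
            if target in pages:
                inbound[target] += 1
            else:
                missing.add(target)
    orphans = sorted(n for n, count in inbound.items()
                     if count == 0 and n not in housekeeping)
    return orphans, sorted(missing)


if __name__ == "__main__":
    wiki = {
        "index": "- [[rag]]\n- [[memex]]\n- [[notes]]",
        "rag": "Compare with [[memex]] and [[embeddings]].",
        "memex": "See [[rag|RAG]].",
        "notes": "Loose thoughts.",
    }
    print(lint(wiki))
```

A check like this could run before handing the wiki to the agent, so the model spends its effort on the judgment calls rather than on counting links.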
How it compares
RAG is the comparison everyone reaches for, and the honest answer is that RAG and the LLM Wiki solve different problems: one retrieves, the other accumulates.
RAG shines when you have a large, constantly-changing corpus and need precise traceability to an exact chunk — customer support, legal search, enterprise fact lookup over millions of documents. The LLM Wiki shines on a bounded, curated corpus (roughly a hundred to a few hundred sources) where synthesis matters more than lookup: a research project, a book you’re studying, your own journal. The table makes the trade-offs concrete.
| Dimension | RAG | LLM Wiki |
|---|---|---|
| Source documents | stay raw, retrieved per query | compiled into structured pages |
| State | stateless — each query from scratch | stateful — knowledge compounds |
| Cross-time synthesis | weak (assembles fragments) | strong (links are pre-built) |
| Traceability | exact, to the chunk | 1–2 steps removed from the source |
| Freshness | always re-reads latest | updates require re-ingest |
| Error blast radius | a wrong answer is local | a wrong summary can bake in across pages |
| Best for | large, changing corpora; fact lookup | bounded research corpora; deep study |
So “the end of RAG” overstates it. For millions of changing documents with audit requirements, RAG is still the right tool. The Wiki is a better fit when the corpus is small enough to compile and the value is in the connections.
Performance and benchmarks
Here’s the honest caveat up front: neither Karpathy’s gist nor the secondary write-ups publish hard token counts, so there is no measured “X tokens before, Y after” to quote. What we can describe is the cost shape, and it’s the most important practical point. The motivation Karpathy gave in his launch tweet was about where his tokens go — a large fraction of his throughput shifting from manipulating code to manipulating knowledge — not an efficiency benchmark.
The token cost doesn’t disappear; it moves from query time to ingest time. RAG pays a recurring cost: every query re-retrieves and re-reads chunks, so cumulative tokens scale with the number of questions you ask. The Wiki pays a large one-time cost per source (read it, rewrite 10–15 pages), then makes each query cheap (read a few synthesized pages). The two cost curves cross.
The takeaway from the shape: the Wiki wins on tokens when a bounded corpus is queried many times (deep, repeated study), and loses when the corpus is huge or churns constantly (you’d pay the ingest cost over and over). Treat the chart as a model of the economics, not a benchmark.
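The crossover can be written down as a toy model. This is a sketch under simplifying assumptions: every RAG query costs the same, every source costs the same to ingest, and nothing is ever re-ingested. All numbers in the usage example are made-up placeholders, since no measured token counts have been published.

```python
def rag_tokens(queries: int, per_query: int) -> int:
    """Cumulative RAG cost: every query re-retrieves and re-reads chunks."""
    return queries * per_query


def wiki_tokens(queries: int, per_query: int,
                sources: int, ingest_per_source: int) -> int:
    """Cumulative wiki cost: one-time ingest per source plus cheap queries."""
    return sources * ingest_per_source + queries * per_query


def break_even_queries(rag_per_query: int, wiki_per_query: int,
                       sources: int, ingest_per_source: int):
    """Smallest query count at which the wiki costs no more than RAG.

    Returns None when a wiki query is not cheaper than a RAG query,
    because the ingest cost is then never paid back.
    """
    saving = rag_per_query - wiki_per_query
    if saving <= 0:
        return None
    total_ingest = sources * ingest_per_source
    return -(-total_ingest // saving)  # integer ceiling division


if __name__ == "__main__":
    # Placeholder figures for illustration only, not measurements.
    n = break_even_queries(rag_per_query=8_000, wiki_per_query=3_000,
                           sources=100, ingest_per_source=50_000)
    print(n)
```

The model leaves out churn: if sources change and must be re-ingested, the ingest term grows with every update and the break-even point moves further out, which is the case where the Wiki loses.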
Tradeoffs
- Hallucinations can bake in. This is the real risk. Because the LLM compresses sources into pages and links them, a single misunderstanding can propagate across the wiki and start to look like a fact. With RAG a wrong answer is one wrong answer; here it can become load-bearing. The lint step and spot-checking pages against raw sources are not optional.
- Answers are a step removed from the source. You’re trusting the synthesized page, not the original chunk. For work that needs exact, defensible citations, that indirection is a downside.
- Ingest is expensive and updating means re-ingesting. Front-loading the work is the whole point, but it makes the pattern a poor fit for fast-changing data.
- It only fits a bounded corpus. This is a personal/team knowledge tool at the scale of hundreds of sources — not an enterprise search system over millions of documents.
- You’re still the curator. The LLM removes the bookkeeping, not the judgment. Garbage sources in, confident garbage wiki out.
Takeaway
The LLM Wiki is a genuinely good idea with a narrow, honest sweet spot: a curated corpus you’re going deep on, where the valuable answers come from connecting sources rather than looking one up, and where you’ll query it enough times to amortize the ingest cost. Its real contribution is solving the bookkeeping that has always killed personal wikis — the LLM does the maintenance no human sustains. Don’t read it as the death of RAG; read it as a different tool for a different job. Reach for RAG when you need fresh, traceable lookup over a large or changing corpus, and for the LLM Wiki when you want knowledge that compounds.