Graphify: Turn a Codebase Into a Token-Cheap Knowledge Graph

Point a coding agent at a large repository and it does the obvious, expensive thing: it reads files. Lots of them, over and over, every session — and you pay for those tokens each time. The structure that would let it answer “what connects the auth layer to the response handler?” in one hop isn’t written down anywhere, so the model reconstructs it from raw text on every query.

Graphify (by Safi Shamsi, MIT-licensed) attacks that directly. It’s a Claude Code skill — you type /graphify and it reads your files, builds a knowledge graph, and hands the agent back structure it can query cheaply. The pitch is concrete: on a mixed corpus the project measures 71.5× fewer tokens per query versus reading the raw files. This post covers what it is, how it works, how the token savings actually arise (with the honest caveats), the benchmarks and user reports, and how it stacks up against vector RAG and the LLM Wiki pattern.

What it is

Graphify is a multi-modal knowledge-graph builder packaged as a skill for AI coding assistants (Claude Code, Codex, OpenCode). It turns an entire folder of source code, Markdown, PDFs, and images (screenshots, diagrams, whiteboard photos, even text in other languages, read via Claude vision) into one interlinked graph that captures what the code does and the concepts it relates to. The graph is built and stored locally with NetworkX, Leiden clustering, and tree-sitter, with no vector embeddings and no Neo4j or other server (graphify.net); during the build, the only model calls are for extracting concepts from docs and images.

The framing the author leans on is Andrej Karpathy’s habit of dumping papers, tweets, and screenshots into a /raw folder: Graphify is positioned as the answer to that pile — turn it into something queryable and persistent instead of re-reading it. That puts it in the same family as the LLM Wiki, but built as a concrete, automated tool rather than a prompt pattern.

Why it matters

Structure the agent can’t see otherwise. It surfaces “god nodes” (the highest-degree concepts everything connects through) and surprising cross-file or cross-domain edges — the architecture an agent would otherwise infer from scratch.
Persistent across sessions. The graph is saved to graph.json; you can query it weeks later without re-reading the source. A SHA256 cache means re-runs only process changed files.
Token-cheap queries. Because a query traverses a small subgraph instead of loading raw files, the per-query token cost drops sharply on large corpora (the headline 71.5×, with caveats below).
Honest about provenance. Every edge is tagged EXTRACTED, INFERRED, or AMBIGUOUS, so you know what was found versus guessed — a refreshing default for an LLM-built artifact.
Multi-modal and local. Code is parsed deterministically with tree-sitter (no LLM on source); only semantic descriptions of docs/images go to the model — never raw source.
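To make the SHA256 cache idea concrete, here is a minimal sketch of content-hash change detection. It is not Graphify's code: the function names and the in-memory dict cache are assumptions for illustration, and the real tool presumably persists its cache to disk in its own format.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths: list[Path], cache: dict[str, str]) -> list[Path]:
    """Return files whose digest differs from the cached one, updating the cache."""
    changed = []
    for path in paths:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest
    return changed


def demo() -> tuple[list[str], list[str], list[str]]:
    """Run three passes over two hypothetical files and report what each pass reprocesses."""
    with tempfile.TemporaryDirectory() as tmp:
        auth = Path(tmp) / "auth.py"
        response = Path(tmp) / "response.py"
        auth.write_text("def login(): ...\n")
        response.write_text("def render(): ...\n")

        cache: dict[str, str] = {}
        first = [p.name for p in changed_files([auth, response], cache)]
        second = [p.name for p in changed_files([auth, response], cache)]

        # Edit one file; only that file should be picked up on the next pass.
        response.write_text("def render(): return 200\n")
        third = [p.name for p in changed_files([auth, response], cache)]
        return first, second, third
```

The first pass sees both files as new, the second sees nothing to do, and the third reprocesses only the edited file, which is the behaviour the re-run claim depends on.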

How it works

Graphify is a staged pipeline; each stage is an isolated module. Code goes through tree-sitter (AST, call graph) locally; prose and images go through Claude for concept and vision extraction. Everything merges into one NetworkX graph, which Leiden clusters into communities, and an analysis pass finds the god nodes and surprises before exporting the outputs.
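The "deterministic AST, no LLM" step is easy to picture with a small example. The sketch below uses Python's standard-library ast module as a stand-in for tree-sitter (a deliberate swap so the example needs no third-party parser) and pulls caller-to-callee edges out of a hypothetical source snippet. It only catches direct calls by bare name inside top-level functions, so treat it as an illustration of the idea rather than what Graphify actually extracts.

```python
import ast

# Hypothetical source file, used only for this illustration.
SOURCE = '''
def hash_password(pw):
    return pw[::-1]

def check_credentials(user, pw):
    return hash_password(pw) == user

def handle_request(user, pw):
    if check_credentials(user, pw):
        return render("ok")
    return render("denied")

def render(body):
    return body
'''


def call_edges(source: str) -> set[tuple[str, str]]:
    """Return (caller, callee) pairs for direct calls by name in top-level functions."""
    tree = ast.parse(source)
    edges = set()
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        # Walk the function body and record every call whose target is a bare name.
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.add((func.name, node.func.id))
    return edges


EDGES = call_edges(SOURCE)
```

Because the parse is deterministic, the same file always yields the same edges, which is what would justify tagging such edges as found rather than guessed.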

The pipeline: detect → extract (deterministic AST for code, Claude for prose/images) → build → cluster → analyze → export graph.json, graph.html, an Obsidian vault, and a wiki.
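As a rough picture of the analyze stage, the sketch below ranks nodes by degree to find "god nodes" and shows how provenance tags can be used to restrict the ranking to extracted edges. It uses plain Python instead of NetworkX to stay dependency-free; the edge list, node names, and the god_nodes helper are all hypothetical, and Leiden clustering is left out entirely.

```python
from collections import Counter

# Hypothetical merged edges: (source, target, provenance tag).
EDGES = [
    ("handle_request", "check_credentials", "EXTRACTED"),
    ("handle_request", "render", "EXTRACTED"),
    ("check_credentials", "hash_password", "EXTRACTED"),
    ("handle_request", "Auth design doc", "INFERRED"),
    ("Auth design doc", "hash_password", "AMBIGUOUS"),
]


def god_nodes(edges, top_n=2, allowed_tags=None):
    """Rank nodes by degree, optionally counting only edges with the given tags."""
    degree = Counter()
    for source, target, tag in edges:
        if allowed_tags is not None and tag not in allowed_tags:
            continue
        degree[source] += 1
        degree[target] += 1
    # Sort by degree descending, then by name, so ties resolve deterministically.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


all_edges_ranking = god_nodes(EDGES)
extracted_only_ranking = god_nodes(EDGES, allowed_tags={"EXTRACTED"})
```

Comparing the two rankings is a cheap sanity check: if a node is only a hub because of INFERRED or AMBIGUOUS edges, that is worth knowing before trusting it as an architectural centre.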

The outputs are worth knowing: an interactive graph.html, an Obsidian vault, a Wikipedia-style wiki/ folder that an agent can crawl by reading files, a GRAPH_REPORT.md (god nodes, surprises, suggested questions), and the persistent graph.json. There’s also --watch (auto-rebuild as files change), a post-commit git hook, and an --mcp server.

Getting started

It’s a Python package plus a one-time skill install, then a slash command:

pip install graphifyy && graphify install   # package is "graphifyy"; command stays "graphify"

/graphify .                                  # build a graph for the current folder
/graphify query "what connects attention to the optimizer?"
/graphify path "DigestAuth" "Response"       # shortest path between two nodes
/graphify explain "SwinTransformer"          # what the graph knows about a node
/graphify add https://arxiv.org/abs/1706.03762   # fetch a paper and merge it in
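For intuition about what a shortest-path query over the graph involves, here is a breadth-first search on a toy adjacency map. This is the textbook approach for unweighted graphs, not a confirmed description of how the path command is implemented; the graph contents are hypothetical, with two node names borrowed from the command above.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return the shortest node path from start to goal via BFS, or None if unreachable."""
    if start not in adjacency or goal not in adjacency:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical undirected toy graph.
GRAPH = {
    "DigestAuth": ["Session", "AuthBase"],
    "AuthBase": ["DigestAuth"],
    "Session": ["DigestAuth", "Request", "Response"],
    "Request": ["Session", "Response"],
    "Response": ["Session", "Request"],
    "Orphan": [],
}

found = shortest_path(GRAPH, "DigestAuth", "Response")
missing = shortest_path(GRAPH, "DigestAuth", "Orphan")
```

The answer is a handful of node names rather than the contents of every file along the way, which is why this kind of query is cheap in tokens.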

Graphify doesn’t bundle a model — it uses the API key your assistant already has, and sends only semantic descriptions of docs/images upstream, never raw source.

In practice

Onboarding to a large or unfamiliar repo. Run /graphify ., read GRAPH_REPORT.md, and you have the god nodes and the surprising connections before reading a line — a map instead of a file tree.
Cross-domain understanding. Because it’s multi-modal, you can graph app code + a SQL schema + an architecture diagram + the design paper together, and ask questions that span all of them (“what connects this table to that endpoint?”).
Parallel-agent workflows. With --watch, code saves trigger an instant AST-only rebuild, so the shared graph stays current while several agents write code in parallel.
Research piles. Point it at a /raw folder of papers, tweets, and screenshots and get a navigable wiki — the Karpathy use case the project was built around.

The repo ships reproducible worked/ corpora so you can verify the numbers yourself: the Karpathy mixed corpus (3 GPT repos + 5 attention papers + 4 diagrams, ~52 files) produces 285 nodes / 340 edges / 53 communities, with example god nodes and a flagged surprise edge.

Performance and benchmarks

This is the headline, so it deserves care. Graphify prints a token benchmark after every run. On that Karpathy mixed corpus the project reports an average query cost of ~1.7k tokens versus ~123k naive — a 71.5× reduction; on a ~500k-word corpus it reports BFS subgraph queries staying around ~2k tokens versus ~670k naive (graphify.net).

Where the savings come from: instead of re-reading every raw file per query, the agent queries a small subgraph of a graph built once. The multiplier is real but corpus-dependent.
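The arithmetic behind that multiplier can be sketched in a few lines. The example below collects the nodes within a fixed number of hops of a seed (a bounded BFS, matching the "BFS subgraph" wording above) and compares a per-node summary cost against re-reading every raw file. The graph, the 50-token summary cost, and the per-file token counts are made-up numbers chosen only to show the shape of the calculation.

```python
from collections import deque


def bfs_subgraph(adjacency, seed, max_depth):
    """Return the set of nodes within max_depth hops of seed."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth)


def reduction_factor(naive_tokens, query_tokens):
    """Return how many times cheaper the graph query is than the naive read."""
    return naive_tokens / query_tokens


# Hypothetical toy corpus: one node per file.
ADJACENCY = {
    "auth": ["session"],
    "session": ["auth", "response", "cache"],
    "response": ["session", "template"],
    "cache": ["session"],
    "template": ["response"],
}
NODE_SUMMARY_TOKENS = 50  # assumed cost of one node's summary in the prompt
RAW_FILE_TOKENS = {"auth": 4000, "session": 6000, "response": 5000,
                   "cache": 3000, "template": 2000}

nodes = bfs_subgraph(ADJACENCY, "auth", max_depth=2)
query_tokens = len(nodes) * NODE_SUMMARY_TOKENS
naive_tokens = sum(RAW_FILE_TOKENS.values())
toy_factor = reduction_factor(naive_tokens, query_tokens)
```

Plugging in the rounded figures quoted above, reduction_factor(123_000, 1_700) comes out at about 72, in line with the reported 71.5× once rounding of the inputs is allowed for. The same function also shows why small corpora gain little: when the naive read is already only a few thousand tokens, the ratio collapses toward 1.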

Two honest caveats, both of which the project itself states. First, the reduction scales with corpus size: on 6 files (which fit in a context window anyway) it’s roughly 1×, and the value is structural clarity, not compression — so don’t expect 71× on a tiny repo. Second, this is the project’s own benchmark, reproducible via the worked/ folders but measured on its chosen corpora, not an independent study.

On the outside view: a Medium write-up and a knightli.com review report the same ~71× framing, but they echo the project’s number closely and read more like amplification than independent measurement — treat them as enthusiasm, not verification. More usefully for balance, a GitHub issue titled “Graphify not improving token efficiency in Claude Code sessions” (#580) shows at least one user not seeing the savings in their own workflow — a reminder that the multiplier depends on corpus size, query patterns, and whether the agent actually routes through the graph. The project’s site lists 3.7k+ GitHub stars.

How it compares

Graphify, vector RAG, and the LLM Wiki all sit between you and a pile of raw files, but they build very different intermediate artifacts.

Dimension	Graphify	Vector RAG	LLM Wiki
Artifact	structural knowledge graph	chunks + embeddings	interlinked markdown pages
Retrieval	graph traversal (subgraph)	top-k vector similarity	read synthesized pages
Vector store	none (Leiden on topology)	required	none
Code awareness	AST + call graph (deterministic)	text-only	text-only
Multi-modal	yes (PDF, images, vision)	text unless extended	depends on ingest
Provenance	edges tagged extracted/inferred/ambiguous	none	lint pass
Best for	code + docs + papers, relationships	large, fuzzy, changing text	curated research synthesis

vs vector RAG. RAG retrieves text chunks by semantic similarity; Graphify encodes relationships and lets the agent walk them. For code, the graph’s deterministic call structure beats fuzzy chunk matching, and a subgraph query can be much cheaper than stuffing retrieved chunks into context each turn (the project’s benchmark, though, compares against reading raw files, not against a RAG pipeline). The flip side: building the graph is upfront work, INFERRED edges can be wrong, and Graphify is code/structure-centric, whereas RAG is the better tool for huge, constantly changing prose corpora where exact-chunk traceability matters. The two are not mutually exclusive: Graphify’s own materials argue that structural graphs beat vector RAG for code understanding specifically, not for everything.

vs the LLM Wiki. These are close cousins: both compile a /raw pile into a persistent, queryable artifact so the agent stops re-reading. The differences are in form and automation. Graphify is automated and deterministic for code (tree-sitter AST), produces a graph plus a benchmark, handles images/PDFs, and auto-syncs via watch/git-hook. The LLM Wiki produces human-readable prose synthesis with an evolving thesis, is agent-agnostic (a pattern, not a tool), and reads better as narrative, at the cost of a more manual, LLM-driven ingest and no AST. Tellingly, Graphify even exports an Obsidian vault and a wiki, so you can have both the graph and the wiki-style reading experience. Pick Graphify when relationships and code structure are the point; reach for the LLM Wiki when the value is synthesized understanding of a curated reading list.

Tradeoffs

The big multiplier needs a big, mixed corpus. On small repos the token win mostly evaporates; you keep the structural map but not the 71×.
Inferred edges are guesses. The EXTRACTED/INFERRED/AMBIGUOUS tagging is honest, but you still have to read it — a confident-looking INFERRED edge can mislead.
Benchmarks are first-party. Reproducible, but independent verification is thin and the external write-ups largely echo the project’s own number.
It only helps if the agent uses it. As issue #580 suggests, savings depend on the agent actually routing queries through the graph rather than falling back to reading files.
Upfront extraction cost. The first build spends LLM tokens on docs/images (code is free via AST); the payoff is amortized over many later queries.
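The amortization point in the last item can be turned into a quick break-even estimate: how many queries it takes before the per-query saving pays back the first build. The per-query figures below are the rounded ones quoted earlier in this post; the 500k-token build cost is a purely hypothetical placeholder, since the post does not report one.

```python
import math


def break_even_queries(build_tokens, naive_per_query, graph_per_query):
    """Return the number of queries needed to repay the build cost, or None if it never pays back."""
    saving = naive_per_query - graph_per_query
    if saving <= 0:
        return None  # no per-query saving, so the build cost is never recovered
    return math.ceil(build_tokens / saving)


# Hypothetical build cost; per-query costs taken from the rounded benchmark figures.
large_corpus = break_even_queries(500_000, 123_000, 1_700)

# Small repo where the graph query costs about the same as reading the files (roughly 1x).
small_corpus = break_even_queries(500_000, 2_000, 2_000)
```

Under these assumptions a large corpus pays back within a handful of queries, while a tiny one never does, which matches the caveat that the value on small repos is the structural map and not the token saving.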

Takeaway

Graphify is a sharp answer to a real problem: coding agents waste tokens re-reading code whose structure was never written down. By compiling a folder into a local, multi-modal knowledge graph — with deterministic AST for code, honest provenance tags, and a persistent graph.json — it lets an agent answer from a small subgraph instead of the whole repo, with large per-query token savings on big mixed corpora. Treat the 71.5× as a real but corpus-dependent, first-party number, not a guarantee. If your problem is understanding relationships across code, docs, and papers, it’s more apt than vector RAG; if your problem is synthesizing a curated reading list into prose, the LLM Wiki is the better shape — and Graphify, which exports a wiki of its own, sits comfortably next to it.