
andrej-karpathy-skills: A 70-Line CLAUDE.md That Stops Agents Overengineering

June 7, 2026 · Shingo Nakamura · AI

Coding agents have a recognizable failure mode: you ask for a small change and you get a 200-line “framework,” a refactor of code you never mentioned, and a confident answer to a question you never asked. In January 2026, Andrej Karpathy described these exact pitfalls in a widely shared X post — models that make wrong assumptions and run with them, that overcomplicate, and that touch code they don’t understand.

andrej-karpathy-skills is one developer’s answer to that post: a single CLAUDE.md file that turns Karpathy’s complaints into four operating principles for Claude Code. It was written by Forrest Chang (Jiayuan Zhang) and is mirrored under the multica-ai organization; the repository has accumulated on the order of 175k stars on the mirror, making it one of the more visible takes on agent discipline (GitHub).

This post covers what the file actually is, how its four principles map to the problems Karpathy named, how to install it, and — importantly — an honest read on the viral “accuracy” numbers floating around it, most of which the repository itself never claims.

What it is

andrej-karpathy-skills is a configuration file, not a tool. The core deliverable is a short CLAUDE.md (roughly 70 lines of plain markdown) that you drop into a project so Claude Code reads it as standing instructions before it writes code. The repo also ships the same guidelines packaged as a Claude Code plugin and as a committed Cursor project rule (.cursor/rules/karpathy-guidelines.mdc), so the behavior carries across both tools (README).

One honesty note up front: the project is derived from Karpathy’s public observations, but Karpathy did not author or endorse it. His name is on it because the principles are distilled from his post, not because he shipped it — a distinction third-party coverage has made explicit (TechTimes).

Why it matters

The appeal is that it is small, readable, and aimed squarely at the agent behaviors that waste the most time. The README frames it as four principles mapped to specific failures:

  • Think before coding addresses wrong assumptions, hidden confusion, and missing tradeoffs.
  • Simplicity first addresses overcomplication and bloated abstractions.
  • Surgical changes addresses orthogonal (unrelated) edits and touching code the agent shouldn’t.
  • Goal-driven execution turns vague imperatives into verifiable, test-first goals.

Because it is a single markdown file you can read end to end in under a minute, the cost of trying it is close to zero, and there is nothing opaque to audit. That low-friction, high-legibility quality is a large part of why it spread.

How it works

There is no runtime and no magic. CLAUDE.md is a file Claude Code automatically loads as context for a project; this repo simply fills it with four principles plus a concrete self-check for each one. The diagram below maps Karpathy’s stated pitfalls to the principle that targets each one.

[Figure: Karpathy's pitfalls mapped to the four principles. Left column, pitfalls: wrong assumptions with no clarifying questions; overcomplication and bloated abstractions; editing code it doesn't understand. Right column, principles: think before coding; simplicity first; surgical changes; goal-driven execution.]
How the four principles map to the LLM pitfalls Karpathy described. "Goal-driven execution" backstops "think before coding" by replacing guesses with verifiable success criteria.

What makes the principles more than slogans is that each comes with a blunt test the agent can apply to its own output. The README states them directly:

Simplicity First — "Would a senior engineer say this is overcomplicated?
                    If yes, simplify." If 200 lines could be 50, rewrite it.

Surgical Changes — "Every changed line should trace directly to the
                    user's request." Don't improve adjacent code; don't
                    delete pre-existing dead code unless asked.

The fourth principle, goal-driven execution, is the one that leans hardest on a Karpathy quote: “Don’t tell it what to do, give it success criteria and watch it go.” In practice that means rewriting “add validation” as “write tests for invalid inputs, then make them pass” — turning an open-ended instruction into a loop the agent can verify itself against.
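As a minimal sketch of that rewrite, the success criteria can be written as tests first, with the implementation kept only as large as the tests require. The function name `validate_username` and its rules (non-blank, at most 32 characters) are hypothetical and chosen for illustration; they do not come from the repository.

```python
import unittest


def validate_username(name: str) -> bool:
    """Hypothetical validator: accept non-blank names of at most 32 characters."""
    stripped = name.strip()
    return 0 < len(stripped) <= 32


class ValidateUsernameTests(unittest.TestCase):
    """The success criteria, written down before the implementation above."""

    def test_rejects_empty_string(self):
        self.assertFalse(validate_username(""))

    def test_rejects_whitespace_only(self):
        self.assertFalse(validate_username("   "))

    def test_rejects_overlong_name(self):
        self.assertFalse(validate_username("x" * 33))

    def test_accepts_ordinary_name(self):
        self.assertTrue(validate_username("shingo"))
```

Saved as a module, this can be run with `python -m unittest`. The point is less the validator than the shape of the request: the agent has a concrete pass/fail signal instead of an open-ended instruction.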

Getting started

There are two supported paths. The recommended one is the Claude Code plugin, which makes the guidelines available across every project rather than one repo. From inside Claude Code you add the marketplace and install:

/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills

If you’d rather scope it to a single project, the file-based option is a curl command or two. For a new project you fetch the file directly; for an existing one you append it to your current CLAUDE.md:

# new project
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

# existing project (append)
echo "" >> CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
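Running the append commands twice would add the guidelines twice. The following is a minimal Python sketch of an idempotent variant, not something the repository ships. The function name `append_guidelines` and the `marker` argument are illustrative assumptions, and the guidelines text is assumed to have been fetched already (for example with the curl command above).

```python
from pathlib import Path


def append_guidelines(target: Path, guidelines: str, marker: str) -> bool:
    """Append guidelines to target unless marker is already present.

    Returns True if the file was modified. `marker` is any string expected to
    appear in the guidelines (an assumption: pick a heading from the file).
    """
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    if marker in existing:
        return False  # already installed; leave the file untouched
    # Keep at least one blank line between existing content and the new text.
    if not existing:
        separator = ""
    elif existing.endswith("\n"):
        separator = "\n"
    else:
        separator = "\n\n"
    target.write_text(existing + separator + guidelines, encoding="utf-8")
    return True
```

A call such as `append_guidelines(Path("CLAUDE.md"), text, "Think Before Coding")` would then be safe to repeat, provided the chosen marker really does occur in the fetched file.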

Cursor users get the same behavior via the committed .cursor/rules/karpathy-guidelines.mdc rule; the repo’s CURSOR.md explains how to reuse it in other projects.

In practice

The clearest way to see what changes is the “add validation” example the README and third-party write-ups both lean on. Without guidance, an agent asked to “add validation” may silently choose strict RFC 5322 email checking with DNS lookups when you wanted a basic non-empty check. Under think before coding, the agent is instructed to surface the ambiguity instead — “Option A is a simple regex, Option B is full RFC compliance; which do you want?” — and under goal-driven execution, the task becomes “write tests for the invalid inputs you care about, then make them pass.”

The second recurring scenario is the drive-by refactor. You ask for a one-line bug fix and the diff also reorganizes imports, renames a variable two functions away, and rewrites a comment. Surgical changes is the rule meant to stop this: the agent should clean up only the orphans its own change created, and otherwise leave working code — and pre-existing dead code — untouched unless you ask. The README’s stated success signal is exactly this: “fewer unnecessary changes in diffs” and “clean, minimal PRs.”
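One way to spot a drive-by refactor after the fact is to compare the files in a diff against the files the task was expected to touch. The sketch below is an illustration under stated assumptions, not part of the repository: it takes the text output of `git diff --numstat` as input, and the function name `unexpected_files` and the example paths are hypothetical.

```python
def unexpected_files(numstat: str, expected: set[str]) -> list[str]:
    """Return changed paths from `git diff --numstat` output not in `expected`.

    Each numstat line is tab-separated: added count, deleted count, path.
    Renames (shown as "old => new") are not handled in this sketch.
    """
    changed = []
    for line in numstat.splitlines():
        parts = line.split("\t", 2)
        if len(parts) == 3:
            changed.append(parts[2])
    return sorted(path for path in changed if path not in expected)


# Hypothetical diff for a one-line bug fix in src/parser.py.
sample = "1\t1\tsrc/parser.py\n12\t9\tsrc/utils.py\n3\t3\tREADME.md\n"
print(unexpected_files(sample, {"src/parser.py"}))
```

For the sample above this prints `['README.md', 'src/utils.py']`: two files changed that the request never mentioned. A file-level check is coarser than the README's line-level test, but it is cheap and catches the most obvious scope creep.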

How it compares

The honest framing is that this is a curated default, not a methodology. It competes less with other tools than with the blank CLAUDE.md most people start from.

Approach                                     | What you get                                       | Effort
---------------------------------------------|----------------------------------------------------|-------------------------
Empty / hand-rolled CLAUDE.md                | Your own conventions, maintained by you            | ongoing
andrej-karpathy-skills                       | Four ready-made discipline principles, ~70 lines   | one install
A full methodology plugin (e.g. Superpowers) | Brainstorm-plan-build workflow with TDD and review | heavier, more opinionated

Against a heavier plugin like Superpowers, andrej-karpathy-skills does far less — it adds no workflow, no subagents, no planning pipeline. That’s a strength if you want a minimal nudge and a weakness if you want enforced process. The two are not mutually exclusive: this file is small enough to merge into a larger setup, which the README explicitly encourages.

Performance

This is where care matters. The repository itself makes only qualitative claims about results: you should see fewer unnecessary diffs, fewer rewrites caused by overcomplication, clarifying questions arriving before implementation rather than after mistakes, and cleaner PRs. There are no benchmarks in the README.

Some third-party articles attach precise figures to it — an accuracy jump “from 65-70% to 91-94%,” a “53% faster rendering” anecdote, an “11% efficiency gain.” Those numbers do not appear in the repository and trace to a single AI-authored blog post that doesn’t substantiate them (byteiota). Treat them as marketing, not measurement. The defensible statement is the qualitative one: a short, well-aimed set of constraints tends to reduce the agent behaviors that generate noisy diffs — which is plausible and matches what the README promises, but isn’t a benchmarked result.

Tradeoffs

The principles bias toward caution over speed, and the README says so. That is the right trade on non-trivial work and the wrong one on a typo fix — applying the full rigor to obvious one-liners just slows you down, so some judgment is still required. The guidelines are also only as good as the agent’s adherence; a CLAUDE.md is guidance, not a hard constraint, and a model can still ignore it. And because the principles are deliberately general, they don’t replace project-specific rules — they’re meant to be merged with your own conventions, not to stand alone. Finally, the Karpathy association is a double-edged marketing asset: it drives attention, but the name on the repo can imply an endorsement that doesn’t exist.

Takeaway

andrej-karpathy-skills is a small, legible bet: that most of an agent’s worst habits can be curbed by four explicit constraints with self-applied tests. Reach for it when you want a sane default for Claude Code or Cursor without committing to a heavyweight workflow, and merge it with your project rules rather than treating it as complete. Skip the hype around the precise accuracy numbers — they aren’t the repo’s claims. The thing worth remembering is the one principle that does the most work: stop telling the agent what to do and start giving it success criteria it can verify.