Matt Pocock's Skills for Real Engineers: Small, Composable Agent Skills
Most agent-skill collections want to own your process. Frameworks like GSD, BMAD, and Spec-Kit hand you a whole methodology — and in doing so they take away your control, and make bugs in the process itself hard to find and fix. mattpocock/skills takes the opposite bet.
It’s Matt Pocock’s personal .claude skills folder, published. Pocock — the TypeScript educator behind Total TypeScript — describes them as the skills he uses every day “to do real engineering, not vibe coding,” based on “decades of engineering experience.” The pitch is deliberately humble: small, easy to adapt, composable, model-agnostic. Hack around with them. Make them your own.
This post covers what the collection actually is, the specific failure modes it targets, how the skills work and hand off to each other, the two that stand out (the grilling sessions and the TDD loop), and the honest tradeoffs of a “primitives, not a framework” approach. Where I’m relaying an opinion rather than something in the README, I say so — and there is a caveat about third-party coverage worth flagging up front (see How it compares).
What it is
mattpocock/skills is a curated set of agent skills — single SKILL.md files (markdown with a little YAML frontmatter, plus referenced scripts where needed) that a coding agent reads and follows when it’s relevant. They’re grouped into three buckets in the repo: engineering skills used daily for code work (tdd, diagnose, grill-with-docs, to-prd, to-issues, triage, improve-codebase-architecture, zoom-out, prototype, setup-matt-pocock-skills), productivity skills that aren’t code-specific (grill-me, handoff, caveman, write-a-skill), and misc tools Pocock keeps around but uses rarely (git-guardrails-claude-code, setup-pre-commit, and a couple of TypeScript-flavoured migration helpers).
The unifying idea is restraint. These aren’t a turnkey workflow that drives the agent from idea to merge. They’re sharp, single-purpose primitives you reach for at the moment you need them, and they work with any model and across coding agents — the SKILL.md format is the same one Claude Code, Codex, Cursor, and others consume.
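To make the file format concrete, here is a minimal sketch of how a tool might separate a skill’s YAML frontmatter from its markdown body. The skill content, the `name` and `description` fields, and the `split_skill` helper are all assumptions for illustration; they are not taken from the repo or from any agent’s actual loader.

```python
from typing import Dict, Tuple

# Hypothetical SKILL.md content; the field names and body are illustrative,
# not copied from the repo.
EXAMPLE_SKILL = """---
name: example-skill
description: Interview the user before writing code.
---
# Example skill

Ask one question at a time until every open decision is resolved.
"""


def split_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md-style document into (frontmatter, body).

    Handles only flat `key: value` pairs; real YAML needs a real parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return {}, text  # unterminated frontmatter: treat as plain body
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = split_skill(EXAMPLE_SKILL)
print(meta["name"])
print(body.splitlines()[0])
```

The point of the sketch is how little machinery is involved: a short metadata header an agent can scan to decide relevance, followed by plain instructions it reads in full.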
Why it matters
Pocock frames the whole collection around four failure modes he kept hitting with coding agents, each paired with a fix and a quote from a software-engineering classic. The structure is the argument: these are old engineering fundamentals, re-applied to agents.
- The agent didn’t do what I want. The most common failure in software is misalignment: you assume the agent understood you, then see what it built and realise it didn’t. The fix is a grilling session: make the agent interrogate you with detailed questions before any code is written, via grill-me or grill-with-docs. Pocock calls these his most popular skills.
- The agent is way too verbose. Dropped into a project with no shared vocabulary, an agent uses twenty words where one would do. The fix is a ubiquitous language: a CONTEXT.md that decodes the project’s jargon. His example: “there’s a problem with the materialization cascade” reads far better than the paragraph it replaces, and the concision “pays off session after session.” He calls this possibly “the single coolest technique in this repo.”
- The code doesn’t work. Even when aligned, the agent flies blind without feedback. The fix is feedback loops: static types, browser access, and a red-green-refactor tdd skill, plus a diagnose loop for hard bugs.
- We built a ball of mud. Because agents accelerate coding, they also accelerate entropy. The fix is to care about design every day: to-prd quizzes you on which modules you’re touching, zoom-out keeps the agent reasoning about the whole system, and improve-codebase-architecture rescues a codebase that’s drifted (he suggests running it every few days).
How it works
A skill is just a markdown file the agent loads when it’s relevant. The power is in how they compose: a typical flow moves from aligning on intent, to capturing it as durable artifacts, to building with feedback, to keeping the design honest over time. Nothing forces this sequence — you invoke each skill yourself — but they’re designed to hand off cleanly.
What keeps the collection coherent is the shared substrate. The grilling skills don’t just align you and the agent in the moment; grill-with-docs writes the hard-won decisions into a CONTEXT.md and architecture decision records, and the later skills (improve-codebase-architecture, triage) read those same files back in. The vocabulary you build once gets reused everywhere, which is also why Pocock argues a shared language reduces verbosity, makes the codebase easier to navigate, and even lets the agent spend fewer tokens “on thinking.”
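As a rough illustration of why a shared glossary saves words, the sketch below loads terms from a CONTEXT.md-style file and counts how many words a single term stands in for. The `- term: definition` layout, both definitions, and the `load_glossary` helper are invented for this example; the source does not specify how CONTEXT.md is structured.

```python
from typing import Dict

# Hypothetical CONTEXT.md excerpt. The "- term: definition" layout and the
# definition text are invented for illustration.
CONTEXT_MD = """# Glossary
- materialization cascade: the chain of derived records rebuilt whenever a source record changes
- slice: one end-to-end piece of behaviour, from interface to storage
"""


def load_glossary(text: str) -> Dict[str, str]:
    """Collect `- term: definition` bullets into a dict."""
    glossary: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.startswith("- "):
            continue  # skip headings and blank lines
        term, sep, definition = line[2:].partition(":")
        if sep:
            glossary[term.strip()] = definition.strip()
    return glossary


glossary = load_glossary(CONTEXT_MD)
term = "materialization cascade"
# Words the agent no longer has to spell out each time it uses the term.
saved = len(glossary[term].split()) - len(term.split())
print(f"'{term}' stands in for {saved} extra words")
```

The saving per mention is small, but it recurs every time the term appears in a prompt, a reply, or a ticket.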
Getting started
Installation goes through skills.sh, the open agent-skills installer maintained by Vercel Labs. The README’s quickstart is a single command that detects your coding agents and lets you choose which skills to install where:
npx skills@latest add mattpocock/skills
The README is emphatic about one step: when the installer asks which skills you want, make sure you select setup-matt-pocock-skills. Then run /setup-matt-pocock-skills inside your agent. That setup skill scaffolds the per-repo configuration the engineering skills depend on — it asks which issue tracker you use (GitHub, Linear, or local files), what labels you apply when triaging tickets (triage runs on a state machine of labels), and where to save the docs the skills generate. It’s a once-per-repo step you run before to-issues, to-prd, triage, diagnose, tdd, improve-codebase-architecture, or zoom-out.
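A label-based state machine can be pictured as a table of allowed moves between labels. The sketch below is an assumption-heavy illustration: the label names, the transitions, and the `move` helper are hypothetical and are not the labels the triage skill actually uses.

```python
from typing import Dict, Set

# Hypothetical labels and transitions, for illustration only; the real
# triage skill works with whatever labels you configure during setup.
TRANSITIONS: Dict[str, Set[str]] = {
    "needs-triage": {"needs-info", "ready", "wontfix"},
    "needs-info": {"needs-triage"},
    "ready": set(),      # terminal in this sketch
    "wontfix": set(),    # terminal in this sketch
}


def move(current: str, target: str) -> str:
    """Return the new label, or raise if the transition is not allowed."""
    if current not in TRANSITIONS:
        raise ValueError(f"unknown label: {current}")
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move {current} -> {target}")
    return target


label = move("needs-triage", "needs-info")
label = move(label, "needs-triage")
label = move(label, "ready")
print(label)  # prints "ready"
```

Writing the transitions down as data is what makes the setup step matter: the skill can only move tickets between states if it knows which labels your tracker uses for them.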
In practice
Aligning before building. You want to add a feature but the spec in your head is fuzzy. Instead of letting the agent guess, you run /grill-me (or /grill-with-docs for code work) and the agent relentlessly interviews you, surfacing assumptions, edge cases, and the reasoning behind interface choices, until every branch of the decision is resolved. With grill-with-docs, that conversation also sharpens the project’s terminology and writes the decisions into CONTEXT.md and ADRs as you go. From there, /to-prd turns the conversation into a PRD and files it as a GitHub issue with no further interview, and /to-issues breaks a plan into independently grabbable tickets, each one a vertical slice.
Building with feedback and rescuing architecture. With intent captured, /tdd builds the feature one vertical slice at a time on a red-green-refactor loop: write a failing test, stop and confirm it’s red, write the minimum code to make it pass, then refactor — with guidance baked in on what separates a good test from a bad one. When something breaks in a way that resists the obvious patch, /diagnose runs a disciplined loop — reproduce, minimise, hypothesise, instrument, fix, regression-test — rather than guessing at the symptom. And every few days you point /improve-codebase-architecture at the repo to find “deepening” opportunities (in John Ousterhout’s sense of deep modules behind simple interfaces), informed by the domain language in CONTEXT.md.
One more that’s easy to miss: caveman. It’s a productivity skill that switches the agent into an ultra-compressed communication mode, cutting token usage by roughly 75% by dropping filler while keeping full technical accuracy — a small, pragmatic lever rather than a methodology.
How it compares
The clearest contrast is with the process-owning frameworks the README names directly. GSD, BMAD, and Spec-Kit aim to drive the agent through a complete workflow; mattpocock/skills deliberately doesn’t. A useful neighbour in the Claude Code ecosystem is obra/superpowers, which installs a full, auto-triggering methodology — brainstorm, plan, build with TDD and subagent review. The two reflect a genuine philosophical split.
| Dimension | mattpocock/skills | Process frameworks (GSD, BMAD, Spec-Kit, Superpowers) |
|---|---|---|
| Scope | Small, single-purpose primitives | A complete, opinionated workflow |
| Control | You invoke each skill; you stay in the loop | The framework drives; less manual steering |
| Adaptability | Easy to read, fork, and rewrite | More integrated, but harder to alter mid-process |
| Best for | Engineers who want sharp tools, not a system | People who want the agent to follow a full method |
Both approaches have real strengths. A framework that owns the process buys you consistency and long autonomous runs; primitives buy you control and the ability to fix the process when it misbehaves — which is exactly the failure mode Pocock says drove him away from the frameworks.
A caveat on external coverage. A fair amount of third-party writing about this repo exists — explainer blogs, aggregator pages, and a few practitioner posts — but much of it reads as AI-generated SEO content, and the GitHub star counts I saw quoted ranged from roughly 54,000 to 135,000 across different sites on the same date. That spread is incoherent, so I’m not citing a figure. Treat secondhand metrics about this repo with suspicion; the README itself is the only source I’d fully trust, and it claims a newsletter audience of “~60,000 devs,” not a star count.
Performance
There are no published benchmarks here, and the README makes no quantitative claims beyond the caveman skill’s ~75% token reduction (its own description) and the general argument that a shared CONTEXT.md lets the agent “spend fewer tokens on thinking.” So the honest read is qualitative. The improvement these skills target is fewer wasted iterations: the grilling sessions front-load alignment so you discover misunderstandings before code is written rather than after, and the TDD and diagnose loops give the agent feedback so it stops flying blind. Whether that nets out faster depends entirely on your project and model — the README presents these as judgement-based practices, not a measured speedup, and I’d hold them to that standard.
Tradeoffs
- Primitives need a driver — you. The flip side of “you keep control” is that nothing fires automatically. You have to know which skill to reach for and when. People who want the agent to just take over a vague task may find that frustrating; people who were frustrated by frameworks taking over will find it the whole point.
- Some skills are setup-gated. The engineering skills assume you’ve run /setup-matt-pocock-skills first to configure your issue tracker, labels, and doc locations. Skip it and several of them won’t behave.
- Opinionated in the small, even if not in the large. tdd means red-green-refactor TDD; triage means a label-based state machine. The individual skills encode Pocock’s specific practices, so adopting one is adopting a particular way of working, just at the granularity of a single task rather than your whole process.
- It’s one person’s daily kit. This is a strength (it’s battle-tested and coherent) and a limitation (it reflects one engineer’s preferences and a TypeScript-leaning worldview; a few misc skills are explicitly niche). Pocock’s framing (“make them your own”) is also a quiet admission that the defaults won’t fit everyone.
Takeaway
mattpocock/skills is the considered counterpoint to the “let the framework run everything” school of agent tooling. It’s a small, composable set of agent skills, drawn straight from one experienced engineer’s daily workflow, that targets the four failure modes that actually waste your time: misalignment, verbosity, broken code, and architectural rot. Reach for it if you want sharp tools you stay in control of and can fork freely; look at a full methodology like Superpowers instead if you’d rather the agent follow a complete process on its own. The one thing to remember is the bet at the centre of it: the way to get good engineering out of an agent isn’t a bigger process — it’s the old fundamentals, applied one deliberate step at a time.