Agent Skills: Bolting Senior-Engineer Discipline Onto Your Coding Agent

Ask an AI coding agent for a feature and it writes the feature. It does not usually stop to ask whether you have a spec, write a failing test before the implementation, check whether the change crosses a trust boundary, or consider what the diff will look like to a reviewer. It produces code, declares victory, and moves on. That is the same failure mode every senior engineer spends a career learning to avoid — and it is precisely the part of the job that never shows up in the diff.

Agent Skills, by Addy Osmani (a software engineer at Google, longtime web-performance author, and prolific writer on AI-assisted coding), is an attempt to bolt that senior-engineer scaffolding back on. It is an MIT-licensed, open-source pack of markdown skills that encode the workflows, quality gates, and best practices a senior engineer applies — and packages them so an agent follows them consistently across every phase of development.

This post covers what it is, how the skill mechanism works, how to get started, what you can do with it, how it compares to alternatives, and the honest tradeoffs. The framing throughout is Osmani’s own. He has written it up in detail on his blog (also republished on O’Reilly Radar), and that piece is the main source of opinion here; it is kept clearly separate from the repo’s factual claims.

What it is

Agent Skills is a library of structured engineering workflows for AI coding agents, shipped as plain markdown. Per the repo’s README, it bundles 23 skills (22 lifecycle skills plus one meta-skill), 7 slash commands, 3 specialist agent personas, and 4 reference checklists, all organized around the software development lifecycle: Define, Plan, Build, Verify, Review, and Ship.

The category it belongs to is “agent skills” in the Claude Code / Anthropic sense. As Osmani puts it, a skill is “a markdown file with frontmatter that gets injected into the agent’s context when the situation calls for it” — somewhere between a system-prompt fragment and a runbook. Crucially, it is not reference documentation; it is a workflow with steps, checkpoints that produce evidence, and a defined exit criterion. The problem it solves is that agents default to the shortest path to “done,” skipping the specs, tests, and reviews that make software reliable. It is aimed at anyone driving an AI coding agent who wants production discipline rather than prototype output.

Why it matters

The case Osmani makes is that frontier models are extremely capable junior engineers with no instinct for the parts of the job that do not show up in the diff. A model has read the phrase “Hyrum’s Law” in its training data, but it will not apply Hyrum’s Law when it is designing your API at 3am unless something forces it to. Agent Skills is the forcing function.

It replaces “just start coding” with a real lifecycle: define, plan, build, verify, review, ship.
It encodes published engineering practice rather than vibes. The skills are saturated with concepts from Software Engineering at Google and Google’s public engineering practices guide.
It is portable. The same SKILL.md works in Claude Code, Cursor, Gemini CLI, Codex, Windsurf, OpenCode and anything else that accepts system-prompt content, so you write the workflow once.
It is useful even if you never install it. The skills are a readable description of what good engineering with agents looks like, which makes them a spec you can lift practices from by hand.

How it works

Every skill follows a consistent anatomy: frontmatter (a hyphenated name and a description that says when to use it), then an Overview, a “When to Use” section, a step-by-step Process, a Rationalizations table, Red Flags, and a Verification section. The design choices behind that anatomy are what make it more than a folder of pretty markdown.
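To make that anatomy concrete, here is a minimal sketch of a checker that looks for the frontmatter fields and the six sections in a skill file. It is not part of the project: the function names, the regexes, and the assumption that each section appears as a markdown heading containing the section name are all illustrative, and the real SKILL.md files may word their headings differently.

```python
import re

# Section names taken from the anatomy described above. The exact heading
# text and levels in the real SKILL.md files are an assumption.
REQUIRED_SECTIONS = [
    "Overview",
    "When to Use",
    "Process",
    "Rationalizations",
    "Red Flags",
    "Verification",
]

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
HYPHENATED_NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return simple `key: value` pairs from a leading `---` block."""
    match = FRONTMATTER_RE.match(text)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_skill(text: str) -> list[str]:
    """Return a list of problems; an empty list means the anatomy looks complete."""
    problems = []
    fields = parse_frontmatter(text)
    if not HYPHENATED_NAME_RE.match(fields.get("name", "")):
        problems.append("frontmatter: missing or non-hyphenated name")
    if not fields.get("description"):
        problems.append("frontmatter: missing description")
    headings = [
        line.lstrip("#").strip().lower()
        for line in text.splitlines()
        if line.startswith("#")
    ]
    for section in REQUIRED_SECTIONS:
        if not any(section.lower() in heading for heading in headings):
            problems.append(f"missing section: {section}")
    return problems


# A made-up skeleton used only to exercise the checker.
EXAMPLE_SKILL = """---
name: example-skill
description: Use when demonstrating the skill anatomy.
---
# Example Skill
## Overview
## When to Use
## Process
## Rationalizations
## Red Flags
## Verification
"""
```

Calling `check_skill(EXAMPLE_SKILL)` returns an empty list, and removing any one section from the skeleton produces a single `missing section: ...` entry.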

Four ideas do most of the work.
Process over prose: skills are workflows the agent executes, not essays it reads and then ignores.
Anti-rationalization tables: each skill lists the excuses an agent (or a tired engineer) uses to skip a step, such as “I’ll write tests later” or “this task is too simple to need a spec,” each paired with a written rebuttal, because LLMs are excellent at generating plausible justifications for cutting corners.
Verification is non-negotiable: every skill terminates in concrete evidence (tests passing, clean build output, a runtime trace), and “seems right” is never sufficient.
Progressive disclosure: the skills are not all loaded at session start. A meta-skill, using-agent-skills, acts as a router that decides which skill applies to the current task and pulls in only what is relevant, keeping context lean.
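The progressive-disclosure idea can be sketched in a few lines. In the real system the routing is done by the model reading the meta-skill, not by code; the keyword matcher below is a deliberately crude stand-in, and the `SkillEntry` type, the one-line descriptions, and the `skills/<name>/SKILL.md` paths are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillEntry:
    name: str         # hyphenated skill name
    description: str  # short "when to use" text (illustrative wording)
    path: str         # where the full body would live (assumed layout)


# Only names and short descriptions are held up front. The full skill
# bodies stay on disk until a skill is actually selected.
INDEX = [
    SkillEntry(
        "api-and-interface-design",
        "designing an api or interface",
        "skills/api-and-interface-design/SKILL.md",
    ),
    SkillEntry(
        "frontend-ui-engineering",
        "building ui components or frontend pages",
        "skills/frontend-ui-engineering/SKILL.md",
    ),
    SkillEntry(
        "test-driven-development",
        "writing tests or fixing a bug test-first",
        "skills/test-driven-development/SKILL.md",
    ),
]

STOPWORDS = {"a", "an", "and", "for", "or", "the", "to"}


def route(task: str, index: list[SkillEntry] = INDEX) -> list[str]:
    """Return the names of skills whose description shares a keyword with the task."""
    task_words = set(task.lower().split())
    selected = []
    for entry in index:
        keywords = set(entry.description.lower().split()) - STOPWORDS
        if task_words & keywords:
            selected.append(entry.name)
    return selected
```

With this toy index, `route("Design an API for invoices")` selects only `api-and-interface-design`, and `route("Fix a typo in the README")` selects nothing, so no skill body would need to be loaded.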

The 7 slash commands are the entry points to the lifecycle: /spec, /plan, /build, /test, /review, /code-simplify, and /ship. Skills also activate automatically based on what you are doing — designing an API triggers api-and-interface-design, building UI triggers frontend-ui-engineering. The scope of activation matches the scope of the work: per Osmani, a complex feature might activate eleven skills in sequence, while a small bug fix uses three.

Figure: The lifecycle Agent Skills encodes. A meta-skill routes work to the relevant skills, the six phases run in order, and every step ends in a verification gate. Phases and commands per the project README.
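As a rough model of "phases in order, each ending in a verification gate," the sketch below runs six toy phases and stops at the first one whose evidence does not pass its gate. The `Phase` type, `run_lifecycle`, and `demo_phases` are hypothetical names; the project expresses this in markdown instructions, not in code.

```python
from typing import Callable, NamedTuple, Optional


class Phase(NamedTuple):
    name: str
    work: Callable[[], str]      # does the phase's work and returns its evidence
    gate: Callable[[str], bool]  # decides whether that evidence is sufficient


# Phase names as listed in the README, in order.
PHASE_NAMES = ["Define", "Plan", "Build", "Verify", "Review", "Ship"]


def run_lifecycle(phases: list[Phase]) -> tuple[list[str], Optional[str]]:
    """Run phases in order; return (completed phase names, name of failed phase or None)."""
    completed: list[str] = []
    for phase in phases:
        evidence = phase.work()
        if not phase.gate(evidence):
            # Stop here: later phases never run on unverified work.
            return completed, phase.name
        completed.append(phase.name)
    return completed, None


def demo_phases(failing: Optional[str] = None) -> list[Phase]:
    """Build six toy phases; the one named `failing` produces no evidence."""
    def make(name: str) -> Phase:
        evidence = "" if name == failing else f"{name}: evidence recorded"
        return Phase(name, work=lambda: evidence, gate=lambda e: bool(e))
    return [make(name) for name in PHASE_NAMES]
```

`run_lifecycle(demo_phases(failing="Verify"))` completes Define, Plan, and Build, then reports Verify as the failed gate, so Review and Ship are never reached.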

Getting started

The recommended path is the Claude Code marketplace, which is a two-line install. You add the marketplace, then install the plugin:

/plugin marketplace add addyosmani/agent-skills
/plugin install agent-skills@addy-agent-skills

After that there is nothing to invoke by hand: you get the slash commands and the agent activates the relevant skills automatically based on context. The README notes that the marketplace clones over SSH, so if you lack SSH keys on GitHub you can pass the full HTTPS URL (/plugin marketplace add https://github.com/addyosmani/agent-skills.git) to force HTTPS cloning.

Because the skills are plain markdown, other tools each have their own path. Cursor users copy a SKILL.md into .cursor/rules/ or reference the whole skills/ directory; Gemini CLI installs them as native skills (gemini skills install https://github.com/addyosmani/agent-skills.git --path skills); Windsurf, OpenCode, GitHub Copilot, Kiro, and Codex are covered in the repo’s per-tool setup docs.
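For the Cursor path, the copy step could be scripted roughly as follows. This is a sketch under assumptions: it presumes a local clone where each skill lives at `skills/<name>/SKILL.md`, and it renames the copy to `<name>.md` so several skills can sit side by side. Neither detail is confirmed by the source, so check the repo’s Cursor setup doc for the layout and file naming it actually recommends.

```python
import shutil
from pathlib import Path


def install_skill_for_cursor(repo_dir: Path, skill_name: str, project_dir: Path) -> Path:
    """Copy one skill's SKILL.md from a local clone into a project's .cursor/rules/."""
    # Assumed repo layout: skills/<skill-name>/SKILL.md
    source = repo_dir / "skills" / skill_name / "SKILL.md"
    if not source.is_file():
        raise FileNotFoundError(f"no SKILL.md found at {source}")

    rules_dir = project_dir / ".cursor" / "rules"
    rules_dir.mkdir(parents=True, exist_ok=True)

    # Assumed naming: one file per skill, so copies do not overwrite each other.
    target = rules_dir / f"{skill_name}.md"
    shutil.copyfile(source, target)
    return target
```

A call such as `install_skill_for_cursor(Path("agent-skills"), "test-driven-development", Path("."))` would place the skill under `.cursor/rules/` in the current project, provided the clone exists at that path.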

In practice

Osmani describes three modes of use, in increasing commitment, and they map cleanly onto how you would actually reach for this.

The first is the full marketplace install described above, where you let the lifecycle drive: you describe a feature, /spec clarifies what you are building, /plan breaks it into small atomic tasks, /build implements one vertical slice at a time, /test proves it works, /review catches what slipped, and /ship gets it out. A concrete example of the discipline in action is the testing skill, test-driven-development, which enforces Red-Green-Refactor, a roughly 80/15/5 test pyramid, and “DAMP over DRY” so tests read like a specification — the kind of process humans know they should follow and skip under pressure.
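To illustrate what “DAMP over DRY” looks like in a test file, here is a small sketch. The function under test, `apply_discount`, is hypothetical and not taken from the skill; under Red-Green-Refactor the tests would be written first and seen to fail before the function body exists. Each test repeats its own inputs instead of hiding them behind shared helpers, so it reads as one line of specification.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: integer percent discount, rounded down."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


# DAMP style: descriptive names, literal inputs, one expectation per test.
def test_ten_percent_off_a_twenty_dollar_item_is_eighteen_dollars():
    assert apply_discount(price_cents=2000, percent=10) == 1800


def test_zero_percent_leaves_the_price_unchanged():
    assert apply_discount(price_cents=2000, percent=0) == 2000


def test_full_discount_makes_the_item_free():
    assert apply_discount(price_cents=2000, percent=100) == 0


def test_percent_above_one_hundred_is_rejected():
    try:
        apply_discount(price_cents=2000, percent=101)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```

The repetition of `price_cents=2000` is intentional: a reader can understand any single test without scrolling to a fixture.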

The second is dropping individual skills into whatever tool you already use, treating the markdown as portable rules. The third, which Osmani says is where he would actually start, is to read the skills as a spec even if you never install anything: open code-review-and-quality and apply its five-axis framework to your team’s review process, or lift the meta-skill’s five non-negotiables — surface assumptions before building, stop and ask when requirements conflict, push back when warranted, prefer the boring solution, and touch only what you are asked to touch — straight into your own AGENTS.md. The repo also ships three specialist personas (code-reviewer, test-engineer, security-auditor) for targeted reviews.
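Lifting the five non-negotiables into your own AGENTS.md can be done by hand, but a small script keeps it repeatable. The sketch below is illustrative only: the section heading and the helper names are assumptions, and the rule wording is paraphrased from the list above, not copied from the meta-skill.

```python
from pathlib import Path

# Paraphrased from the five non-negotiables listed above.
NON_NEGOTIABLES = [
    "Surface assumptions before building.",
    "Stop and ask when requirements conflict.",
    "Push back when warranted.",
    "Prefer the boring solution.",
    "Touch only what you are asked to touch.",
]

HEADING = "## Non-negotiables"  # hypothetical section title


def render_section() -> str:
    """Render the rules as a markdown section."""
    lines = [HEADING, ""] + [f"- {rule}" for rule in NON_NEGOTIABLES]
    return "\n".join(lines) + "\n"


def add_to_agents_md(path: Path) -> bool:
    """Append the section to AGENTS.md once; return False if it is already present."""
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if HEADING in existing:
        return False
    if existing and not existing.endswith("\n"):
        existing += "\n"
    separator = "\n" if existing else ""
    path.write_text(existing + separator + render_section(), encoding="utf-8")
    return True
```

Running `add_to_agents_md(Path("AGENTS.md"))` twice leaves a single copy of the section, which makes it safe to call from a project bootstrap script.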

How it compares

Agent Skills sits a level above the model: it is process, not capability. The obvious alternatives are the raw agent, your own hand-maintained rules file, and other skill packs. Each has real strengths.

| Approach | What you get | Effort |
| --- | --- | --- |
| Raw coding agent | Capable, but takes the shortest path to “done” | None |
| Hand-rolled `AGENTS.md` / rules | Exactly your conventions, but you write and maintain them | High, ongoing |
| Agent Skills | A lifecycle-spanning pack grounded in Google practices, auto-activated | One install |
| Other skill packs (e.g. Superpowers) | Varies; a different opinionated methodology | One install |

The closest comparison in spirit is a pack like obra/superpowers, which similarly ships a development methodology as auto-triggering skills. The honest framing is that these are different opinionated takes rather than strict competitors: Superpowers leans hard on a brainstorm-then-TDD pipeline with subagent review, while Agent Skills is explicitly organized around the SDLC and Google’s documented engineering norms. If you already have a refined workflow encoded in your own config you may not need either; most people do not, which is the gap both fill.

Performance

The repo does not publish benchmarks, and neither Osmani’s writeup nor any source I found offers measured numbers on output quality or speed — so there are no figures to cite here, and I will not invent any. The claimed improvement is qualitative and structural: by forcing the agent through specs, tests, scope discipline, and verification gates it would otherwise skip, the output is meant to look more like production-quality work and less like a prototype. Osmani argues the effect compounds for long-running agents, where every skipped step is amplified over a long session, and that progressive disclosure keeps the whole library usable without flooding the context window. Treat all of that as a design rationale to evaluate on your own work, not a measured result.

One data point worth stating carefully because it is a popularity signal, not a quality one: at the time of his blog post, Osmani wrote that the repo had “just crossed 27K stars.” Star counts are not a performance metric, and the live number will differ from any figure quoted here.

Tradeoffs

It is opinionated by design. You get Osmani’s lifecycle and the Google practices baked into it; if you dislike mandatory specs or test-first discipline, the structure is friction rather than help. For a one-line fix, running the full define-plan-build-verify-review-ship ceremony is more process than the task warrants — the value shows up on real features, not typos.

There is also an enforcement gap to be honest about. Skills are instructions an agent is meant to follow, and Osmani is candid that LLMs are good at rationalizing their way out of work, which is exactly why the anti-rationalization tables exist. The tables reduce skipping but cannot guarantee it; hooks and deterministic checks, which Osmani frames as a separate harness layer, are what actually enforce the rules. A smaller, version-dependent annoyance: the README notes the Claude Code marketplace clones via SSH, so you need SSH keys configured or the HTTPS workaround. Finally, note that Osmani’s blog cites 20 skills while the current README lists 23; the project is evolving, so confirm the exact count against the repo when you install.
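The difference between an instruction and a deterministic check is easy to show. The sketch below treats each check as a command whose exit code decides pass or fail, which the agent cannot talk its way around. The function names and the example commands are assumptions, and how such a script would be wired into a particular tool’s hook system is deliberately left out.

```python
import subprocess
import sys


def run_gate(command: list[str]) -> bool:
    """Run one check; it passes only if the process exits with status 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


def failed_checks(checks: dict[str, list[str]]) -> list[str]:
    """Run every named check and return the names of those that failed."""
    return [name for name, command in checks.items() if not run_gate(command)]


# Illustrative only: substitute whatever your project uses for tests and builds.
EXAMPLE_CHECKS = {
    "python-runs": [sys.executable, "-c", "print('ok')"],
    "always-fails": [sys.executable, "-c", "raise SystemExit(1)"],
}
```

Here `failed_checks(EXAMPLE_CHECKS)` reports only `always-fails`; a harness would refuse to proceed while that list is non-empty, regardless of what the agent claims about its work.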

Takeaway

Agent Skills is the answer to “my coding agent is capable but undisciplined.” It packages the senior-engineer parts of the job — specs, tests, reviews, scope discipline, verification — into portable markdown workflows organized around the development lifecycle and grounded in Google’s published practices, then activates them based on what you are doing. Reach for it when you want an agent to behave like a careful engineer on real features; skip the ceremony for trivial edits. And even if you never install it, the most valuable thing on offer is the framing Osmani keeps returning to: the senior-engineer parts of the job are no longer optional, even when the engineer is a model.