Powering Up Claude Code: A Practical Guide to Dev Skills

If you use Claude Code with one skill (say superpowers) and wonder what else to add for review, security, code quality and multi-agent work, the honest first answer is: fewer than you think, and measure before you trust any number. Most skills have no benchmark; a few have self-reported ones with caveats. This guide explains what a skill actually is, how Claude Code loads it (and what it costs you in context), the real risks, and a lean stack worth installing — with how to install, remove and update each.

What a skill is (and what it’s for)

A skill is a SKILL.md file: YAML frontmatter with a name and description, then markdown instructions the agent follows when the skill is relevant. That’s the whole format. It exists to give the agent reusable, specialized behavior — “review like a security engineer”, “write the minimum that works”, “follow our PR conventions” — without you re-explaining it every session. Skills follow a shared Agent Skills spec, so the same file works across Claude Code and 50+ other agents.
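
As a minimal sketch of that format, the shell helper below writes a skeleton skill to disk. The skill name pr-conventions, its description and the body text are hypothetical placeholders; only the name and description frontmatter fields followed by markdown instructions come from the description above. The one-directory-per-skill layout is an assumption here, so check the Agent Skills spec for the exact rules.

```shell
#!/usr/bin/env bash
# Sketch: scaffold a minimal SKILL.md.
# "pr-conventions" and all of its text are hypothetical placeholders.

new_skill() {   # usage: new_skill <skills-dir>
  local skills_dir="$1"
  local skill_dir="$skills_dir/pr-conventions"
  mkdir -p "$skill_dir"
  # Quoted 'EOF' keeps the heredoc literal (no variable expansion).
  cat > "$skill_dir/SKILL.md" <<'EOF'
---
name: pr-conventions
description: Apply this team's pull request conventions when opening or updating a PR.
---

# PR conventions

- Keep each PR to one logical change.
- Title format: "<area>: <imperative summary>".
- Describe how the change was tested.
EOF
  echo "$skill_dir/SKILL.md"   # print the path that was written
}
```

Calling new_skill .claude/skills would create a project-scope skill, and new_skill ~/.claude/skills a user-scope one.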

A skill is instructions, not code and not a tool. It steers the model; it doesn’t add capabilities the model doesn’t already have. That distinction matters for both the benefits and the risks.

How Claude Code uses them — and the context cost

This is the question people skip, so be precise about it. Skills use progressive disclosure:

At startup, Claude Code loads only each skill’s name + description (a line or two each) so the model knows what’s available. That standing cost is small and scales with how many skills you have installed.
The full SKILL.md body is pulled into context only when the skill is triggered — so an unused skill costs almost nothing, and a used one costs its body size (a few lines for a tight rule like Ponytail; hundreds for a fat persona file).
The exception is always-on context — text injected on every turn instead of loaded on demand. Some skills install themselves this way, and it’s also how project rule files like AGENTS.md work. The cost is constant: even a small always-on file rides along on every message, so it’s a permanent tax rather than pay-per-use.

What is AGENTS.md? It’s a plain-markdown instructions file the agent reads at the start of every session: your project’s conventions, common commands, and do’s and don’ts. It’s an open cross-agent convention (agents.md): Codex, Cursor, OpenCode, pi and others read AGENTS.md, while Claude Code’s own equivalent is CLAUDE.md, placed at the repo root (./CLAUDE.md) and/or globally in your home directory (~/.claude/CLAUDE.md).

Pros: the agent follows your rules consistently with zero re-explaining, and committing it gives your team one portable source of truth.
Cons: it’s always-on, so it’s a constant context cost and, as it grows, it dilutes attention on every turn — and it can go stale or quietly conflict with your skills. Keep it short and high-signal.

Practical takeaways: installing 50 on-demand skills is cheap until you use them; a handful of always-on rules is a permanent tax; and a giant roster has a real failure mode — some agents silently cap how many skills they register (agency-agents’ README reports OpenCode dropping skills past ~119). Exact token counts depend on the skill, so treat “context cost” as standing metadata + body-when-used + any always-on rules, not a single number.
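
That three-part framing can be sketched as a small shell helper that splits a skill file into frontmatter (roughly what sits in context at startup) and body (what is pulled in on use), then sizes each. The 4-characters-per-token ratio is a crude assumption, not a real tokenizer, and the function names are made up for illustration; prefer the figures in the /skills panel when you need actual numbers.

```shell
#!/usr/bin/env bash
# Sketch: rough context-cost estimate for one SKILL.md.
# Assumption: ~4 characters per token. This is a heuristic, not a tokenizer.

approx_tokens() {       # reads stdin, prints an estimated token count
  local chars
  chars=$(wc -c)
  echo $(( chars / 4 ))
}

skill_frontmatter() {   # lines between the first pair of '---' markers
  awk '/^---[[:space:]]*$/ { n++; next } n == 1' "$1"
}

skill_body() {          # everything after the closing '---'
  awk 'n >= 2 { print } /^---[[:space:]]*$/ { n++ }' "$1"
}

skill_cost() {          # prints: <file> standing=<n> on_use=<n>
  local standing on_use
  standing=$(skill_frontmatter "$1" | approx_tokens)
  on_use=$(skill_body "$1" | approx_tokens)
  echo "$1 standing=$standing on_use=$on_use"
}
```

Running skill_cost on each installed SKILL.md gives a quick relative ranking of which skills are expensive when triggered; always-on files such as CLAUDE.md can be piped through approx_tokens the same way.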

Seeing what’s loaded (the `/skills` panel)

A catch worth knowing: Claude Code gives you no per-skill “loaded” banner. When a skill fires, the transcript shows only a generic Skill tool use. Remember the distinction from above: a skill being available (name + description in context) is not the same as being invoked (its body pulled in). So “I don’t see my skill” usually means it just wasn’t triggered, not that it’s gone.

The reliable check is the /skills panel. It lists every installed skill with its scope, its token cost, and its on/off state — and two markers tell you who controls each:

🔒 lock (“locked by plugin”) — the skill comes from an installed plugin (e.g. every superpowers:* entry). It’s active, but you can’t toggle it here; manage it via /plugin (the panel even footnotes “Plugin skills are managed via /plugin”).
✔ green check — a standalone skill, either project scope (.claude/skills/) or user/global scope (~/.claude/skills/). These you toggle on/off directly in the panel with Space.

So the lock doesn’t mean “disabled”; it means “owned by a plugin.” Useful companions: /reload-skills after editing a skill, and ls ~/.claude/skills/ or ls .claude/skills/ to confirm what’s actually on disk.
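
To cross-check the panel against disk, a sketch like the following lists each standalone skill with its scope. It assumes every skill is a directory holding a SKILL.md with a name: line in its frontmatter; list_skills is a made-up helper name, and plugin-owned skills are managed via /plugin and are not covered here.

```shell
#!/usr/bin/env bash
# Sketch: list standalone skills found on disk, one per line as "<scope> <name>".
# Assumes the layout <skills-dir>/<skill>/SKILL.md with a "name:" frontmatter line.

list_skills() {   # usage: list_skills <scope-label> <skills-dir>
  local scope="$1" dir="$2" file name
  for file in "$dir"/*/SKILL.md; do
    [ -f "$file" ] || continue          # glob matched nothing
    # First "name:" line wins; fall back to the directory name.
    name=$(sed -n 's/^name:[[:space:]]*//p' "$file" | head -n 1)
    [ -n "$name" ] || name=$(basename "$(dirname "$file")")
    echo "$scope $name"
  done
}

list_skills project .claude/skills
list_skills user "$HOME/.claude/skills"
```

If a skill shows up here but not in /skills, the file is on disk but has not been picked up, which narrows down where to look.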

If a skill used to announce itself at session start and now doesn’t, that announcement was almost certainly a SessionStart hook (from the plugin) that stopped firing — a disabled plugin or a settings-merge quirk are common causes. But /skills is the source of truth: if it’s listed and on, it’s installed and will activate when relevant. (Exact panel layout and per-skill token figures are what current Claude Code shows, and may shift between versions.)

The risks of skills

Because a skill is third-party instructions your agent will follow, the risks are real and under-discussed:

Supply chain. Installing a skill means running someone else’s instructions in your loop. A malicious or careless one can tell the agent to exfiltrate data, run commands, or follow bad patterns. Auto-install (-g -y) is the equivalent of npm install from a stranger: fine for trusted sources, worth a read first otherwise.
Prompt injection surface. Skills (and the content they pull in) are another channel for injected instructions.
Conflicting instructions. Stack three “best practice” skills and they’ll contradict each other; the model picks one unpredictably. More is not better.
Popularity ≠ quality ≠ safety. Install counts and stars are proxies. Vet the source and read the SKILL.md, especially anything that runs shell.
Context dilution. Every always-on rule competes for attention with your actual task.
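
As one way to act on “read the SKILL.md, especially anything that runs shell”, the sketch below flags lines in a skill file that mention common shell or network commands, so you know where to start reading. The pattern list is an illustrative assumption and far from complete, the helper name is hypothetical, and a clean result proves nothing on its own.

```shell
#!/usr/bin/env bash
# Sketch: surface lines in a skill file that deserve a manual read.
# The pattern list is illustrative only. This is a reading aid, not a scanner.

flag_risky_lines() {   # usage: flag_risky_lines <path/to/SKILL.md>
  # -n: show line numbers, -E: extended regex, -i: case-insensitive.
  # grep exits 1 when nothing matches; "|| true" keeps that from aborting callers.
  grep -nEi 'curl|wget|rm -rf|base64|eval |chmod|sudo|\.env|\| *(ba)?sh' "$1" || true
}
```

Each hit is printed as line-number:text. Treat the output as a list of places to read closely, not as a verdict.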

A lean dev stack

Ordered by ROI. Start with the first three; add the rest only when a gap actually bites.

Security-Guidance plugin (Anthropic, official)

For: catching common vulns (injection, unsafe deserialization, insecure DOM APIs) as you write.
How it works: three layers — instant pattern match on every edit, model review at end of turn, deeper agentic review on commit/push.
Benchmark: Anthropic reports a 30–40% drop in security-related PR comments in their internal rollout. Note the caveats: it’s vendor-reported, and “fewer comments” is a proxy, not “fewer shipped vulnerabilities.”
Context: the instant layer runs without model calls (no token cost); the review layers cost a normal review pass.
Pros: free, official, low effort, real first-pass value.
Cons: first-pass only — not a substitute for a real audit.
Install: enable it from Claude Code’s official plugins; it needs a recent Claude Code (2.1.144+ at launch) and Python 3.8+ — see Anthropic’s docs for the current steps and requirements.

Ponytail (code quality / anti-over-engineering)

For: stopping the agent from over-building — pulling deps and abstractions for one-line problems.
Benchmark: the author’s agentic benchmark (real Claude Code editing a FastAPI+React repo, n=4, Haiku 4.5) reports ~−54% code, −22% tokens, −20% cost, −27% time, 100% safety. Caveats are disclosed by the project: model-dependent (on a terse reasoning model it can raise tokens), small n, one repo.
Context: ships an always-on ruleset, but it’s tiny (a short ladder) — a small constant cost.
Pros: real measured upside, keeps safety guards.
Cons: does nothing on already-minimal code; one more always-on rule.
Install: /plugin marketplace add DietrichGebert/ponytail then /plugin install ponytail@ponytail (two separate prompts); remove with /plugin remove ponytail. (Ponytail also ships adapters for other agents — see its README.)

A code reviewer

For: a consistent review pass before the PR (security review is already covered above).
Options: Claude Code’s built-in subagents (zero install — define a “reviewer” subagent), or the Code Reviewer persona from agency-agents.
Benchmark: none. The value is qualitative — catching issues earlier and reviewing consistently.
Context: a persona file is loaded only when you invoke the reviewer.
Pros: earlier feedback.
Cons: a “reviewer persona” is just a prompt — no guarantee it beats a careful manual review prompt. Don’t expect magic.
Install — option A (recommended): no package needed. Create a markdown subagent file in ~/.claude/agents/ (e.g. code-reviewer.md) describing the reviewer’s role; Claude Code loads it automatically.
Install — option B (agency-agents’ persona): it’s not an npx skills package. Clone the repo and run its own installer:

git clone https://github.com/msitarzewski/agency-agents
cd agency-agents
./scripts/install.sh --tool claude-code --agent code-reviewer
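
For option A, a subagent file could look like the sketch below. The name and description frontmatter fields and the prompt text are assumptions for illustration, so check the Claude Code subagent docs for the fields your version supports; write_reviewer_agent is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: write a reviewer subagent file (option A).
# Frontmatter fields and prompt wording are illustrative assumptions.

write_reviewer_agent() {   # usage: write_reviewer_agent <agents-dir>
  local agents_dir="$1"
  mkdir -p "$agents_dir"
  cat > "$agents_dir/code-reviewer.md" <<'EOF'
---
name: code-reviewer
description: Reviews the current diff for correctness, readability and missing tests before a PR is opened.
---

You are a careful code reviewer. For the changes under review:

1. Summarize what the change does in two or three sentences.
2. List concrete problems, most severe first, with file and line.
3. Point out missing tests or unclear naming.
4. Do not rewrite the code; suggest the smallest fix for each problem.
EOF
}
```

Calling write_reviewer_agent ~/.claude/agents would place the file where option A expects it. Keeping the prompt this short also keeps the cost low when the reviewer is invoked.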

One discipline skill (pick one)

For: general best-practice steering.
Options: andrej-karpathy-skills, Matt Pocock’s skills, Addy Osmani’s agent-skills.
Critical note: these overlap heavily with each other, with Ponytail, and with superpowers. Installing several creates conflicting instructions and context bloat. Choose one and delete the rest.
Benchmark: none.
Install (andrej-karpathy-skills): it’s a CLAUDE.md, not an npx skills package — either the plugin (/plugin marketplace add forrestchang/andrej-karpathy-skills then /plugin install andrej-karpathy-skills@karpathy-skills) or per-project: curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md.

Multi-agent (only if you need it)

For: parallel, role-split work. Start with Claude Code’s native subagents — no install. Step up to spec-kitty (governed multi-agent worktrees) only for genuinely large, parallel efforts.
Caution: superpowers already orchestrates a methodology; stacking spec-kitty + agency-agents multi-agent personas on top risks overlapping control loops. Add one layer, not three.
Benchmark: none.

find-skills (discovery)

For: letting the agent find and vet new skills on demand instead of memorizing catalogs.
Install: npx skills add vercel-labs/skills -s find-skills -g.

Installing, removing, updating

There are three install mechanisms, and it pays to know which one a tool uses: the skills CLI (spec-compliant skills), Claude Code plugins (/plugin), and a few tools’ own installers (e.g. agency-agents’ install.sh, or a plain CLAUDE.md). For the skills CLI (vercel-labs/skills), project scope (the default) is committed with the repo, while global scope (-g) applies everywhere:

npx skills add owner/repo -a claude-code -g     # install (whole repo)
npx skills add owner/repo -s skill-name -g      # install one skill
npx skills list                                 # see what's installed
npx skills find <query>                         # search the ecosystem
npx skills update [name]                        # update (all, or one)
npx skills remove [name]                        # remove (interactive if no name)

Plugins (the Security-Guidance plugin, Ponytail, the karpathy plugin) are managed by Claude Code’s /plugin marketplace instead — /plugin install <name>@<marketplace> and /plugin remove <name>. Repo-specific installers (agency-agents) and plain CLAUDE.md files you manage by hand: re-run the installer or git pull to update, delete the files to remove. The golden rule: prune regularly, because unused always-on rules are pure context tax.

Tradeoffs

The single biggest risk isn’t a bad tool — it’s over-installing: redundant discipline skills, three multi-agent layers, always-on rules competing for attention. A lean stack beats a maximal one.
Almost none of this is benchmarked on your code. The only honest number is the one you measure yourself: escaped bugs, review time, tokens and cost before vs. after.
Skills are instructions, so trust and review matter more than with a normal dependency. Read what runs in your loop.
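
Measuring before versus after can be as simple as a percentage change per metric. The sketch below is one way to compute it; pct_change is a made-up helper name, and any numbers you feed it should come from your own runs.

```shell
#!/usr/bin/env bash
# Sketch: percentage change between a "before" and an "after" measurement.
# Negative output means the metric went down.

pct_change() {   # usage: pct_change <before> <after>
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%+.1f%%\n", (after - before) / before * 100
  }'
}
```

For example, pct_change 200 150 prints -25.0%. Run the same task several times with and without a skill and compare averages; a single run says little.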

Takeaway

Add to superpowers in this order: the Security-Guidance plugin (real, free, low-effort), Ponytail (the rare skill with measured upside), a reviewer (native subagent or a persona), one discipline skill, and find-skills so the agent discovers the rest. Reach for multi-agent only when the work is genuinely parallel, and prune anything you don’t use. Then benchmark on your own repo — because that’s the only performance number that’s actually yours.

Sources

Claude Code — security guidance (docs) · Automated security reviews (Help Center) · Help Net Security — 30–40% fewer security PR comments
vercel-labs/skills (the skills CLI) · Agent Skills spec · Claude Code skills docs
Benchmarks (self-reported, with caveats in their repos): Ponytail · trailofbits/skills