Spec Kitty: A Governed Software Factory for AI Coding Agents
Hand a capable coding agent a real feature and the usual thing happens: it loses the requirement halfway through, forgets the acceptance criteria you agreed on three messages ago, and leaves you with a branch nobody can audit. Run two or three agents at once and you also get branch chaos. Spec Kitty, an open-source CLI from Priivacy.ai, is built to prevent that failure mode structurally rather than leaving it to luck.
It is a spec-driven development tool that turns product intent into a repo-native workflow (spec -> plan -> tasks -> next -> review -> accept -> merge) and stores every artifact along the way, from specs and plans to work packages, acceptance criteria, review state, and merge decisions, in your Git repository rather than in an agent’s scrollback. It earns the “governed software factory” label through what it adds around that pipeline: work packages that move through lifecycle lanes, isolated git worktrees so multiple agents can implement in parallel, a local kanban dashboard, explicit governance commands, and a retrospective after every mission.
This post covers what Spec Kitty is, why the repo-as-source-of-truth idea matters, how the workflow and worktree model work, how to get started, concrete use cases and commands from the README, where it sits next to lighter alternatives, what “performance” honestly means for a process tool, and the tradeoffs. The facts here come from the project’s own README and documentation; the broader spec-driven-development context comes from third-party coverage, and I flag which is which.
What it is
Spec Kitty is an open-source (MIT-licensed) Python CLI — spec-kitty-cli, Python 3.11+ — for spec-driven development across AI coding agents and multi-agent workflows. You point it at a repository, define intent through guided specify, plan, and tasks steps, and it produces repo-native mission artifacts that agents implement inside traceable, isolated worktrees while reviewers accept, reject, or merge with an audit trail. It supports a long list of harnesses — Claude Code, Codex, Cursor, Gemini, GitHub Copilot, Windsurf, OpenCode, Qwen, Kiro, Vibe, Pi, and Letta among them — via slash commands or skills.
The framing the project itself uses is “software factory, not a black box.” Humans define intent, architecture, and acceptance criteria; agents implement; reviewers gate. It is local-first — core artifacts live in your repo, and any hosted tracker or sync integration is opt-in. By the project’s own admission it is “probably overkill for one-off edits, tiny scripts, or teams that do not use Git.”
Why it matters
Spec-driven development is having a moment because it addresses the trust problem directly: it makes the specification the artifact that agents execute against, verify against, and report against (Augment Code). Spec Kitty’s particular bet is that the spec, and everything that follows from it, belongs in the repository.
- The repo stays the source of truth. Specs, plans, tasks, review state, and merge decisions are committed artifacts under kitty-specs/, not ephemeral chat context, so a session can crash, an agent can be swapped out, and the mission survives.
- Parallelism without branch chaos. Each work package runs in its own git worktree under .worktrees/, which is what lets multiple agents implement at once without stepping on each other.
- Review is a gate, not an afterthought. The runtime loop ends in explicit review -> accept -> merge steps with an audit trail, rather than trusting whatever the agent produced.
- Governance lives in the repo too. Doctrine, guidelines, and warnings are surfaced through advise, ask, and do commands rather than buried in prompt text.
- Missions teach the next mission. Every completed mission generates a retrospective by default, so the factory can improve its own operating procedures.
How it works
The core is a linear runtime loop. You author intent at the front, the runtime drives the middle, and humans gate the end.
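To make the division of labour concrete, here is a minimal Python sketch of the loop as an ordered list of stages with an owner attached to each. The stage names and their order come from the post; the `Owner` labels, `PIPELINE`, `stage_after`, and `owner_of` are hypothetical names of mine, not anything Spec Kitty exposes.

```python
from enum import Enum
from typing import Optional


class Owner(Enum):
    AUTHOR = "human author"    # defines intent up front
    RUNTIME = "runtime"        # drives the middle
    GATE = "human reviewer"    # gates the end


# Stage order as given in the post; the owner mapping is my reading of it,
# not a structure taken from Spec Kitty itself.
PIPELINE = [
    ("spec", Owner.AUTHOR),
    ("plan", Owner.AUTHOR),
    ("tasks", Owner.AUTHOR),
    ("next", Owner.RUNTIME),
    ("review", Owner.GATE),
    ("accept", Owner.GATE),
    ("merge", Owner.GATE),
]


def stage_after(stage: str) -> Optional[str]:
    """Return the stage that follows `stage`, or None after the last one."""
    names = [name for name, _ in PIPELINE]
    index = names.index(stage)  # raises ValueError for an unknown stage
    return names[index + 1] if index + 1 < len(names) else None


def owner_of(stage: str) -> Owner:
    """Return who is responsible for a given stage."""
    return dict(PIPELINE)[stage]
```

Read this way, the hand-offs are easy to spot: control passes from the author to the runtime at `next`, and back to a human at `review`.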
Figure: spec, plan, and tasks; the runtime picks the next action until a work package is ready; humans gate review, accept, and merge.
The second idea that makes Spec Kitty more than a prompt template is the mission-and-worktree model. A mission is broken into work packages, and each work package moves through lifecycle lanes: planned, in_progress, for_review, approved, done. The packages that are eligible to run get their own isolated git worktree under .worktrees/, so independent agents can implement in parallel against the same repo without colliding. The local dashboard (spec-kitty dashboard) renders this as a kanban board so you can see where every package sits.
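The lane model can be pictured as a small state machine plus a grouping step for the board. The sketch below is illustrative only: the five lane names are from the post, but the transition table (including sending a rejected review back to in_progress), the `move` and `board` helpers, and the `WP01`-style package ids are assumptions of mine, not Spec Kitty's actual rules or data format.

```python
LANES = ("planned", "in_progress", "for_review", "approved", "done")

# Hypothetical transition table. Forward moves follow the lane order from the
# post; the "rejected review goes back to in_progress" edge is an assumption.
ALLOWED = {
    "planned": {"in_progress"},
    "in_progress": {"for_review"},
    "for_review": {"approved", "in_progress"},
    "approved": {"done"},
    "done": set(),
}


def move(current: str, target: str) -> str:
    """Return the new lane, or raise if the move skips a gate."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal lane move: {current} -> {target}")
    return target


def board(packages: dict[str, str]) -> dict[str, list[str]]:
    """Group package ids by lane, kanban-style, keeping empty columns."""
    columns: dict[str, list[str]] = {lane: [] for lane in LANES}
    for package_id, lane in sorted(packages.items()):
        columns[lane].append(package_id)
    return columns


# Example with made-up package ids.
packages = {"WP01": "planned", "WP02": "for_review"}
packages["WP01"] = move(packages["WP01"], "in_progress")
columns = board(packages)
```

The useful property is that a package cannot jump from planned straight to done: every path to done passes through for_review and approved, which is the "review is a gate" idea expressed as data.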
Figure: kitty-specs/ splits into work packages that flow through lifecycle lanes; the active ones get isolated worktrees under .worktrees/ so agents run in parallel.
Layered on top is the governance trail. The README describes three commands that map operator intent to runtime behavior: spec-kitty advise surfaces relevant doctrine, guidelines, and warnings for the current context; spec-kitty ask queries the knowledge base for specific guidance; and spec-kitty do executes governed actions while enforcing compliance with the trail model. The point is to keep runtime governance in the repository rather than treating it as agent-only prompt text.
Getting started
The CLI ships on PyPI. The project recommends pipx because it isolates the tool in its own virtual environment and avoids the externally-managed-environment errors common on modern Linux.
pipx install spec-kitty-cli
# or
uv tool install spec-kitty-cli
Then initialize a project against your agent of choice and verify the wiring:
spec-kitty init my-project --ai claude
cd my-project
spec-kitty verify-setup
Swap claude for codex, cursor, gemini, copilot, opencode, windsurf, or any other supported agent key. From inside your agent, you drive the front of the workflow with slash commands:
/spec-kitty.charter
/spec-kitty.specify Build a small task list app.
/spec-kitty.plan
/spec-kitty.tasks
In practice
The everyday rhythm is: author intent, let the runtime advance the mission, then gate the result. After specify, plan, and tasks, you hand control to the runtime to pick the next action until a work package is ready, naming the agent and the mission:
spec-kitty next --agent claude --mission <mission-slug>
When a package is ready, you close the loop through the review gates, and the merge step can also push the result:
/spec-kitty.review
/spec-kitty.accept
/spec-kitty.merge --push
After merge, /spec-kitty-mission-review produces the mission’s retrospective.yaml, and spec-kitty retrospect summary gives the cross-mission view so lessons compound across missions.
Where this shines, per the project’s stated use cases: replacing ad hoc “vibe coding” with a repeatable workflow; turning GitHub issues, product requirements, or bug reports into executable work packages; and coordinating multiple AI agents without losing context between sessions. The multi-agent case is the headline one — if you are genuinely running parallel Claude Code, Codex, and Cursor work, the worktree isolation under .worktrees/ and the lane-based dashboard are the features that make that tractable instead of terrifying.
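For a feel of what worktree isolation means at the Git level, here is a minimal Python sketch that builds the plain `git worktree add` command for one work package. Spec Kitty manages worktrees for you, so this is not its implementation; the `worktree_command` helper and the `<mission>-<package>` naming for the branch and directory are hypothetical, chosen only for illustration. The Git flags themselves (`-C`, `worktree add`, `-b`) are standard.

```python
from pathlib import Path


def worktree_command(repo_root: Path, mission: str, package_id: str) -> list[str]:
    """Build (but do not run) the git command for one isolated worktree.

    The branch and directory naming here is an assumption for illustration,
    not Spec Kitty's actual scheme.
    """
    name = f"{mission}-{package_id}"
    path = repo_root / ".worktrees" / name
    # `git -C <repo>` runs the command as if started in <repo>;
    # `worktree add -b <branch> <path>` creates a new branch checked out at <path>.
    return ["git", "-C", str(repo_root), "worktree", "add", "-b", name, str(path)]


# One command per package: each agent gets its own directory and branch.
commands = [
    worktree_command(Path("my-project"), "task-list-app", package_id)
    for package_id in ("WP01", "WP02")
]
```

Each resulting list could be handed to `subprocess.run(cmd, check=True)` inside a real repository. Because every package gets a separate checkout and branch, two agents editing the same file never share a working directory, and conflicts only surface at merge time.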
How it compares
Spec Kitty did not appear in a vacuum. The README is explicit that it is inspired by spec-driven workflows in the spirit of GitHub’s Spec Kit, but adds repo-native mission state, work-package lanes, worktree isolation, a dashboard, governance commands, and an explicit next -> review -> accept -> merge runtime loop on top of the basic spec-then-plan idea.
The more useful contrast is with OpenSpec (Fission-AI/openspec). OpenSpec is the lighter, more fluid option: a minimal SDD framework that separates current-state specs from active change proposals and is particularly well suited to brownfield work, where regenerating a full technical plan for a small change in a mature codebase would be overkill (Augment Code, Hashrocket). Spec Kitty sits at the heavier, governed end of the same spectrum — it brings ceremony (lanes, worktrees, review gates, retrospectives) precisely because it is aimed at multi-agent factories rather than quick incremental edits.
| Dimension | Spec Kitty | OpenSpec |
|---|---|---|
| Weight | Heavier, governed | Lightweight, fluid |
| Best fit | Multi-agent factories, greenfield missions | Brownfield, day-to-day incremental change |
| Parallel agents | First-class: isolated .worktrees/ | Not the focus |
| Governance | Explicit advise/ask/do + review gates | Minimal, proposal-driven |
| Overhead on small tasks | High (by design) | Low |
If you want the head-to-head in depth, see the dedicated comparison: Spec Kitty vs OpenSpec. The honest one-liner is that they optimize for different jobs — OpenSpec for surgical changes to existing code, Spec Kitty for coordinating multiple agents through a governed lifecycle.
Performance
For a workflow tool, “performance” is not a benchmark number, and Spec Kitty does not publish one — so I will not invent one. The gains it claims are qualitative and process-shaped: requirements and acceptance criteria stop getting lost between sessions because they live in the repo; multiple agents can work at once because each gets an isolated worktree; and review, test, and merge decisions stay visible instead of hidden inside an agent’s run. Whether that nets out faster than just prompting an agent depends entirely on the work. On a one-line fix, the ceremony is pure overhead. On a multi-package mission run by several agents, the parallelism and the lack of lost context are exactly where the time is saved. Treat any speed claim — including the project’s own framing — as a statement about discipline and coordination, not a measured throughput figure.
Tradeoffs
- It is heavy on purpose. Lanes, worktrees, governance commands, and per-mission retrospectives are real machinery. The project itself calls it overkill for one-off edits and tiny scripts, and that is the right read.
- Git is mandatory. The whole model — worktrees, repo-native artifacts, merge gates — assumes Git. Teams that do not use Git are out of scope.
- A learning curve precedes the payoff. Charter, specify, plan, tasks, next, review, accept, merge, retrospect is a lot of surface to internalize before the workflow feels natural.
- Maturity and adoption are unproven externally. Beyond the project’s own README and docs, I found very little independent, hands-on coverage of Spec Kitty specifically — the spec-driven-development discourse online is mostly about OpenSpec and Spec Kit. That is not a knock on the tool; it is a niche, governance-focused entrant, and you should evaluate it on its design rather than on a crowd.
- The governed model is the default, not the only mode. It can support “dark” autonomous factories, but human-in-the-loop review is the intended posture — which is a feature if you want auditability and friction if you wanted lights-out automation.
Takeaway
Spec Kitty is what spec-driven development looks like when you take “governed software factory” literally: the repository is the source of truth, work packages move through visible lanes, agents run in parallel inside isolated worktrees, and nothing merges without passing a review gate. Reach for it when you are coordinating multiple AI agents on real missions and you need traceability, parallelism, and a workflow that survives a dead session. Skip it for one-off edits and small scripts, where a lighter tool like OpenSpec — or just prompting your agent — will get you there with far less ceremony. The one thing to remember: Spec Kitty trades upfront process for durable, auditable coordination, and that trade only pays off when the coordination problem is real.