pi: A Minimal, Hackable Coding Agent for the Terminal

The terminal coding-agent space filled up fast: Claude Code, opencode, Codex CLI, Gemini CLI, Aider, and more. Most of them compete by adding features. pi competes by removing them.

pi is a deliberately minimal harness built on a simple bet: that the right core is small, and everything else — sub-agents, plan mode, MCP, to-do lists — should be something you add, not something the tool forces on you. This post covers what pi is, how to run it, what you can do with it, how it stacks up against opencode and Claude Code, and what the benchmark picture really looks like (including pi’s pointed refusal to play that game).

What it is

pi (earendil-works/pi, from Earendil Inc., created by Mario Zechner) is an MIT-licensed, interactive coding agent for the terminal, plus the libraries underneath it: a unified multi-provider LLM API (pi-ai), an agent runtime with tool-calling and state (pi-agent-core), and a terminal UI library (pi-tui). The headline package is pi-coding-agent — the CLI you actually talk to.

Out of the box it gives the model a handful of tools — read, write, edit, bash, grep, find, ls — and a clean TUI with sessions, branching, and context management. Everything beyond that is opt-in. The project’s pitch sums it up: “there are many agent harnesses, but this one is yours — adapt pi to your workflows, not the other way around.”

Why it matters

Model-agnostic, broadly. pi speaks to a long list of providers (Anthropic, OpenAI, Google, DeepSeek, Groq, Mistral, xAI, OpenRouter, local gateways, and more), via either an API key or an existing subscription (Claude Pro/Max, ChatGPT Plus/Pro, Copilot).
Genuinely hackable. Extend it with TypeScript extensions, skills (the open Agent Skills standard), prompt templates, and themes — and bundle them into shareable pi packages over npm or git. You can literally add custom tools, sub-agents, plan mode, git checkpointing, sandboxed execution — or, per the docs, “make pi look like Claude Code.”
Scriptable. Beyond interactive mode there’s print/JSON output, an RPC mode for process integration, and an SDK for embedding pi in your own app.
Strong session model. Sessions are JSONL trees: branch, fork, and clone in place, with automatic compaction when you approach the context limit.
Context engineering, not just chat. A deliberately minimal system prompt (which also keeps it token-efficient), plus fine-grained control over what enters the context window: AGENTS.md project instructions, a per-project SYSTEM.md, customizable compaction (topic-based, code-aware, or a different summarization model), and skills with progressive disclosure that don’t bust the prompt cache.

How it works

pi is a harness: it wires a chat model to tools and a loop. You type a request; the model decides which of its tools to call (read a file, run a bash command, edit code); pi executes them, feeds results back, and repeats until the task is done. The TUI shows messages, tool calls, token/cost usage, and the current model.

What’s interesting is the philosophy of omission. pi ships without several things its competitors treat as table stakes, on purpose:

No MCP by default — write CLI tools with READMEs (as skills), or add MCP via an extension.
No sub-agents, no plan mode, no built-in to-dos — build them as extensions or install a package.
No permission popups, no background bash — run in a container, or use tmux.

The argument is that a minimal core stays predictable and out of your way, and the extension system lets you (or the community) add exactly the workflow you want without forking.

Getting started

It’s an npm package — install globally:

npm install -g --ignore-scripts @earendil-works/pi-coding-agent
# or the installer script
curl -fsSL https://pi.dev/install.sh | sh

Authenticate with an API key or a subscription, then run pi:

export ANTHROPIC_API_KEY=sk-ant-...
pi
# or: launch pi, then /login to use a subscription

That’s it — you’re in the TUI. Switch models with /model (or Ctrl+L), adjust reasoning with the thinking levels, and manage history with /tree, /fork, and /compact.

In practice

A few things you can do without leaving the prompt:

# one-shot, non-interactive
pi -p "Summarize this codebase"

# pipe content in
cat README.md | pi -p "Summarize this text"

# read-only review (no write/edit/bash)
pi --tools read,grep,find,ls -p "Review the code for bugs"

# pick a different provider/model on the fly
pi --model openai/gpt-4o "Help me refactor this module"

# install a community package of tools/skills
pi install npm:@foo/pi-tools

The extensibility is the real story: drop a SKILL.md in ~/.pi/agent/skills/, a prompt template in prompts/, or a TypeScript extension that registers a new tool — and pi picks them up. Don’t want to build it? Ask pi to write the extension for you.

How it compares

pi, opencode, and Claude Code are all terminal coding agents, but they sit at different points on the “minimal vs batteries-included” and “open vs proprietary” axes.

Dimension	pi	opencode	Claude Code
Nature	minimal, hackable harness	feature-rich OSS agent (TUI + desktop + IDE)	polished proprietary CLI
License	MIT	MIT	proprietary (Anthropic)
Models	many providers / subs	many providers / local	Claude models only
Built-in plan mode / sub-agents	no (add via extensions)	plan mode built in	yes, built in
Extensibility	core design goal (TS extensions, skills, packages)	good, but more opinionated	hooks, skills, MCP, subagents
Best when	you want to shape your own agent	you want a full OSS agent today	you want the best-in-class Claude experience

opencode (from the SST/Anomaly team) is the popular open-source heavyweight — 160k+ stars, a TUI plus desktop and IDE surfaces, provider-agnostic, with a built-in plan mode. It’s the “give me a complete, open agent now” option. Claude Code is Anthropic’s proprietary CLI: tightly integrated, very capable out of the box, and consistently near the top of the leaderboards — but it’s closed and runs Claude models only.

The honest framing: pick pi if you want a small core you can mold; opencode if you want an open, feature-complete agent without assembling it; Claude Code if you want the most polished, highest-scoring experience and you’re happy inside Anthropic’s models and licensing.

Performance and benchmarks

This is where you have to be careful, because these tools are harnesses — the score is really the agent + model pair, not the harness alone. The reference benchmark for terminal agents is Terminal-Bench, which checks whether an agent completes a real terminal task end to end.

On the public Terminal-Bench 2.1 leaderboard (as of mid-June 2026):

Agent + model	Score
Codex CLI + GPT-5.5	83.4%
Claude Code + Fable 5	83.1%
Claude Code + Opus 4.8	78.9%
Gemini CLI + Gemini 3.1 Pro	70.7%

There’s also the Artificial Analysis Coding Agent Index, a composite (DeepSWE + Terminal-Bench v2 + SWE-Atlas-QnA) computed across harnesses like Claude Code, Cursor CLI and opencode, and the long-running SWE-bench family for repository-level fixes.

Two honest caveats:

pi isn’t on these leaderboards. It’s newer and minimal, and — pointedly — its author argues against exactly this kind of measurement. pi’s README calls them “toy benchmarks” and instead publishes real open-source coding sessions as a public dataset (badlogicgames/pi-mono on Hugging Face), on the view that real-world task/tool-use/failure data is the more useful signal. Whether you buy that or not, it means pi has no headline number to quote.
The harness matters less than the model. Because pi, opencode and Claude Code can all drive the same frontier models, a lot of the leaderboard spread comes from the model, the scaffolding around tool-calls, and the prompt — not from which logo is on the CLI. Run any of them with a top model and you’re in the same ballpark for everyday work.

Tradeoffs

Minimal means assembly required. The flip side of “no plan mode, no sub-agents, no MCP by default” is that if you want those, you install or build them. It’s softer than it sounds — pi ships 50+ example extensions (sub-agents, plan mode, permission gates, SSH execution, sandboxing) you can copy, and there are installable packages — so it’s rarely from scratch. But for people who want everything working on day one with zero wiring, opencode or Claude Code are less effort.
Younger and smaller ecosystem. Fewer packages, fewer integrations, and a smaller community than the established players (and new issues/PRs from new contributors are auto-closed by default — a deliberate maintenance choice that can surprise drive-by contributors).
You bring the model (and the bill). Like any harness, pi is only as good as the model behind it, and you pay for that model via API or subscription.
No benchmark to point at. If your team needs a number to justify a tool, pi’s “real sessions, not toy benchmarks” stance won’t give you one.

Takeaway

pi is the coding agent for people who’d rather own their tooling than adopt someone else’s defaults — a small, MIT-licensed core that you extend into exactly the agent you want, across almost any model provider. If that sounds like more setup than you want, opencode gives you an open, full-featured agent today, and Claude Code gives you the most polished, top-scoring experience inside Anthropic’s ecosystem. And if you care about leaderboards, remember what they actually measure: the model and the scaffolding, more than the name on the CLI.