inwo inwo.
← all posts

Caveman: Why Use Many Token When Few Do Trick

May 1, 2026 · Shingo Nakamura · AI

Coding agents are verbose by default. Ask Claude why a React component re-renders and you often get a warm-up sentence, a paragraph of context, a hedge, and then — eventually — the fix. Every one of those words is an output token: it costs money, it takes time to generate, and it buries the answer you actually wanted.

caveman is a small, one-line-install skill built around a blunt idea: why use many token when few do trick. It makes the agent drop the fluff and answer like a caveman — terse, fragmentary, no pleasantries — while keeping every bit of technical substance. Same fix, a fraction of the words.

This post covers why you’d want that, how it actually works, how to install it on Claude Code (and what to do for the Claude desktop app), and the real token numbers the project measured.

Why it matters

The pitch is not “make Claude dumber.” It’s “make Claude stop padding.” The benefits the project leans on:

  • Speed. Fewer output tokens to generate means responses finish faster — generation time scales with tokens produced.
  • Readability. No wall of text. You get [thing] [action] [reason] and the next step, not three sentences of preamble.
  • Cost. Output tokens are billed; cutting them cuts the bill on agent-heavy workflows.
  • Accuracy is preserved. Only filler is removed — articles, hedging, “Sure! I’d be happy to help,” restated questions. Code, commands, file paths and the actual reasoning stay intact.

A nice side effect: terseness doesn’t seem to hurt correctness. The project even cites 2026 research arguing that constraining models to brief answers can improve accuracy on some tasks — plausible if you’ve ever watched a model talk itself out of the right answer. Treat that as the author’s framing rather than settled fact, but the practical experience holds up: short answers are easier to check.

How it works

caveman is a skill / prompt instruction, not a model or a wrapper. It injects a compact rule into the agent’s context that says, roughly: keep the technical substance exact, kill the fluff, fragments are fine, short synonyms, leave code unchanged. The agent keeps thinking normally — it just speaks with fewer words.

There are intensity levels, so you choose how much grunt you want:

  • Lite — drop filler, keep grammar. Professional, just not padded.
  • Full — the default. Drop articles, use fragments, full caveman.
  • Ultra — maximum compression, telegraphic, abbreviate everything.
  • 文言文 (wenyan) — classical Chinese mode, for when you want the most token-efficient written language humans ever invented.

Here’s the same answer across two levels (from the project’s examples):

Normal (69 tokens): “The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle… I’d recommend using useMemo to memoize the object.”

Caveman Full (19 tokens): “New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”

You trigger it with /caveman (or “talk like caveman”, “caveman mode”) and turn it off with “stop caveman” / “normal mode”. On Claude Code it auto-activates every session via a SessionStart hook, so you don’t have to remember.

A second, less obvious trick is caveman-compress. Your CLAUDE.md and other memory files load on every session — those are input tokens you pay repeatedly. caveman-compress rewrites the prose in those files into caveman-speak (leaving code, URLs, paths and headings untouched) and keeps a human-readable .original.md backup. The project measured ~46% average shrinkage on memory files. So caveman trims both what the agent says (output) and what it reads (input).

Getting started

This is the first-class target. With the plugin system:

claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman

That installs the skills plus the auto-loading hooks, so caveman is active from the start of each session. There’s also a statusline badge ([CAVEMAN], [CAVEMAN:ULTRA], …) so you can see at a glance which mode you’re in.

If you’d rather not use the plugin system, there’s a standalone hook installer:

# macOS / Linux / WSL
bash <(curl -s https://raw.githubusercontent.com/JuliusBrussee/caveman/main/hooks/install.sh)

# Windows (PowerShell)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/hooks/install.ps1 | iex

Once installed, it applies to every session for that install target.

On the Claude desktop app

Honest caveat: caveman officially documents Claude Code (the CLI) and a long list of other agents — Codex, Gemini CLI, Cursor, Windsurf, Cline, Copilot — but it does not ship a dedicated installer for the Claude desktop app. So there’s no claude plugin … command for Desktop.

Two practical ways to get the same behaviour there:

  1. Add it as a custom skill. caveman is distributed as standard SKILL.md files, and the Claude desktop app supports custom skills/capabilities. You can point it at the caveman skill so it behaves the same way as in the CLI.
  2. Paste the always-on snippet into your project instructions or preferences — this is the rule caveman injects, and it works in any agent that takes a system prompt:
Terse like caveman. Technical substance exact. Only fluff die.
Drop: articles, filler (just/really/basically), pleasantries, hedging.
Fragments OK. Short synonyms. Code unchanged.
Pattern: [thing] [action] [reason]. [next step].
ACTIVE EVERY RESPONSE. No revert after many turns. No filler drift.
Code/commits/PRs: normal. Off: "stop caveman" / "normal mode".

Or, for a one-off, just tell it “talk like caveman” — that triggers the same mode without any setup.

Token performance: before and after

The project publishes real token counts from the Claude API, across ten common coding tasks. The headline claim is “~75% fewer output tokens”; the measured benchmark average is 65%, with a wide range depending on how much fluff the task invites:

TaskNormalCavemanSaved
Implement React error boundary345445687%
Explain React re-render bug118015987%
Set up PostgreSQL connection pool234738084%
Fix auth middleware token expiry70412183%
Debug PostgreSQL race condition120023281%
Docker multi-stage build104229072%
Review PR for security issues67839841%
Architecture: microservices vs monolith44631030%
Refactor callback to async/await38730122%
Average121429465%

The pattern is clear: explanation-heavy tasks (where the model would otherwise narrate) win big — 80–87%. Tasks that are already mostly code or a single decision win least — 22–30%, because there wasn’t much fluff to cut in the first place.

One important caveat the project is upfront about: caveman only affects output tokens. It does not touch thinking/reasoning tokens — it makes the mouth smaller, not the brain. So if most of your spend is on extended reasoning, the savings will be smaller than the headline. The biggest, most reliable wins are speed and readability; cost is a bonus on output-heavy, agentic loops.

How it’s measured. The numbers aren’t a guess. The project’s benchmark script sends each task to the Anthropic API twice — once with a plain “you are a helpful assistant” system prompt, once with the caveman skill as the system prompt — at temperature=0, three trials per task, and takes the median of the output_tokens the API actually reports (not an estimate). The table above is measured on Claude Sonnet 4 (claude-sonnet-4-20250514); each run also records the model, the date, and a SHA-256 of the skill file, and you can reproduce it yourself with python benchmarks/run.py. One honest note about the method: the baseline is a verbose assistant, so this comparison partly captures generic terseness alongside the skill itself — the repo ships a separate three-arm eval that controls for exactly that.

Takeaway

caveman is a one-line install that trades the agent’s small talk for signal. On verbose, explanation-heavy work it cuts roughly two-thirds of output tokens — sometimes much more — without dropping technical accuracy, and caveman-compress extends the idea to the memory files you re-read every session. On Claude Code it’s a clean plugin install with auto-activation; on the desktop app you replicate it with a custom skill or the always-on snippet.

If you live in a coding agent all day, it’s a low-risk experiment: install it, run a session, and check your own token dashboard. Worst case, you type “normal mode” and you’re back to full sentences.