CodeBurn: See Where Your AI Coding Tokens Actually Go

If you use an AI coding agent for real work, you’ve probably had the moment where the bill — or the API meter — is higher than you expected, and you have no idea which part of your workflow burned it. Was it the model? The retries? An agent re-reading the same files all day? Without instrumentation, it’s guesswork.

CodeBurn answers that question. It’s a local terminal dashboard that reads your agent’s session files straight off disk, prices every call, and breaks the spend down by project, model, activity, tool, and provider. This post covers what it is, how to read the dashboard (with a real screenshot), how to actually use it to cut token waste, how it compares to ccusage, and the honest pros and cons.

What it is

CodeBurn is a local, open-source (MIT) CLI and TUI that tracks token usage, cost, and performance across roughly 18 AI coding tools, including Claude Code, Claude Desktop, Codex, Cursor, Gemini CLI, GitHub Copilot, OpenCode, and Pi (the GitHub listing cites 25+ as support grows). It reads each tool’s session transcripts directly from disk and prices every call using LiteLLM’s model price data.

The key design choice: everything runs locally — no wrapper, no proxy, no API keys. CodeBurn doesn’t sit in front of your agent intercepting traffic; it reads the JSONL/SQLite session files your tools already write (e.g. ~/.claude/projects/ for Claude Code) and reconstructs the cost picture after the fact. Nothing leaves your machine.

Why it matters

The benefit is simple: you stop guessing where your tokens go. Concretely, monitoring with CodeBurn gives you:

A real dollar figure, not a token count you have to mentally convert. It prices input, output, cache-read, cache-write and web-search tokens per model, so a total like $165.58 reads directly as estimated spend.
Attribution. Cost split by project, by model, by activity (coding vs debugging vs exploration vs just talking), by tool, and by MCP server. You see which work is expensive.
A cache-efficiency signal. With Claude-style pricing, where cached input is much cheaper than fresh input, cache hit rate is often the biggest lever on cost; CodeBurn surfaces it front and center.
A quality signal, not just a cost one. Its one-shot success rate shows how often the agent got an edit right the first time vs burning tokens in edit/test/fix loops — the differentiator versus plain usage meters (NateCue).
Actionable fixes. The optimize command turns the data into ranked, copy-paste remedies and an A–F setup health grade.
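
The per-model pricing in the first point comes down to simple arithmetic: each token type is multiplied by its own rate and the results are summed. Here is a minimal sketch of that idea. The names Rates, Usage, and price_call are hypothetical, the rates are made-up illustration values (not LiteLLM data), and web-search tokens are left out for brevity; this is not CodeBurn's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per one million tokens. The values used below are invented."""
    input: float
    output: float
    cache_read: float
    cache_write: float


@dataclass(frozen=True)
class Usage:
    """Token counts for a single call, split by token type."""
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write: int = 0


def price_call(usage: Usage, rates: Rates) -> float:
    """Return the cost in USD of one call, pricing each token type at its own rate."""
    per_million = (
        usage.input * rates.input
        + usage.output * rates.output
        + usage.cache_read * rates.cache_read
        + usage.cache_write * rates.cache_write
    )
    return per_million / 1_000_000


# Hypothetical rates: cache reads cost a tenth of fresh input.
EXAMPLE_RATES = Rates(input=3.0, output=15.0, cache_read=0.3, cache_write=3.75)

call = Usage(input=2_000, output=1_000, cache_read=100_000, cache_write=4_000)
cost = price_call(call, EXAMPLE_RATES)
print(f"${cost:.4f}")
```

Under these invented rates, the 100,000 cached tokens cost less than they would as fresh input, which is why the cache hit rate matters so much to the total.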

How it works

CodeBurn is a reader, not a middleman. Each supported tool stores session transcripts locally: Claude Code as JSONL at ~/.claude/projects/<path>/<session>.jsonl, Codex under ~/.codex/sessions/, Cursor in a SQLite database, and so on. CodeBurn parses those files, deduplicates messages, filters by date range, prices each call against cached LiteLLM rates (with hardcoded fallbacks for Claude and GPT models, so those calls can still be priced when a rate is missing), and classifies every turn into one of 13 activity categories using deterministic rules: tool-usage patterns and message keywords, with no LLM calls. That rule-based classification is how it tells “debugging” from “feature dev” without asking a model.
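
The read, deduplicate, filter, and classify steps can be sketched in a few lines. Everything specific here is an assumption made for illustration: the record fields (id, timestamp, tools, text), the keyword list, and the four simplified categories are hypothetical and do not reflect the real session schema or CodeBurn's 13 categories.

```python
import json
from datetime import datetime, timezone

# Hypothetical keyword list; real rules would be more extensive.
DEBUG_WORDS = ("error", "traceback", "bug", "fix", "failing")


def load_turns(jsonl_lines, start, end):
    """Parse JSONL records, drop duplicate ids, keep those with start <= timestamp < end."""
    seen = set()
    turns = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record["id"] in seen:  # same message written twice
            continue
        seen.add(record["id"])
        when = datetime.fromisoformat(record["timestamp"])
        if start <= when < end:
            turns.append(record)
    return turns


def classify(turn):
    """Pick a category from tool usage and keywords only, with no model call."""
    tools = set(turn.get("tools", []))
    text = turn.get("text", "").lower()
    if "Edit" in tools and any(word in text for word in DEBUG_WORDS):
        return "debugging"
    if tools & {"Edit", "Write"}:
        return "coding"
    if tools & {"Read", "Grep"}:
        return "exploration"
    return "conversation"


SAMPLE = [
    '{"id": "a1", "timestamp": "2025-01-02T10:00:00+00:00", "tools": ["Read"], "text": "look at the parser"}',
    '{"id": "a1", "timestamp": "2025-01-02T10:00:00+00:00", "tools": ["Read"], "text": "look at the parser"}',
    '{"id": "a2", "timestamp": "2025-01-02T10:05:00+00:00", "tools": ["Edit"], "text": "fix the failing test"}',
    '{"id": "a3", "timestamp": "2024-12-01T09:00:00+00:00", "tools": ["Write"], "text": "add a README"}',
]

start = datetime(2025, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 1, 8, tzinfo=timezone.utc)
turns = load_turns(SAMPLE, start, end)
labels = [classify(turn) for turn in turns]
print(list(zip((t["id"] for t in turns), labels)))
```

Because the rules are deterministic, the same transcript always yields the same labels, and classification costs no tokens of its own.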

Getting started

It’s an npm package (Node 20+), runnable without installing:

npx codeburn                 # interactive dashboard, default 7-day window
# or install it
npm install -g codeburn
# or Homebrew
brew tap getagentseal/codeburn && brew install codeburn

That’s it — no config, no keys. CodeBurn auto-detects which AI tools you use from the session data on disk. Arrow keys (or 1–5) switch between Today, 7 Days, 30 Days, Month, and All Time; p toggles provider; q quits.

Reading the dashboard

Here’s what the default dashboard looks like — this is a real 7-day window:

[Screenshot: CodeBurn TUI dashboard showing a 7-day window: total cost $165.58 across 704 calls and 4 sessions with 97.9% cache hit, plus panels for daily activity, projects, activities, models, tools, shell commands, skills and MCP servers.]
Caption: The CodeBurn dashboard. Period tabs along the top, a summary header, then breakdown panels. Everything is reconstructed from local session files.

Reading it panel by panel:

Summary header. The headline numbers for the selected period: total cost, calls, sessions, and cache hit %, then the token flows — in (fresh input you pay full price for), out (generated tokens), cached (input served from cache, much cheaper), and written (tokens written into the cache). A 97.9% cache hit means almost all input was cheap cache reads — that’s healthy.
Daily Activity. Cost and call count per day, as a bar chart. Spot the expensive days at a glance.
By Project. Cost, average cost per session, session count, and overhead per project — so you know which repo is eating the budget.
By Activity. Cost, turns, and one-shot rate per category (Coding, Exploration, Feature Dev, Debugging, Testing…). One-shot is the share of edit turns that landed without a retry; Coding 100% means every edit worked first try.
By Model. Cost, cache %, calls, and one-shot per model. This is where you catch an expensive model doing cheap work (e.g. Opus on trivial turns).
Core Tools / Shell Commands. How many times each tool (Edit, Read, Write, Bash…) and each shell command (cat, grep, git…) was called — a fingerprint of how the agent actually worked.
Skills & Agents / MCP Servers. Usage (and cost) of skills, sub-agents, and MCP servers — useful for spotting an MCP server paying schema overhead every session without being used.

The README ships a short “reading the dashboard” cheat sheet worth internalizing: a cache hit under 80% suggests unstable context or disabled caching; lots of Read calls per session hints that the agent is re-reading files it should remember; a low one-shot rate means retry loops; Opus dominating cost on small turns means you’re overpaying on model choice. These are starting points, not verdicts.
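
Two of those signals are easy to reproduce by hand. The sketch below assumes that cache hit rate is cached input divided by all input (fresh plus cached) and that one-shot rate is the share of edit turns with no retry; CodeBurn's exact formulas may differ. The 80% cache threshold comes from the cheat sheet, while the 70% one-shot floor and all function names are arbitrary placeholders.

```python
def cache_hit_rate(fresh_input: int, cache_read: int) -> float:
    """Share of input tokens served from cache (assumed definition)."""
    total = fresh_input + cache_read
    return cache_read / total if total else 0.0


def one_shot_rate(edit_turns: int, retried_edit_turns: int):
    """Share of edit turns that landed without a retry, or None if there were no edits."""
    if edit_turns == 0:
        return None
    return (edit_turns - retried_edit_turns) / edit_turns


def dashboard_hints(fresh_input, cache_read, edit_turns, retried_edit_turns,
                    min_cache_hit=0.80, min_one_shot=0.70):
    """Return plain-text hints; the one-shot floor is a placeholder, not a CodeBurn value."""
    hints = []
    if cache_hit_rate(fresh_input, cache_read) < min_cache_hit:
        hints.append(f"cache hit under {min_cache_hit:.0%}: unstable context or caching disabled?")
    rate = one_shot_rate(edit_turns, retried_edit_turns)
    if rate is not None and rate < min_one_shot:
        hints.append("low one-shot rate: likely retry loops")
    return hints


# A healthy period (97.9% cache hit, every edit landed first try) and an unhealthy one.
healthy = dashboard_hints(21_000, 979_000, edit_turns=10, retried_edit_turns=0)
unhealthy = dashboard_hints(400_000, 600_000, edit_turns=10, retried_edit_turns=4)
print(healthy)
print(unhealthy)
```

As with the cheat sheet itself, the hints are prompts to investigate, not verdicts.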

In practice

The dashboard tells you what; three commands help you act:

codeburn optimize     # scan sessions + ~/.claude for waste, get copy-paste fixes
codeburn compare      # side-by-side model performance/efficiency comparison
codeburn yield        # correlate sessions with git commits: productive vs reverted

optimize is the payoff for monitoring: it flags files re-read across sessions, low Read:Edit ratios, uncapped bash output, unused MCP servers, bloated CLAUDE.md files, and ghost skills/agents — each with an estimated token-and-dollar saving and a ready-to-paste fix, rolled up into an A–F grade. compare puts models head to head on one-shot rate, retries, cost per edit, and cache hit. yield is the most honest metric of all: it ties sessions to git history and labels spend as productive, reverted, or abandoned — so you can see money spent on work that never shipped. You can also export everything (codeburn export -f json) or pipe report --format json into jq for your own analysis.
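
To show the idea behind yield, here is a toy version of the labeling step. It assumes each session has already been linked to zero or more commit hashes and that a set of reverted hashes is known; both the data shapes and the rules (no commits means abandoned, all commits reverted means reverted) are hypothetical simplifications, not a description of how CodeBurn does the correlation.

```python
from collections import defaultdict


def label_session(commits, reverted):
    """Label one session from the commits attributed to it."""
    if not commits:
        return "abandoned"
    if all(commit in reverted for commit in commits):
        return "reverted"
    return "productive"


def spend_by_outcome(sessions, reverted):
    """Sum session cost per outcome label."""
    totals = defaultdict(float)
    for session in sessions:
        totals[label_session(session["commits"], reverted)] += session["cost"]
    return dict(totals)


# Invented sessions and commit hashes, for illustration only.
sessions = [
    {"cost": 12.5, "commits": ["abc123"]},
    {"cost": 4.0, "commits": ["def456"]},
    {"cost": 2.25, "commits": []},
]
reverted = {"def456"}

totals = spend_by_outcome(sessions, reverted)
print(totals)
```

Even this crude split separates money that produced surviving commits from money that did not.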

A privacy note for the screenshot above: a CodeBurn dashboard shows real spend and real project paths. If you publish it, consider what you’re revealing.

How it compares

The obvious comparison is ccusage, the popular Claude-Code usage CLI that CodeBurn credits as inspiration (alongside CodexBar). The clean framing from the write-ups: if ccusage is a table of numbers, CodeBurn is a dashboard with an interface (NateCue, Medium).

Dimension	CodeBurn	ccusage
Interface	full TUI dashboard + macOS menu bar	lean text/table CLI
Activity classification	13 categories, deterministic	not the focus
One-shot / quality metrics	yes (per activity and model)	no
Optimize / compare / yield	built in	usage reporting focus
Multi-tool breadth	~18+ tools	many tools too
Best when	you want a visual overview and waste-hunting	you want quick numbers, leanest possible

Be fair to ccusage: it’s leaner and faster if all you want is “how many tokens did Claude Code use this month,” with billing-window tracking and a smaller footprint. Reach for CodeBurn when you use more than one AI tool, want the visual TUI, or care about why tokens were spent — not just how many (Hacker News discussion).

Tradeoffs

The honest cons:

It’s after-the-fact, not a budget guard. CodeBurn reports on spend that already happened; it won’t stop a runaway session in real time or enforce a cap.
Costs are reconstructed, not billed. Pricing comes from LiteLLM data, so figures are estimates — usually very close, but for tools that hide the model (Cursor “Auto”, Kiro) cost is estimated at Sonnet rates and clearly labeled as such.
Local-only by design. Great for privacy, but there’s no team rollup or shared dashboard out of the box — it sees one machine’s session files.
Estimation gaps for some providers. Tools without explicit token counts (some Copilot/Kiro paths) estimate tokens from content length, which is approximate.
You still have to interpret it. It surfaces signals; turning “cache hit 60%” into a fix is on you (though optimize narrows that gap considerably).
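
On the estimation-gap point: when a tool records no token counts, content length is about the only signal left. A minimal sketch, assuming the common rule of thumb of about four characters per token (an assumption, not CodeBurn's documented ratio), shows why the result is only approximate.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count; the ratio is a rule of thumb."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("hello world!"))
```

Real tokenizers split code, prose, and non-English text very differently, so an estimate like this can drift well away from the true count.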

Takeaway

CodeBurn is the tool to reach for when “my AI coding is expensive” needs to become “this project, on this model, doing this activity, is expensive — here’s the fix.” It’s local, free, reads what your tools already write, and goes beyond counting tokens to scoring whether they were well spent. Use it when you want a visual, multi-tool overview and a path to cutting waste; use ccusage if you just want the fastest possible numbers for Claude Code alone. Either way, instrumenting your token spend is the cheap step that makes every other optimization possible.