Claude-Mem: Persistent Memory for Your Coding Agent

Every Claude Code user eventually hits the same wall. You spend an hour teaching the agent your codebase — the weird auth flow, why utils/legacy.ts still exists, the table you renamed last week — and then the session ends or you /clear, and the next session starts from zero. The agent re-reads the same files, rediscovers the same gotchas, and lands on the same wrong assumptions you already corrected together.

Claude-Mem, by Alex Newman (@thedotmack), is built to kill that amnesia. It’s a “persistent memory compression system built for Claude Code” (per its README) that hooks into the session lifecycle, captures what the agent does as it works, compresses those captures into structured “observations” with AI, and injects the relevant ones back into future sessions — automatically, without you having to remember to remember.

This post covers what it is, how the pipeline actually works, how to install it, what independent reviewers report after weeks of real use, how it compares to the built-in options, and the honest tradeoffs — including a token-cost concern and a security audit you should know about before you install.

What it is

Claude-Mem is a Claude Code plugin (it also installs for Gemini CLI and OpenCode) that gives an agent cross-session memory. The README’s one-line summary: it “seamlessly preserves context across sessions by automatically capturing tool usage observations, generating semantic summaries, and making them available to future sessions.”

The key distinction, drawn out well in DataCamp’s hands-on guide, is that it runs as a plugin, not an MCP server. Plugins fire automatically on lifecycle events — session start, every tool call, session end — whereas an MCP memory server sits idle until the model decides to call it. With the MCP approach, retrieval only happens when Claude thinks to ask. Claude-Mem captures and injects without Claude having to choose to. It is Apache-2.0 licensed and free, and requires Node.js 18+ (Bun, the uv Python package manager, and SQLite auto-install on first run).

Why it matters

It removes the re-explaining tax. The biggest cost of long-lived projects isn’t the work, it’s re-establishing context every session. Claude-Mem absorbs that cost for you.
Capture is continuous, not end-of-session. It records an observation after every tool call, so a mid-refactor crash doesn’t wipe everything since the last completed session — a point both the DataCamp guide and the andrew.ooo review emphasize as the real differentiator from summarize-at-the-end tools.
Retrieval is token-aware. Rather than dumping history into the prompt, it loads a cheap index first and fetches full detail only for what matters (more on this below).
It’s automatic. No manual tagging, no “save this to memory” calls. The hooks do the work.
It’s local-first. Data lives in a SQLite database under ~/.claude-mem/, and compression runs on your existing Claude Code auth — no separate API key.
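To make the continuous-capture point concrete, here is a minimal Python sketch. It is not Claude-Mem's code: the table, columns, and function names are all hypothetical. It commits one row per simulated tool call, so a crash partway through a session loses nothing that was already recorded.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a tiny observation store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        "id INTEGER PRIMARY KEY, session TEXT, tool TEXT, note TEXT)"
    )
    return conn


def record_tool_call(conn, session, tool, note):
    """Persist one observation right after a tool call and commit at once."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO observations (session, tool, note) VALUES (?, ?, ?)",
            (session, tool, note),
        )


def run_session(conn, session, calls, crash_after=None):
    """Replay tool calls; optionally simulate a crash partway through."""
    for i, (tool, note) in enumerate(calls):
        if crash_after is not None and i == crash_after:
            raise RuntimeError("simulated mid-session crash")
        record_tool_call(conn, session, tool, note)


conn = open_store()
calls = [
    ("Read", "opened session.ts"),
    ("Edit", "swapped JWT for opaque tokens"),
    ("Bash", "ran migration"),
]
try:
    run_session(conn, "s1", calls, crash_after=2)
except RuntimeError:
    pass  # the session died before the third call was recorded

saved = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
print(saved)  # 2: both observations made before the crash are still there
```

A summarize-at-the-end design would have nothing to show for this session, because the summary step never ran.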

How it works

Claude-Mem isn’t one big agent; it’s a tight pipeline of small parts. The README lists five lifecycle hooks, a Bun-managed HTTP worker on port 37777, a SQLite database, a mem-search skill, and a Chroma vector database for semantic search. The hooks map cleanly onto a session’s timeline.

Figure: The capture-and-inject pipeline. Hooks fire on lifecycle events; a background worker compresses raw tool output into structured observations in SQLite, and the next session’s SessionStart hook injects the relevant slice back.

PostToolUse is the workhorse. After every tool call it sends the raw output to the worker via a non-blocking HTTP POST, and the worker uses the Claude Agent SDK to compress it into a structured observation. Per the DataCamp guide, each observation has a typed schema: a type (one of decision, bugfix, feature, refactor, discovery, change), a searchable title, a small array of facts (~50 tokens, cheap to load), a longer narrative (loaded only on demand), and concepts, a list of semantic tags. The Stop hook adds a session-level summary with fields like request, learned, completed, and next_steps.
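The schema described above can be sketched as a pair of Python dataclasses. The field names come from the DataCamp description; the Python types, the defaults, and the validation are assumptions for illustration, not the project's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List

# The six observation types listed in the guide.
OBSERVATION_TYPES = {
    "decision", "bugfix", "feature", "refactor", "discovery", "change",
}


@dataclass
class Observation:
    type: str                 # one of OBSERVATION_TYPES
    title: str                # short, searchable
    facts: List[str]          # small and cheap to load
    narrative: str = ""       # longer text, loaded only on demand
    concepts: List[str] = field(default_factory=list)  # semantic tags

    def __post_init__(self):
        # Reject types outside the documented set (assumed behaviour).
        if self.type not in OBSERVATION_TYPES:
            raise ValueError(f"unknown observation type: {self.type!r}")


@dataclass
class SessionSummary:
    # Session-level fields named in the guide; all typed as text here.
    request: str
    learned: str
    completed: str
    next_steps: str


obs = Observation(
    type="decision",
    title="Switch session storage to opaque tokens",
    facts=["JWT replaced", "immediate revocation required"],
    narrative="Longer explanation of the change and its migration.",
    concepts=["auth", "sessions"],
)
print(obs.title)
```

Keeping facts small and narrative separate is what lets a retrieval layer load the former for many observations and the latter for only a few.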

The other half of the design is retrieval. The README describes a three-layer MCP search pattern that exists to avoid spending tokens on noise: start with a cheap index, then optionally get chronological context, and only then fetch full detail.

Figure: Progressive disclosure. The model narrows down using the cheap index and timeline layers, then fetches full observations only for the IDs it actually wants. The README cites roughly 10x token savings versus loading everything.
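A toy version of the three layers shows why the pattern saves tokens. Everything here is an assumption made for illustration: the in-memory store, the function names, and the rule of thumb of about four characters per token. It is a sketch of the idea, not the plugin's search tools.

```python
from typing import Dict, List

# Hypothetical in-memory store: observation id -> record.
STORE: Dict[int, dict] = {
    1: {"title": "Map auth middleware", "narrative": "n" * 2000},
    2: {"title": "Switch sessions to opaque tokens", "narrative": "n" * 2400},
    3: {"title": "Add token hashing before storage", "narrative": "n" * 1600},
    4: {"title": "Rename orders table", "narrative": "n" * 2000},
}


def rough_tokens(text: str) -> int:
    """Crude estimate of ~4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def search_index(query: str) -> List[dict]:
    """Layer 1: cheap index, ids and titles only."""
    q = query.lower()
    return [{"id": i, "title": rec["title"]}
            for i, rec in sorted(STORE.items()) if q in rec["title"].lower()]


def timeline(around_id: int, window: int = 1) -> List[dict]:
    """Layer 2: titles of chronological neighbours (ids stand in for time)."""
    ids = sorted(STORE)
    pos = ids.index(around_id)
    return [{"id": i, "title": STORE[i]["title"]}
            for i in ids[max(0, pos - window): pos + window + 1]]


def get_observations(ids: List[int]) -> List[dict]:
    """Layer 3: full records, fetched only for the ids the caller picked."""
    return [{"id": i, **STORE[i]} for i in ids]


hits = search_index("token")              # matches ids 2 and 3
context = timeline(hits[0]["id"])         # neighbours: ids 1, 2, 3
full = get_observations([hits[0]["id"]])  # only id 2 is loaded in full

cost_progressive = (
    sum(rough_tokens(h["title"]) for h in hits)
    + sum(rough_tokens(c["title"]) for c in context)
    + sum(rough_tokens(o["narrative"]) for o in full)
)
cost_everything = sum(rough_tokens(rec["narrative"]) for rec in STORE.values())
print(cost_progressive < cost_everything)  # True for this toy data
```

The saving grows with the size of the store, since the first two layers only ever touch titles.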

Search itself is hybrid: SQLite’s FTS5 full-text engine for keywords, plus the Chroma vector database for semantic similarity, with results merged and de-duplicated. There’s also a web viewer UI at http://localhost:37777 that streams observations live and exposes the settings.
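The merge step can be pictured with a short sketch. The source says only that results are merged and de-duplicated. The round-robin interleaving below is one simple way to do that, chosen here as an assumption, and the function name and ids are hypothetical.

```python
from itertools import zip_longest
from typing import List


def merge_results(keyword_hits: List[int], semantic_hits: List[int],
                  limit: int = 5) -> List[int]:
    """Interleave two ranked id lists, keeping the first occurrence of each id."""
    merged: List[int] = []
    seen = set()
    # Walk both rankings in step so neither source dominates the top results.
    for pair in zip_longest(keyword_hits, semantic_hits):
        for obs_id in pair:
            if obs_id is not None and obs_id not in seen:
                seen.add(obs_id)
                merged.append(obs_id)
    return merged[:limit]


keyword_hits = [7, 3, 9]        # e.g. ids from a full-text query
semantic_hits = [3, 12, 7, 4]   # e.g. ids from a vector similarity query
print(merge_results(keyword_hits, semantic_hits))  # [7, 3, 12, 9, 4]
```

Ids 3 and 7 appear in both rankings but only once in the output, which is the de-duplication the text refers to.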

Getting started

The fastest path is a single command, which checks prerequisites, registers the hooks, creates ~/.claude-mem/, and starts the worker:

npx claude-mem install

If you prefer Claude Code’s native plugin marketplace, the README gives a two-line equivalent:

/plugin marketplace add thedotmack/claude-mem
/plugin install claude-mem

For Gemini CLI or OpenCode, pass the IDE flag — npx claude-mem install --ide gemini-cli or --ide opencode. Then restart your agent. One trap both the README and every reviewer flag: do not run npm install -g claude-mem. That installs the SDK library only — it never registers the hooks or starts the worker, so nothing actually captures. To verify the install worked, the DataCamp guide suggests curl http://localhost:37777/api/health (expect {"status":"ok"}) and opening the web viewer to watch observations stream in.
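If you would rather script the health check than run curl by hand, a small Python equivalent might look like the following. The URL and the expected {"status":"ok"} body are the ones the DataCamp guide reports; the function names, the timeout, and the injectable fetch parameter are assumptions added so the logic can be exercised without a running worker.

```python
import json
from typing import Callable
from urllib.error import URLError
from urllib.request import urlopen

HEALTH_URL = "http://localhost:37777/api/health"


def fetch_text(url: str, timeout: float = 2.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def worker_is_healthy(url: str = HEALTH_URL,
                      fetch: Callable[[str], str] = fetch_text) -> bool:
    """Return True only if the endpoint answers with {"status": "ok"}."""
    try:
        payload = json.loads(fetch(url))
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


# Exercise the logic with a stand-in response instead of a live worker.
print(worker_is_healthy(fetch=lambda url: '{"status":"ok"}'))  # True
```

Calling worker_is_healthy() with no arguments performs the real request against the local worker and returns False if nothing is listening.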

In practice

Resuming a refactor. Say yesterday you switched session storage in src/auth/session.ts from JWT to opaque tokens. Today you open Claude Code in the same repo and, before you type anything, the SessionStart hook injects a handful of prior observations — the decision, the reason (you needed immediate revocation), the DB migration it required, and a gotcha that opaque tokens must be hashed before storage. You ask “finish the middleware migration we started,” and the agent already has the context without re-reading thirty files to rediscover it. That walkthrough is from the andrew.ooo review, which frames it well.

Searching your own history. Because the observations are queryable, you can ask the agent things you’ve personally forgotten — “where did I save that API key,” “how did we implement the auth flow,” “when did we decide to drop Redis for tokens” — and it searches memory directly via the mem-search skill. DataCamp’s author reports their install captured 6,814 observations across 259 sessions and ten codebases, sitting in a 39 MB SQLite file — a concrete sense of the scale this reaches over weeks of use.

How it compares

Claude Code already ships memory features, but as the DataCamp guide notes, none of them capture automatically: CLAUDE.md is static markdown you maintain by hand (and adherence drops past a couple hundred lines), Auto Memory lets Claude decide what to save but it’s unstructured and unsearchable, and /compact just summarizes the live conversation. Claude-Mem fills the gap of automatic, continuous, structured capture with token-aware retrieval. There’s also a small cluster of third-party alternatives.

Tool	Approach	Storage	Capture timing	Pricing
Claude built-in	Native (CLAUDE.md / Auto Memory)	Local markdown	Manual	Free
Claude-Mem	Plugin (lifecycle hooks)	Local SQLite + Chroma	Per tool call	Free
memsearch	Plugin (hooks + skill)	Local markdown + vector	Session end	Free
supermemory	Plugin (hooks + cloud)	Cloud	Session end	Paid
mem0 (self-hosted)	MCP server	Local vector store	Session end	Free

Be fair to the alternatives: a manual CLAUDE.md is dead simple and perfectly auditable for stable conventions; session-end tools like memsearch avoid a background worker; supermemory adds cross-machine and team sync that Claude-Mem doesn’t offer; and mem0 fits if you already run that infrastructure. Claude-Mem’s distinguishing bet is per-call capture plus structured compression, fully local and free. (Comparison synthesized from DataCamp and andrew.ooo.)

Performance

There are no formal, third-party benchmarks of Claude-Mem that I could find, so treat the figures below as the project’s own claims and reviewers’ anecdotes, not independent measurements. The README claims roughly 10x token savings on retrieval from filtering before fetching, and it offers an experimental “Endless Mode” beta that it describes as a biomimetic memory architecture for long sessions. Reviewer-reported anecdotes include compressing an 885-line Python file from ~8,400 tokens to ~540 (a single user’s number, via andrew.ooo), and DataCamp’s author running compression on the cheap Haiku model at “well under a dollar” for a month of heavy use. None of these are controlled benchmarks; they’re directional. The honest summary is that the architecture is designed for token efficiency, but whether it nets out cheaper for you depends heavily on your plan and settings, which is exactly where the tradeoffs come in.
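For scale, the single-file anecdote works out as follows. The two token counts are the approximate figures quoted above, and the arithmetic adds no new measurement.

```python
before_tokens = 8_400  # approximate size of the 885-line file, per the anecdote
after_tokens = 540     # approximate size after compression

ratio = before_tokens / after_tokens
reduction = 1 - after_tokens / before_tokens

print(f"compression ratio: {ratio:.1f}x")   # compression ratio: 15.6x
print(f"token reduction: {reduction:.1%}")  # token reduction: 93.6%
```

This figure describes compressing one file, while the README's roughly 10x claim describes retrieval, so the two numbers are not directly comparable.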

Tradeoffs

The enthusiasm is real, and so are the problems. Naming them is what makes the rest credible.

It can burn your token budget. This is the headline concern. Both external reviews point to GitHub issue #618, where users on tighter plans report Claude Code consuming their budget far faster after enabling it — every tool call triggers an AI compression, and SessionStart injects a slab of context. The DataCamp author nearly uninstalled it during the first week, when a new project produces a flood of fresh observations; it settled after the codebase was mapped. Mitigations exist (a conservative CLAUDE_MEM_MODE, lowering CLAUDE_MEM_CONTEXT_OBSERVATIONS, switching compression to a cheaper or free provider), but you should monitor usage for a week before trusting it on a metered plan.
A security audit rated it HIGH risk. Per DataCamp, a February 2026 community audit found the port-37777 HTTP API had no authentication — any local process could read every observation and setting — plus a default 0.0.0.0 host binding and a path-traversal issue in some tools. The reviewer’s recommendation: run it on a personal dev machine only, not a cloud VM or shared box. Verify the current state of these issues before you rely on it.
Install complexity and a background daemon. It auto-installs Bun, uv, and SQLite and runs a persistent HTTP server. On a managed or corporate machine that’s a real amount of “stuff,” and port conflicts mean editing settings.
Privacy is opt-in. Compression sends captured tool output through the Claude Agent SDK (i.e., to the API) to be summarized, and capture is broad. There’s a <private> tag to exclude sensitive blocks, but you have to remember to use it.
Rough edges. Reviewers note a known ChromaDB subprocess leak (sticking to FTS5 search avoids it), occasional worker cold-start timeouts on Apple Silicon, and Windows support that effectively wants WSL2.
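The <private> tag in the privacy item above amounts to filtering marked blocks out before anything is stored or sent for compression. The sketch below illustrates that idea with a regular expression. It is an assumption about how such a filter could work, not Claude-Mem's implementation, and the function name and placeholder text are hypothetical.

```python
import re

# Match a <private>...</private> block, including blocks that span lines.
PRIVATE_BLOCK = re.compile(r"<private>.*?</private>", re.DOTALL)


def redact_private(text: str, placeholder: str = "[redacted]") -> str:
    """Replace every private block with a placeholder before storage."""
    return PRIVATE_BLOCK.sub(placeholder, text)


sample = "API notes\n<private>token=abc123</private>\nuse the staging host"
redacted = redact_private(sample)
print(redacted)
```

Anything left untagged passes through unchanged, which is why opt-in exclusion only protects what you remember to mark.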

One note on accuracy: the README states the license is Apache-2.0, while the andrew.ooo review describes it as AGPL-3.0. I’ve gone with the README’s Apache-2.0; if licensing matters to you, check the repository’s LICENSE file directly before integrating.

Takeaway

Claude-Mem is the most complete open answer yet to Claude Code’s amnesia problem: automatic per-call capture, AI compression into a typed observation schema, hybrid local search, and token-aware progressive-disclosure retrieval — installed in one command. Reach for it if you run an agent against the same repo for weeks and you’re tired of re-explaining your project every session. Be more cautious if you’re on a plan with tight rolling limits (watch issue #618 and tune the settings down), and keep it to a personal machine given the open security findings. The one thing to remember: the architecture is genuinely well thought out for context engineering, but the cost and safety profile depend on your setup — install it deliberately, watch the web viewer for a week, and keep it only if the math works for you.