#llm — inwo

Headroom: A Compression Layer That Shrinks Everything Your Agent Reads

Jun 28, 2026 · Shingo Nakamura · AI

Headroom sits between your AI agent and the model, compressing tool outputs, logs, RAG chunks, files and history before they cost you tokens — reversibly, and locally. What it is, how the router-plus-compressors design works, what its self-reported benchmarks actually show, and the honest costs.

claude-code llm tokens context


Ponytail: The Best Code Is the Code Your Agent Never Writes

Jun 27, 2026 · Shingo Nakamura · AI

Ponytail is a one-file 'lazy senior dev' skill that makes coding agents stop and pick the simplest solution that works, cutting code by ~54% (and tokens by ~22%) on a real agentic benchmark. What it is, how the ladder works, what trusted reviewers found, and the honest caveats.

claude-code llm tokens skills


Ralph: Put a Coding Agent in a While Loop and Walk Away

Jun 27, 2026 · Shingo Nakamura · AI

Ralph is a brutally simple technique — loop the same prompt into a coding agent until the task is done. What it is, how the two main implementations (Anthropic's ralph-wiggum plugin and snarktank/ralph) actually differ, the anecdotal numbers behind the hype, and an honest critical review of where it breaks.

claude-code llm agents autonomous


LiteLLM: One API for Every LLM

Jun 21, 2026 · Shingo Nakamura · AI, Python

LiteLLM is an open-source AI gateway that gives you a single OpenAI-format interface to 100+ LLM providers — as a Python SDK or a self-hosted proxy. What it is, how it works, two real use cases, how it compares, its performance, and its honest pros and cons.

litellm gateway llm proxy


pi: A Minimal, Hackable Coding Agent for the Terminal

Jun 21, 2026 · Shingo Nakamura · AI

pi is a tiny, aggressively extensible terminal coding harness. What it is, how to install and use it, how it compares to opencode and Claude Code, and what the benchmarks actually say.

coding-agent cli llm agents


LangChain: What It Is, How It Compares, and When to Reach for It

Jun 20, 2026 · Shingo Nakamura · AI, Python

A practical look at LangChain — what it's for, how to install and use it, what you can build, whether you need to deploy it, how it stacks up against Google ADK and other agent frameworks, and its real downsides.

langchain agents llm rag


What Is a Harness? The Scaffolding That Turns a Model Into an Agent

Jun 2, 2026 · Shingo Nakamura · AI

A plain-English explainer of the AI harness — the code around a model that lets it use tools, take steps, and get work done. What it is, why it matters, how it works, and the two senses of the word (agent harness vs evaluation harness).

harness agents llm tool-use


LLM Wiki: Compile Your Knowledge Instead of Retrieving It

May 15, 2026 · Shingo Nakamura · AI

Andrej Karpathy's LLM Wiki pattern — let an LLM build and maintain a persistent, interlinked wiki from your sources, so knowledge compounds instead of being re-derived on every query. What it is, how it works, how it compares to RAG, and the token economics.

llm-wiki rag knowledge-management agents


Caveman: Why Use Many Token When Few Do Trick

May 1, 2026 · Shingo Nakamura · AI

A Claude Code skill that makes the agent talk like a caveman — cutting output tokens by up to ~75% while keeping full technical accuracy. Why, how it works, how to install it, and real before/after numbers.

claude-code llm tokens productivity
