inwo

#llm

9 posts

Ponytail: The Best Code Is the Code Your Agent Never Writes

Jun 27, 2026 · Shingo Nakamura · AI

Ponytail is a one-file 'lazy senior dev' skill that makes coding agents stop and pick the simplest solution that works — cutting code by ~54% (and tokens ~22%) on a real agentic benchmark. What it is, how the ladder works, what trusted reviewers found, and the honest caveats.

#claude-code #llm #tokens #skills

Ralph: Put a Coding Agent in a While Loop and Walk Away

Jun 27, 2026 · Shingo Nakamura · AI

Ralph is a brutally simple technique — loop the same prompt into a coding agent until the task is done. What it is, how the two main implementations (Anthropic's ralph-wiggum plugin and snarktank/ralph) actually differ, the anecdotal numbers behind the hype, and an honest critical review of where it breaks.

#claude-code #llm #agents #autonomous

LiteLLM: One API for Every LLM

Jun 21, 2026 · Shingo Nakamura · AI, Python

LiteLLM is an open-source AI gateway that gives you a single OpenAI-format interface to 100+ LLM providers — as a Python SDK or a self-hosted proxy. What it is, how it works, two real use cases, how it compares, its performance, and its honest pros and cons.

#litellm #gateway #llm #proxy

LLM Wiki: Compile Your Knowledge Instead of Retrieving It

May 15, 2026 · Shingo Nakamura · AI

Andrej Karpathy's LLM Wiki pattern — let an LLM build and maintain a persistent, interlinked wiki from your sources, so knowledge compounds instead of being re-derived on every query. What it is, how it works, how it compares to RAG, and the token economics.

#llm-wiki #rag #knowledge-management #agents

Caveman: Why Use Many Token When Few Do Trick

May 1, 2026 · Shingo Nakamura · AI

A Claude Code skill that makes the agent talk like a caveman — cutting output tokens by up to ~75% while keeping full technical accuracy. Why, how it works, how to install it, and real before/after numbers.

#claude-code #llm #tokens #productivity