[ Plugin · Claude Code Utilities ]

Caveman

why use many token when few token do trick — a Claude Code skill that cuts output tokens by ~65-75% by talking like caveman.

JuliusBrussee/caveman · updated 2026-05-05

$ claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman

What it does

LLM coding assistants tend to wrap answers in long, polite prose. Most of the tokens go to articles, connectives, and stock phrases rather than to the actual signal, which burns through 5-hour usage limits and API budgets faster than necessary.

Caveman rewrites the model’s output as caveman speech — short fragments, dropped articles, no filler — so the same information lands in roughly 25–35% of the original tokens.
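As a rough intuition for why dropping articles and filler shrinks a reply, here is a toy Python sketch. It is only an illustration and not how Caveman itself operates: the `FILLER` word list and the `strip_filler` helper are hypothetical, and whitespace-separated words are used as a crude stand-in for tokens.

```python
# Toy illustration only: hypothetical word lists, not Caveman's actual rules.
ARTICLES = {"a", "an", "the"}
FILLER = {"just", "really", "basically", "actually", "simply", "very"}
DROP = ARTICLES | FILLER


def strip_filler(text: str) -> str:
    """Remove articles and filler words, keeping every other word in order."""
    kept = [w for w in text.split() if w.lower().strip(".,!?") not in DROP]
    return " ".join(kept)


before = "You basically just need to wrap the object in a useMemo call."
after = strip_filler(before)

print(after)                                         # You need to wrap object in useMemo call.
print(len(before.split()), "->", len(after.split()))  # 12 -> 8
```

Real tokenizers split text differently from whitespace, so the word counts here only indicate the direction of the effect, not its size.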

Features

Intensity levels — /caveman lite (filler removed, grammar kept), /caveman full (default — fragments, articles dropped), /caveman ultra (max compression, abbreviations)
/caveman wenyan — Classical Chinese (文言文) mode for further compression
/caveman-commit — terse conventional commit messages (≤50 chars)
/caveman-review — one-line PR comments with precise line numbers
/caveman-stats — session and lifetime token usage / savings
/caveman-compress — rewrites memory and doc files, ~46% input token reduction
cavecrew subagents — investigator, builder, reviewer
Statusline savings badge — live session savings shown in the statusline
caveman-shrink MCP middleware — compresses tool descriptions before they enter context
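To make the `/caveman-commit` constraint concrete, the following Python sketch checks a commit subject against a 50-character cap and a simplified Conventional Commits shape. It assumes the ≤50 limit applies to the subject line; the regular expression, `MAX_LEN`, and `check_subject` are illustrative names and are not taken from the plugin.

```python
import re

# Simplified Conventional Commits subject: type, optional (scope), optional "!",
# then ": " and a description. This pattern is an assumption for illustration.
SUBJECT_RE = re.compile(r"^[a-z]+(\([^()\s]+\))?!?: \S.*$")
MAX_LEN = 50  # assumed to apply to the subject line


def check_subject(subject: str) -> list[str]:
    """Return a list of problems found in a commit subject (empty if none)."""
    problems = []
    if len(subject) > MAX_LEN:
        problems.append(f"too long: {len(subject)} > {MAX_LEN} chars")
    if not SUBJECT_RE.match(subject):
        problems.append("not in 'type(scope): description' form")
    return problems


print(check_subject("fix(auth): handle expired refresh token"))  # []
print(check_subject("Fixed the bug where the login page would sometimes crash on load"))
```

The second call reports both a length problem and a format problem, which is the kind of subject line a terse commit mode is meant to avoid.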

Example

Mode	Output	Tokens
Normal	“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle…”	~69
Caveman	“New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`.”	19
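The saving implied by this example can be worked out directly from the two token counts in the table. The short Python calculation below uses only those two figures; the variable names are illustrative.

```python
# Token counts taken from the example table.
normal_tokens = 69    # approximate count for the normal answer
caveman_tokens = 19   # count for the caveman answer

remaining = caveman_tokens / normal_tokens  # share of tokens still used
savings = 1 - remaining                     # share of tokens saved

print(f"remaining: {remaining:.1%}")  # remaining: 27.5%
print(f"savings:   {savings:.1%}")    # savings:   72.5%
```

This single example falls inside the stated range: roughly 25–35% of the original tokens remain, which corresponds to a cut of about 65–75%.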

Supported AI clients

Claude Code, Gemini CLI, Codex, Cursor, Windsurf, Cline, GitHub Copilot, Continue, Kilo, Roo, Augment, Aider, Amp, Goose, JetBrains Junie, Kiro CLI, OpenHands, opencode, Tabnine, Trae, Warp, Replit Agent, Antigravity, and 40+ others.

Before / After

Before: 60–90+ output tokens per answer, so 5-hour limits and API budgets drain quickly.

After: run /caveman and the same answer lands in 19–30 tokens, so sessions stretch further on the same budget.
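To see what these ranges mean for a fixed budget, the Python sketch below counts how many answers fit before and after. The 10,000-token budget is a hypothetical figure chosen for illustration, and the calculation considers output tokens only, ignoring input tokens and anything else a real limit may count.

```python
def answers_per_budget(budget_tokens: int, tokens_per_answer: int) -> int:
    """Whole answers that fit in an output-token budget."""
    return budget_tokens // tokens_per_answer


BUDGET = 10_000  # hypothetical output-token budget, for illustration only

before_low, before_high = 60, 90  # "Before" range from the text
after_low, after_high = 19, 30    # "After" range from the text

# Each tuple is (fewest answers, most answers) for the given range.
before = (answers_per_budget(BUDGET, before_high), answers_per_budget(BUDGET, before_low))
after = (answers_per_budget(BUDGET, after_high), answers_per_budget(BUDGET, after_low))

print(before)  # (111, 166)
print(after)   # (333, 526)
```

Under these assumptions the same budget covers roughly three times as many answers, though real sessions will vary with answer length and input size.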

How to activate

After install, trigger Caveman with any of:

Slash commands: /caveman, /caveman lite|full|ultra, /caveman wenyan
Natural language: “talk like caveman”, “less tokens please”

Turn it off with “stop caveman” or “normal mode”.

§ 7

Frequently Asked Questions

§ 8.1

What is Caveman?

A Claude Code skill that rewrites LLM output in terse caveman speech — dropping articles, filler, and idioms — to cut output tokens by ~65-75% while preserving the technical content.

§ 8.2

Which AI tools does it work with?

Claude Code, Gemini CLI, Codex, Cursor, Windsurf, Cline, GitHub Copilot, Continue, OpenHands, JetBrains Junie, and 40+ other agents.

§ 8.3

How do I install it?

For Claude Code: `claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman`. For multi-agent auto-detect: `curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash`.

§ 8.4

How do I turn it on or off?

Trigger with `/caveman` or `/caveman lite|full|ultra`, or just say "talk like caveman" or "less tokens please". Stop with "stop caveman" or "normal mode".

§ 8.5

What other commands does it ship?

`/caveman-commit` (terse conventional commits, ≤50 chars), `/caveman-review` (one-line PR comments with line numbers), `/caveman-stats` (session and lifetime savings), and `/caveman-compress` (rewrites memory/doc files for ~46% input token savings).

§ 8.6

Is it free?

Yes — open source under the MIT license.