Claude API pricing: real costs from 6 months of use
Claude API pricing looks scary on the docs page but turns out fine in practice. After six months of running Claude on real projects, here are the actual numbers: what each model costs per token, what I pay per month for indie-scale traffic, and the routing pattern that cut my bill by ~60%.
The short version
- About $15 to $30 per month for an indie project at modest scale.
- Sonnet 4.6 does 90% of the work at one-fifth the cost of Opus.
- A 10-line model router saves 40% to 70% on real workloads.
- Prompt caching is the unglamorous superpower. 90% off cached input.
The actual price list
Per million tokens, as of mid-2026:
| Model | Input | Output |
|---|---|---|
| Haiku 4.5 | $0.80 | $4 |
| Sonnet 4.6 | $3 | $15 |
| Opus 4.7 | $15 | $75 |
Two things to internalize before you read further. Output is always 5x more expensive than input, and Opus is 5x more expensive than Sonnet across the board. Most of the cost-saving game is just “use Sonnet by default and limit how much the model writes back.”
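Both ratios are easier to feel in code. A quick sketch (the prices are the per-million rates from the table; `costPerCall` is a hypothetical helper for illustration, not part of any SDK):

```javascript
// Per-million-token prices from the table above (USD).
const PRICES = {
  'haiku-4-5':  { input: 0.80, output: 4 },
  'sonnet-4-6': { input: 3,    output: 15 },
  'opus-4-7':   { input: 15,   output: 75 },
};

// Dollar cost of one call: tokens / 1M × per-million price.
function costPerCall(model, inputTokens, outputTokens) {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1e6;
}

// A typical chat turn: 1,500 tokens in, 400 tokens out.
const chatTurn = costPerCall('sonnet-4-6', 1500, 400); // ≈ $0.0105
const sameOnOpus = costPerCall('opus-4-7', 1500, 400); // ≈ $0.0525, 5x
```

Note that in the Sonnet call, the 400 output tokens cost more than the 1,500 input tokens. That is why capping output length matters so much.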
What I actually pay
For a project doing daily generation for about 1k subscribers (an indie AI-brief / personal-tool kind of workload):
- Input: about 50k tokens per day across all calls (curated source material + system prompts + history).
- Output: about 3k tokens per day (the actual generated brief plus helper outputs).
- Mix: about 80% Sonnet, 15% Haiku for filtering, 5% Opus for the hard prompts.
- Monthly bill: $8 to $12.
For a chat-style product with bursty traffic (about 500 monthly active users, ~5 conversations each):
- About 2.5k conversations per month, 3 turns each, totalling 7,500 model calls per month.
- Average 1,500 tokens in / 400 tokens out per call, which is about $0.0105 per call on Sonnet.
- Pure Sonnet: about $80 per month (7,500 × $0.0105).
- With routing (Haiku for the obvious turns): about $30 per month.
Anyone telling you indie AI apps need $500 per month of API spend is either using a more expensive model than they need or hasn't set up routing.
The pattern that saves 40% to 70%: model routing
Instead of sending everything to Opus “just to be safe,” classify the request and pick a model. The router is small enough to fit in a tweet:
```javascript
function pickModel(task) {
  if (task.kind === 'classify' || task.kind === 'extract') return 'haiku-4-5';
  if (task.needsReasoning && task.userFacing) return 'opus-4-7'; // user-visible reasoning only
  return 'sonnet-4-6'; // the default
}
```
That’s the entire optimization. A task that goes to Haiku at $0.80/M instead of Opus at $15/M is 18x cheaper. Even routing 30% of your calls to Haiku saves a meaningful chunk of the bill.
Where does Opus actually pay for itself? Three cases I’ve found:
- User-visible reasoning where the answer’s quality is the product (writing, code review, hard analysis).
- Few high-stakes calls where rerunning the prompt is worse than paying 5x for one good answer.
- Long-context tasks over 100k tokens where Sonnet visibly degrades and Opus holds up.
For literally everything else: Sonnet 4.6.
Prompt caching: the 90%-off trick
If your system prompt or context is the same across many calls, mark it as cacheable. Anthropic charges 10% of normal input rates for cache hits. The first call writes the cache (priced at 1.25x to compensate); every call within the next 5 minutes reads it for cents.
Real numbers: a 5,000-token system prompt called 100 times in 5 minutes:
- Without caching: 5k × 100 × $3/M = $1.50
- With caching: 5k × 1.25 × $3/M (write) + 5k × 99 × $0.30/M (reads) = $0.17
- 9x cheaper for one line of config.
Use caching for any chunk of the prompt that doesn’t change between calls. System prompts, few-shot examples, tool definitions, retrieved context that’s reused.
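Concretely, with Anthropic's Messages API the reusable block gets a `cache_control` tag. A sketch of the request body only (the model id follows this article's naming and the helper is hypothetical; check field names against the current API reference):

```javascript
// Build a Messages API request body with a cacheable system prompt.
// The system prompt is an array of blocks so one block can be marked cacheable.
function buildRequest(systemPrompt, userMessage) {
  return {
    model: 'claude-sonnet-4-6',  // assumed model id — verify against the docs
    max_tokens: 1024,            // always set an explicit output budget
    system: [
      {
        type: 'text',
        text: systemPrompt,
        cache_control: { type: 'ephemeral' }, // cache hits bill ~10% of input rate
      },
    ],
    messages: [{ role: 'user', content: userMessage }],
  };
}
```

The first call with a given prefix pays the 1.25x cache-write rate; identical prefixes within the TTL read at the discounted rate automatically.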
The hidden costs that wreck budgets
- System prompts charged on every call. A bloated 8k-token system prompt equals 8k input tokens billed every single time. Tighten or cache.
- Conversation history that grows forever. By message 20, your input is 80% history. Truncate, summarize, or move to RAG.
- Verbose outputs. Set `max_tokens` explicitly. The model will fill the budget you give it.
- Failed responses you retry. Every failed call costs the same as a successful one. Validate inputs first.
- Tool-use loops that don’t terminate. Set a max iteration count. Otherwise an “agent” runs until it bankrupts you.
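Two of those leaks can be guarded in a few lines each. A sketch with hypothetical helpers (not from any SDK): `truncateHistory` keeps only recent turns, and a hard iteration cap stops a runaway agent.

```javascript
// Keep only the last `maxTurns` messages so history stops growing forever.
// (Summarizing dropped turns, or moving them to RAG, are the fancier options.)
function truncateHistory(messages, maxTurns = 10) {
  return messages.length <= maxTurns ? messages : messages.slice(-maxTurns);
}

// Hard cap on tool-use iterations so an "agent" can't run until it bankrupts you.
// `callModel` is whatever function performs one model call in your app.
async function runAgent(callModel, maxIterations = 5) {
  for (let i = 0; i < maxIterations; i++) {
    const result = await callModel();
    if (result.stopReason !== 'tool_use') return result; // model is done
  }
  throw new Error(`agent exceeded ${maxIterations} iterations`);
}
```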
What I’d do on day one of a new project
- Default to Sonnet 4.6 in code, not Opus.
- Add the model router from day one (even with just two branches).
- Cache the system prompt.
- Set explicit `max_tokens` on every call.
- Log token usage per call to your DB. You can’t optimize what you don’t measure.
- Re-check costs weekly for the first month, then monthly.
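For the logging step, a thin wrapper around the API call is enough. A sketch assuming the Anthropic SDK's `client.messages.create` shape; `saveUsage` stands in for your own DB write:

```javascript
// Wrap every model call so token counts land in your DB alongside the model used.
async function loggedCall(client, params, saveUsage) {
  const res = await client.messages.create(params);
  await saveUsage({
    model: params.model,
    inputTokens: res.usage.input_tokens,   // field names per the API's usage object
    outputTokens: res.usage.output_tokens,
    at: new Date().toISOString(),
  });
  return res;
}
```

Once every row is in the DB, a single GROUP BY model query tells you where the money goes.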
To understand why “tighter prompt” is the highest-leverage cost optimization, and why context windows are the trap nobody warns you about, read “Tokens, context, and why your prompt costs more than you think.”
If you’re still picking your AI coding setup, the companion post on Cursor vs Claude Code covers the dev-side tooling that runs on top of these APIs.