What Prompt Caching Does (and Why It Matters)
Every time you send a message in Claude Code, the system sends the full conversation context to the API:
- The system prompt (~2,000 tokens)
- All previous messages in the session
- File contents that have been read
- Tool call results
By interaction 20 in a session, this context can be 50,000+ tokens. Without caching, Claude processes all 50,000 tokens from scratch every single time — even though 95% of it is identical to the previous interaction.
Prompt caching solves this by recognizing the part of the context that is unchanged from the previous request and charging roughly 10% of the normal input cost for it. The savings grow as sessions get longer.
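To make the arithmetic concrete, here is a minimal sketch of the per-request cost under the figures quoted above. The function name and the simple two-bucket model are assumptions for illustration only: it treats cached tokens as costing 10% of uncached ones and ignores output tokens and any surcharge for writing to the cache.

```python
def relative_cost(context_tokens, cached_fraction, cache_discount=0.10):
    """Return the cost of one request in 'full-price token' units.

    Hypothetical model: cached tokens are billed at `cache_discount`
    times the normal rate, everything else at the normal rate.
    """
    cached = round(context_tokens * cached_fraction)
    fresh = context_tokens - cached
    return fresh + cached * cache_discount


# Interaction 20 from the text: 50,000 tokens, 95% unchanged.
without_cache = relative_cost(50_000, cached_fraction=0.0)
with_cache = relative_cost(50_000, cached_fraction=0.95)
print(f"without caching: {without_cache:,.0f}")
print(f"with caching:    {with_cache:,.0f}")
print(f"ratio:           {with_cache / without_cache:.3f}")
```

Under these assumptions the cached request costs about 7,250 token-equivalents instead of 50,000, roughly 14.5% of the uncached price.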
How the Bug Inflated Costs
With caching broken, the cost curve looked like this:
| Interaction # | Context Size | Cost Without Caching | Cost With Caching |
|---------------|--------------|----------------------|-------------------|
| 1 | 5,000 tokens | 1x | 1x |
| 10 | 25,000 tokens | 5x | 1.5x |
| 20 | 50,000 tokens | 10x | 2x |
| 30 | 75,000 tokens | 15x | 2.5x |
By interaction 30, each request cost 15x the first one instead of 2.5x, so users were paying about six times what they should have been. A session that should have used a modest share of a Pro budget could instead drain the entire allocation and hit the limit.
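The per-request overpayment at each point is the ratio of the two cost columns in the table. A short sketch of that calculation, using only the table's values (the variable and function names are illustrative):

```python
# Relative per-request cost from the table, as multiples of interaction 1:
# interaction number -> (cost without caching, cost with caching)
table = {
    1: (1.0, 1.0),
    10: (5.0, 1.5),
    20: (10.0, 2.0),
    30: (15.0, 2.5),
}


def overpayment_factor(interaction):
    """How many times more a request cost with caching broken."""
    without_caching, with_caching = table[interaction]
    return without_caching / with_caching


for n in table:
    print(f"interaction {n}: {overpayment_factor(n):.1f}x the cached cost")
```

The gap widens with every interaction, which is why long sessions were hit hardest.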
Step-by-Step: Fixing the Bug
Step 1: Check Your Version
claude --version
If the output shows a version below 2.1.34, you are affected.
Step 2: Update Claude Code
For npm installations (most common):
npm update -g @anthropic-ai/claude-code
For Homebrew:
brew upgrade claude-code
Step 3: Verify the Update
claude --version
Confirm the version is 2.1.34 or later.
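If you want to script this check, compare the version numerically rather than as a string, since a plain string comparison would rank 2.1.9 above 2.1.34. The sketch below assumes the output of claude --version contains a dotted X.Y.Z number somewhere in the line; the exact output format is an assumption, and the helper names are hypothetical.

```python
import re

FIXED_VERSION = (2, 1, 34)  # first version with the fix, per this guide


def parse_version(output):
    """Extract the first X.Y.Z number from a version string as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(output):
    """True if the reported version is older than the fixed release."""
    return parse_version(output) < FIXED_VERSION


print(is_affected("2.1.9"))   # older than 2.1.34 once compared numerically
print(is_affected("2.1.34"))  # the fixed release itself
```

To feed it real output, you could pass in the stdout of `subprocess.run(["claude", "--version"], capture_output=True, text=True)`.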
Step 4: Test the Fix
Start a new Claude Code session and work normally for 30 minutes. If your usage budget lasts significantly longer than before the update, the fix is working.
Why This Happened: Technical Root Cause
The prompt caching system uses a hash of the conversation context to determine if content has been seen before. Two bugs broke this:
Bug 1: Incorrect cache key computation
The cache key included a timestamp or session identifier that changed with every interaction, making every request appear unique even when the context was identical. The cache never got a "hit" because no two requests had the same key.
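A minimal sketch of this failure mode follows. It is not Anthropic's implementation; the function names, the JSON serialization, and the choice of SHA-256 are all assumptions made for illustration. It only shows why mixing a per-request value into a hash makes identical context look new.

```python
import hashlib
import json


def stable_key(context):
    """Key derived only from the context, so identical context maps to the same key."""
    payload = json.dumps(context, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def buggy_key(context, request_timestamp):
    """Hypothetical buggy variant: a per-request timestamp leaks into the key."""
    payload = json.dumps(
        {"context": context, "ts": request_timestamp}, sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


context = ["system prompt", "user: refactor utils.py", "tool: file contents"]

# Identical context produces the same key, so a later request can hit the cache.
print(stable_key(context) == stable_key(list(context)))

# Identical context but a different timestamp produces a different key: always a miss.
print(buggy_key(context, 1000.0) == buggy_key(context, 1001.0))
```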
Bug 2: Premature cache invalidation
Even when Bug 1 was partially worked around, a second bug caused cached entries to expire after 2–3 interactions instead of persisting for the full session. This meant caching only worked for the first few messages before reverting to full-cost processing.
Both bugs needed to be fixed simultaneously — fixing only one still left caching broken.
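The interaction between the two bugs can be illustrated with a toy simulation. Everything here is a simplifying assumption rather than a description of the real system: one shared prefix is looked up once per interaction, Bug 1 is modeled as a key that changes every interaction, and Bug 2 as an entry that stops being usable a few interactions after it is stored and is never refreshed.

```python
def count_cache_hits(interactions, unstable_key, max_age=None):
    """Count cache hits over a session in a toy model of the two bugs.

    unstable_key: if True, the key changes every interaction (Bug 1).
    max_age: if set, an entry is only usable for this many interactions
             after it was stored (Bug 2); None means it lasts all session.
    """
    cache = {}  # key -> interaction number at which the entry was stored
    hits = 0
    for n in range(1, interactions + 1):
        key = f"prefix#{n}" if unstable_key else "prefix"
        stored_at = cache.get(key)
        usable = stored_at is not None and (
            max_age is None or n - stored_at <= max_age
        )
        if usable:
            hits += 1
        elif stored_at is None:
            cache[key] = n  # first sighting: store it (expired entries stay expired)
    return hits


scenarios = {
    "both bugs present": count_cache_hits(20, unstable_key=True, max_age=3),
    "only Bug 2 fixed": count_cache_hits(20, unstable_key=True, max_age=None),
    "only Bug 1 fixed": count_cache_hits(20, unstable_key=False, max_age=3),
    "both fixed": count_cache_hits(20, unstable_key=False, max_age=None),
}
for name, hits in scenarios.items():
    print(f"{name}: {hits} hits in 20 interactions")
```

In this model, fixing only the expiry leaves zero hits because the keys still never match, and fixing only the key yields a handful of hits before the entry expires. Only the combination gives a hit on every interaction after the first.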
Timeline of the Bug
- Early 2026: Users begin reporting unusually fast limit drain on Reddit and GitHub Issues
- March 2026: Multiple high-profile threads (330+ comments) document the issue with specific numbers ("20x max usage gone in 19 minutes")
- March 26, 2026: Forbes reports on the pricing issues, bringing wider attention
- March 31, 2026: Anthropic confirms the bug publicly
- April 2026: Claude Code v2.1.34 released with the fix
- No retroactive credits: Anthropic did not automatically refund affected users, though individual support requests were handled case-by-case
Common Mistakes to Avoid
- Not updating Claude Code: This is the single most impactful fix — everything else is secondary if you are on an old version
- Assuming the bug is fixed without checking your version: Not every installation method enables auto-update by default
- Blaming the model or plan for fast drain: If you are on a version below 2.1.34 and your limit drains in under 20 minutes, the cause is almost certainly the caching bug, not your plan tier
- Requesting a plan upgrade instead of updating: Upgrading from Pro to Max 5x does not fix the bug. It just gives you 5x the budget to drain at the same inflated rate
After the Fix: What Normal Usage Looks Like
With caching working correctly on Claude Code v2.1.34+:
- Claude Pro: Budget lasts 2–4 hours of active coding per 8-hour window
- Claude Max 5x: Budget lasts a full workday (6–8 hours) of moderate use
- Claude Max 20x: Effectively unlimited for most individual developers
If your usage still feels too fast after updating, check your model setting — Opus 4 costs ~3x more than Sonnet 4 per interaction.
Related Guides
- Claude Code usage limit draining fast — broader guide covering all causes of fast drain
- Claude Pro vs Max usage limits — which plan is right for Claude Code users
- Claude usage limit reached — what to do when you hit the cap
- Claude tools hub — all Claude guides and how-tos