Anthropic confirms Claude Code problems and promises stricter quality controls

THE DECODER · 23m ago

The problems highlight an industry-wide compute crunch. This bottleneck is increasingly causing outages and forcing AI providers to raise prices for compute-intensive tools.

Users complained about declining quality in Claude Code. Anthropic identified and fixed three separate sources of error. The company promises stricter quality controls going forward.

Over the past month, a growing number of users reported that Anthropic's coding tool Claude Code was producing noticeably worse results. Anthropic has now laid out the causes in a detailed post-mortem: three independent changes to Claude Code, the Claude Agent SDK, and Claude Cowork combined to create a widely felt quality drop. The API itself was not affected, according to Anthropic. All three issues have been fixed as of April 20 with version 2.1.116.

The first issue dates back to March 4. Anthropic lowered the default reasoning effort from "high" to "medium" because some users were experiencing extreme latency in high mode. Internal testing had shown that medium mode delivered only slightly worse results on most tasks while significantly reducing latency. The trade-off backfired: users quickly reported that Claude Code felt less intelligent. On April 7, Anthropic permanently rolled back the change.

The second problem was a bug in a caching optimization shipped on March 26. The plan was to delete older reasoning sections after an hour of inactivity, reducing latency when a session resumed. A coding error caused the reasoning history to be wiped on every subsequent turn instead.

Claude progressively lost context about its own decisions. Users noticed forgetfulness, repetition, and strange tool choices. On top of that, the resulting cache misses burned through usage limits faster than expected. According to Anthropic, the bug slipped through reviews undetected and wasn't fixed until April 10.
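Anthropic has not published the offending code, but the described failure fits a familiar pattern: a time-to-live check meant to evict stale data once instead fires on every turn. A minimal hypothetical sketch of that pattern (function and field names are invented, not Anthropic's):

```python
import time

TTL_SECONDS = 3600  # intended rule: evict reasoning idle for an hour


def prune_reasoning_intended(history, last_active):
    """Intended behavior: drop reasoning blocks only when the
    session has been idle for longer than the TTL."""
    if time.time() - last_active > TTL_SECONDS:
        return [m for m in history if m["type"] != "reasoning"]
    return history


def prune_reasoning_buggy(history, last_active):
    """The reported bug pattern: an inverted idle check means the
    condition is true on every turn of an active session, so the
    model's reasoning history is wiped each time."""
    if time.time() - last_active < TTL_SECONDS:  # comparison flipped
        return [m for m in history if m["type"] != "reasoning"]
    return history


history = [
    {"type": "reasoning", "text": "plan: refactor the parser module"},
    {"type": "text", "text": "Refactoring now."},
]
now = time.time()
print(len(prune_reasoning_intended(history, now)))  # 2: nothing evicted
print(len(prune_reasoning_buggy(history, now)))     # 1: reasoning gone
```

A bug of this shape also explains the side effects Anthropic described: every wipe invalidates the prompt cache, so each turn is re-billed at uncached rates and usage limits drain faster.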

A third issue appeared on April 16: a system prompt instruction meant to curb the well-known verbosity of Opus 4.7. The line read: "Length limits: keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail." Later testing with a broader eval suite revealed a 3 percent quality drop. Anthropic rolled back the change on April 20.

Because each change affected different user groups at different times, the combined effect felt like a vague, gradual decline that was initially hard to distinguish from normal variation.

Going forward, Anthropic says more employees will use the exact public build of Claude Code instead of internal test versions. Every system prompt change will now have to pass a broad, model-specific eval suite.
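Anthropic hasn't detailed what "passing" the eval suite means, but a plausible gate is a simple regression threshold: block any prompt change whose eval score falls more than a tolerated margin below the current baseline. A hypothetical sketch (the threshold and scores are illustrative, not Anthropic's):

```python
MAX_QUALITY_DROP = 0.01  # assumed tolerance: at most a 1% relative regression


def passes_eval_gate(baseline_score: float, candidate_score: float) -> bool:
    """Return True only if the candidate prompt stays within the
    allowed relative drop from the baseline eval score."""
    return (baseline_score - candidate_score) <= MAX_QUALITY_DROP * baseline_score


# The length-limit prompt in this incident reportedly cost about 3%:
print(passes_eval_gate(0.80, 0.776))  # 3% relative drop -> False, blocked
print(passes_eval_gate(0.80, 0.796))  # 0.5% relative drop -> True, ships
```

Under a gate like this, the April 16 system prompt change would have been rejected before rollout rather than reverted four days later.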

For changes that could impact intelligence, Anthropic plans to introduce soak periods and gradual rollouts. As compensation, the company has reset usage limits for all subscribers.

Anthropic also set up the X account @ClaudeDevs to communicate product decisions more transparently.

This isn't the first time users have complained about declining AI quality. Back in the second half of 2023, users accused OpenAI of making GPT-4 "dumber" over time. OpenAI denied making significant changes to its models after release.

Claude has faced similar complaints before, with infrastructure bugs as the culprit. The current case reinforces a pattern: what users perceive as model regressions often turns out to be changes in the tooling layer or infrastructure rather than the models themselves. In real-world use, users benefit from scaffolding like Claude Code because it steers model capabilities and provides the right context. When that scaffolding breaks, the opposite happens. Add in vendor-side tweaks like Anthropic's reasoning depth adjustment, and the effect compounds.

The motivation behind such changes increasingly ties back to an industry-wide compute crunch. Anthropic's API availability recently sat at just 98.95 percent, well below the cloud industry's 99.99 percent standard. GPU hourly prices on the spot market rose 48 percent according to the Ornn Compute Price Index, and Bank of America analysts expect demand to outstrip supply through at least 2029. OpenAI is shutting down its video generation app Sora to free up compute for coding and enterprise products. GitHub has also paused new signups for several Copilot tiers.

This pressure is also shaking up pricing models. Anthropic's head of growth recently acknowledged that the existing Pro and Max plans weren't built for current agentic workloads since they were created before compute-intensive tools like Claude Code existed. The company even briefly tested removing Claude Code access for new Pro subscribers but reversed course after backlash.

OpenAI, meanwhile, doubled API prices with GPT-5.5 compared to its predecessor, charging $5 per million input tokens and $30 per million output tokens. The era of cheap flat rates for the most powerful agentic AI tools appears to be coming to an end.

Originally published by THE DECODER
