
Anthropic released a new hybrid reasoning model on Thursday: Claude Opus 4.7.
Anthropic has a reputation as a safety-first AI company, and the Opus 4.7 system card reports that the model is less likely to hallucinate or engage in sycophancy than both prior Anthropic models and other frontier AI models.
We dug into the Opus 4.7 system card to see exactly what Anthropic had to say about the model's safety, honesty, and sycophancy.
Why put the TL;DR version at the end?
Anthropic says Claude Opus 4.7 improves on various types of hallucinations and on overall honesty. Anthropic also gave the new model top marks on sycophancy and encouragement of user delusions. (Anthropic also reports that Opus 4.7 scores much better on these behaviors than Gemini 3.1 Pro and Grok 4.20.)
"Claude Opus 4.7 is more reliably honest than Opus 4.6 or Sonnet 4.6, with large reductions in the rate of important omissions, and moderate improvements in factuality and rates of hallucinated input," Anthropic reports.
Anthropic measures Claude's honesty and hallucination rates in multiple ways, but let's look at one representative example -- the Model Alignment between Statements and Knowledge (MASK) benchmark. MASK was developed by Scale AI and the Center for AI Safety.
Claude Opus 4.7 had a MASK honesty rate of 91.7 percent, compared to 90.3 percent for Opus 4.6 and 89.1 percent for Sonnet 4.6. While that's lower than the 95.4 percent score achieved by Claude Opus 4.5, the new model performs better on other hallucination scores (more on that below).
Interestingly, the unreleased Claude Mythos matched that 95.4 percent honesty rate, also outperforming the new model.
Since Anthropic repeatedly compares Opus 4.7 to Claude Mythos, let's quickly review the differences between the two models.
Claude Opus 4.7 is the latest hybrid reasoning model available to paid Claude subscribers. Claude Mythos is an unreleased model that Anthropic has only made available to partners via Project Glasswing.
Under normal circumstances, we would expect Claude Opus 4.7 to be Anthropic's most advanced and powerful model to date. However, Anthropic says it lags behind the unreleased Claude Mythos in key areas. Anthropic deemed Claude Mythos too dangerous to release to the public because of its advanced cybersecurity capabilities.
Still, Claude Opus 4.7 improves upon Opus 4.6 in many ways, particularly advanced coding, visual intelligence, and document analysis, Anthropic says.
When using Opus 4.7, how likely is Claude to tell a lie, invent facts, or deceive users? Anthropic doesn't provide a single hallucination rate, because there are multiple types of hallucinations.
So, this section is for the AI nerds.
Anthropic identifies a few different ways to measure hallucination and honesty.
We've already covered the MASK honesty rate, and Claude Opus 4.7 shows similar gains on these other measures, according to Anthropic.
At this time, we cannot independently verify Anthropic's results.
To measure factual hallucinations, Anthropic used four different tests and recorded correct responses, incorrect responses, and abstentions. In this case, abstentions are good -- the model should decline to answer a question rather than guessing. Across all four tests, Opus 4.7 scored higher than Opus 4.6 and Sonnet 4.6 but lower than Claude Mythos.
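To make the scoring scheme concrete, here's a minimal sketch of how a benchmark like this might tally responses. The labels and the idea that abstentions count against the hallucination rate come from the description above; everything else (function name, example counts) is invented for illustration, and Anthropic's actual methodology may differ.

```python
# Hypothetical sketch of a correct / incorrect / abstain tally,
# where declining to answer is preferred over guessing wrong.
# The example data below is made up for illustration.
from collections import Counter

def score_responses(responses):
    """Tally response labels and return per-label rates."""
    counts = Counter(responses)
    total = len(responses)
    return {
        "correct_rate": counts["correct"] / total,
        "incorrect_rate": counts["incorrect"] / total,  # the "hallucination" rate
        "abstain_rate": counts["abstain"] / total,      # abstaining beats guessing
    }

# Example: 7 correct answers, 1 wrong answer, 2 abstentions out of 10 questions
example = ["correct"] * 7 + ["incorrect"] + ["abstain"] * 2
print(score_responses(example))
```

Under this kind of scoring, a model that abstains more often can report a lower hallucination rate even without answering more questions correctly, which is why the breakdown matters.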
Anthropic measured Opus 4.7's input hallucination in two ways: "prompts requesting an unavailable tool" and "prompts referencing missing context."
Opus 4.7 scored 89.5 percent on the former, beating Claude Mythos's 84.8 percent; on the latter, Opus 4.7 scored 91.8 percent, two points lower than Claude Mythos's 93.8 percent.
This shows just how stubborn AI hallucinations are: even a leading AI company like Anthropic reports avoiding input hallucinations only around 90 percent of the time, meaning the model still hallucinates on roughly one in ten such prompts. That's similar to the latest OpenAI models, which provide responses with incorrect information up to 5.8 percent of the time (with browsing enabled) to 10.9 percent (browsing disabled), per OpenAI.
What about Opus 4.7's honesty rate for false premises, i.e., will Claude tell a user they're wrong? According to the system card, Claude will push back on false premises 77.2 percent of the time. That's better than all other recent Anthropic models except for -- you guessed it -- Claude Mythos, which will reject false premises 80 percent of the time.
There's not much new to report in terms of sycophancy. While Anthropic's expert red-team testers reported that Opus 4.7 was prone to "sycophantic agreement under pushback," it has very similar scores to prior models from Anthropic and OpenAI, and noticeably lower scores than Gemini 3.1 Pro and Grok 4.20. Again, this is according to Anthropic.
To measure bad behaviors like sycophancy and "encouragement of user delusion," Anthropic uses Petri 2.0, its open-source behavioral audit tool. This test scores models on a 1-10 scale, with lower scores reflecting better behavior. The Petri score isn't akin to a percentage, as it measures both the rate of a behavior and the severity.
Anthropic gave Opus 4.7 strong marks (that is, low scores on this inverted scale) on both sycophancy and user delusions.
Mashable reached out to Anthropic for comment but did not receive a response in time for publication.