
Anthropic's Claude Opus 4.7 model sets new benchmarks in coding and vision while introducing adaptive thinking and granular effort controls to streamline complex software engineering and professional enterprise workflows.
AI major Anthropic has announced its latest model, Claude Opus 4.7, a week after the limited preview of Claude Mythos, the most powerful model the company has developed to date.
The launch also comes just days after OpenAI introduced GPT-5.4-Cyber, underscoring the intense pressure on AI companies to maintain a competitive edge.
While Claude Mythos Preview currently represents the peak of Anthropic's performance, its release remains restricted to select users. This makes Claude Opus 4.7 its most capable model that is generally accessible to the wider public and enterprise clients.
Within the Anthropic ecosystem, Opus 4.7 occupies the top tier of the generally available models, sitting above the Sonnet and Haiku variants. It serves as a direct upgrade to the previous Opus 4.6 model and is designed for demanding enterprise workflows.
The new model is specifically tuned for advanced software engineering and complex, long-running tasks that require a high degree of autonomy. Early users report that Opus 4.7 can handle difficult coding work with minimal supervision, often verifying its own outputs before presenting them.
Technical benchmarks
When comparing Opus 4.7 to its competitors, the benchmarks show a highly contested field. On the SWE-bench Verified test, which measures the ability of a model to solve real-world GitHub issues, Opus 4.7 achieved a score of 87.6%. This result is higher than the 80.6% achieved by Gemini 3.1 Pro.
In another evaluation called SWE-bench Pro, which uses problems from actively maintained repositories with large and complex code changes, Opus 4.7 scored 64.3%. This outperformed GPT-5.4, which recorded 57.7%, and Gemini 3.1 Pro, which scored 54.2%.
However, things change when looking at long-context comprehension, as measured by the OpenAI MRCR v2 test. At the 256k-token setting, Opus 4.7 achieved a mean match ratio of 59.2%, while GPT-5.4 reached 79.3% and the previous Opus 4.6 scored 91.9%.
The primary strength of Opus 4.7 lies in its ability to investigate and complete multi-step agentic work. One early tester from the fintech sector noted that the model catches its own logical faults during the planning phase, which accelerates execution. It is described as being more opinionated than its predecessors, often pushing back during technical discussions to help developers make better decisions rather than simply agreeing with the user.
Thinking mechanisms
A major update in this version is the introduction of adaptive thinking. Opus 4.7 makes thinking optional at every step, allowing the model to respond to simple queries quickly while investing more reasoning time into complex problems where it is likely to be useful.
This change aims to reduce overthinking and deliver a faster overall user experience during agentic runs. It does, however, affect how tokens are used. Opus 4.7 ships with an updated tokenizer (the system that converts human text into numerical units for the AI to process), and combined with the model's tendency to think more deeply at higher effort levels, the same input can now map to between 1.0 and 1.35 times as many tokens as before.
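As a rough illustration of what that band means for budgeting, the sketch below scales an old token count by the reported 1.0–1.35x range; the multiplier bounds come from the article, and everything else is illustrative.

```python
def adjusted_token_range(old_tokens: int,
                         low: float = 1.0,
                         high: float = 1.35) -> tuple[int, int]:
    """Estimate how many tokens a prompt may occupy under the updated
    tokenizer, given its count under the old one. The 1.0-1.35x band
    is the range reported for Opus 4.7."""
    return round(old_tokens * low), round(old_tokens * high)

# A prompt that previously consumed 10,000 tokens may now consume
# anywhere from 10,000 up to 13,500.
print(adjusted_token_range(10_000))  # (10000, 13500)
```

In practice this means context-window and cost planning should assume the upper bound for worst-case sizing.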
To give users more control over this trade-off between reasoning and cost, Anthropic has introduced a new effort level called xhigh. This setting sits between the high and max levels and is now the default for all Claude Code users.
Developers can also use task budgets in the public beta of the API, which allows them to guide the model's token spend and prioritise work across longer sessions. The xhigh setting is recommended for tasks that are sensitive to intelligence, such as designing complex schemas or migrating legacy codebases.
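Anthropic has not published the exact field names for these controls, so the following is purely a hypothetical sketch of what a request combining an effort level with a task budget might look like as a raw JSON payload; the `effort` and `task_budget_tokens` keys are illustrative assumptions, not confirmed API fields.

```python
import json

# Hypothetical request payload: the "effort" and "task_budget_tokens"
# field names are illustrative assumptions, not confirmed API fields.
payload = {
    "model": "claude-opus-4-7",
    "effort": "xhigh",              # sits between "high" and "max"
    "task_budget_tokens": 200_000,  # cap on token spend for the task
    "messages": [
        {"role": "user",
         "content": "Design a migration plan for this legacy schema."}
    ],
}

print(json.dumps(payload, indent=2))
```

The idea is that a budget caps spend across a long session while the effort level steers how much reasoning the model applies per step.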
Developer integration
The synergy between Opus 4.7 and Claude Code is a central part of this launch. Claude Code is a product surface designed specifically for terminal-based coding tasks. A new slash command called /ultrareview has been introduced, which initiates a dedicated review session to flag bugs and design issues in code changes.
For Max users, an auto mode is now available in research preview, allowing the model to make autonomous decisions with fewer interruptions. This is particularly useful for long-running tasks where the user has provided a clear intent and constraints up front.
The model's improved memory capabilities can also help it carry context across different sessions more reliably. It can remember important notes in file-system-based memory, reducing the need for the user to provide extensive background information every time they start a new task.
In terms of vision, Opus 4.7 features a substantial improvement in resolution support. It can now process images up to 2,576 pixels on the long edge, which is more than three times the fidelity of prior models. This higher resolution is critical for agents that need to read dense screenshots or extract data from complex technical diagrams, such as chemical structures or life-science patent workflows.
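Agent pipelines that feed screenshots or diagrams to the model can downscale anything larger than the supported long edge before upload. A minimal sketch of that arithmetic, where the 2,576-pixel limit comes from the article and the function itself is illustrative:

```python
def fit_to_long_edge(width: int, height: int,
                     limit: int = 2_576) -> tuple[int, int]:
    """Return image dimensions scaled so the longer side is at most
    `limit` pixels, preserving aspect ratio. Images already within
    the limit pass through unchanged."""
    long_edge = max(width, height)
    if long_edge <= limit:
        return width, height
    scale = limit / long_edge
    return round(width * scale), round(height * scale)

# A 4000x3000 screenshot scales down to fit the 2,576-pixel long edge.
print(fit_to_long_edge(4000, 3000))  # (2576, 1932)
```

Staying at or below the limit avoids any server-side resampling that might blur dense text in screenshots or fine detail in technical diagrams.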
Agentic safety
Safety remains a significant focus for Anthropic, particularly regarding cybersecurity. As part of an initiative called Project Glasswing, the company has highlighted both the risks and benefits of AI in security fields.
Opus 4.7 includes new safeguards designed to automatically detect and block requests for prohibited or high-risk cybersecurity uses. Anthropic has invited security professionals to join a Cyber Verification Program to use the model for vulnerability research and red-teaming.
External evaluations from the UK AI Security Institute showed that while Opus 4.7 is a strong model, it was unable to solve their full cyber range, which involves executing a series of linked exploits across a simulated corporate network.
The broader AI ecosystem is moving towards a future of parallel agents and long-horizon autonomy. Rather than working one-on-one with a chatbot, engineers are beginning to manage teams of AI agents that work for hours on complex problems.
Opus 4.7 is designed to fit this trend, showing higher role fidelity and better coordination in team-based workflows. However, the model also shows some persistent challenges. For instance, models can sometimes display yes-aversion, where they hesitate to perform a rare action even when explicitly instructed to do so.