
In 2019, Dario Amodei, then OpenAI's research director, warned that the startup's new large language model was "too dangerous to release" due to its potential for generating misleading content. When GPT-2 was eventually released almost a year later, the concerns seemed somewhat overblown.
But fresh warnings by the soft-spoken Amodei, now chief of OpenAI rival Anthropic, about the risks posed by Mythos -- his lab's latest addition to the Claude family and a preview of its new general-purpose language model -- appear far more grounded.
This warning has countries, including India, extremely worried. On Thursday, Finance Minister Nirmala Sitharaman chaired a high-level meeting with the banking industry to prepare defensive measures against the system. The Indian Express has also learnt that the government is currently in conversation with Anthropic's senior leadership in the US on the issue.
What can Mythos do, and why has it raised concerns?
While this new model performs strongly across the board, its standout feature is that it is remarkably capable at computer security tasks: it can both find and fix vulnerabilities (if set to work as a defender) and exploit them (if deployed as a hacker). What's spooked policymakers around the world is Anthropic's claim that Mythos has already found severe vulnerabilities in "every major operating system and web browser", including one that had gone undetected for nearly three decades.
The speed at which Mythos' capabilities have emerged is, as the name suggests, almost mythical. Just last month, Anthropic announced that its previous-generation Opus 4.6 model "is currently far better at identifying and fixing vulnerabilities than at exploiting them".
Internal evaluations by the AI startup showed that Opus 4.6 generally had "a near-0% success rate" at autonomous exploit development, but Mythos Preview simply proved to be in a different league altogether.
For example, Opus 4.6 managed to turn the vulnerabilities it had found in the JavaScript engine of Mozilla's Firefox 147 (the programme that converts human-readable JavaScript code into machine code the computer can run) into JavaScript shell exploits (successful attacks that, through the browser's engine, give the attacker control over the command line, or shell, of the victim's environment) only twice in several hundred attempts.
Anthropic re-ran the same experiment as a benchmark for Mythos Preview, which developed working exploits 181 times and achieved register control (control over the registers that direct the behaviour of a CPU, the brain of a computer system) in 29 more attempts, according to the company. Non-experts, Anthropic said, can also leverage Mythos Preview to find and exploit sophisticated vulnerabilities.
Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up the following morning to a complete, working exploit.
"We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them," the startup said in its April 7 note. All of these features have effectively prompted Anthropic to hit pause.
Real-world testing validates Anthropic's fears
Earlier this month, the UK AI Security Institute (AISI), in its evaluation of Anthropic's Claude Mythos Preview, flagged a marked jump in cyber capability, especially in structured testing environments. One of its headline findings is that the model was able to solve 73% of expert-level cybersecurity challenges in benchmark settings, far higher than earlier frontier models. These tests, largely drawn from capture-the-flag (CTF) style tasks, were designed to approximate real-world vulnerabilities and require a mix of technical depth and problem-solving ability.
Where earlier models often showed patchy results at higher difficulty levels, Mythos demonstrated a stronger ability to sustain performance across complex challenges. This suggests improvements in reasoning and planning, enabling the model to navigate layered cyber problems rather than just isolated exploits.
Crucially, the institute also highlighted Mythos's growing "agentic" behaviour, its ability to string together multiple steps into a coherent attack pathway. In testing, it was able to execute long, multi-stage attack chains, indicating a shift from tool-like assistance to more autonomous capability. This raises concerns about how such systems could lower the barrier for less-skilled actors to conduct sophisticated cyber operations.
In one of the tests designed by the UK AISI, called 'The Last Ones' -- a 32-step corporate network attack simulation spanning initial reconnaissance through to full network takeover -- Mythos turned out to be the first model to solve the task from start to finish, doing so in 3 of its 10 attempts. Across all its attempts, the model completed an average of 22 of the 32 steps. AISI said the same tasks would take humans 20 hours to complete.
Anthropic holds back full launch of Mythos, but model gets leaked
Even as it held back commercial deployment citing these concerns, Anthropic in parallel announced Project Glasswing, which aims to help companies that hope to use Mythos step up their cyber-defences before the model is widely released. Major software developers -- including Apple, Nvidia, the Linux Foundation and CrowdStrike, as well as competitor Google -- are on this list.
What has now set the cat among the pigeons is a report by Bloomberg that the Mythos model was accessed by "a handful of users" in a private Discord chat on the day it was announced publicly, despite the restricted release. US Federal Reserve Chair Jerome Powell and Treasury Secretary Scott Bessent met with top American bank CEOs in a closed-door meeting earlier this month to discuss the cybersecurity risks posed by Mythos.
On April 23, at a high-level meeting to assess the risks that Mythos poses to India's financial sector, Union Finance Minister Sitharaman told banks to exercise a "high degree" of vigilance and to develop a coordination mechanism to respond to threats emerging from the model's capabilities.
Anthropic's April 7 note titled 'Assessing Claude Mythos Preview's cybersecurity capabilities' does sign off on a somewhat optimistic note. "Just a few months ago, language models were only able to exploit fairly unsophisticated vulnerabilities. Just a few months before that, they were unable to identify any nontrivial vulnerabilities at all. Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development."
In the long run, Anthropic's researchers said they expect defensive capabilities to dominate: the world will emerge more secure, with software better hardened -- in large part by code written by these models. But the transitional period will be fraught. The leak of the model on the Discord chat, and the question marks it raises, are a case in point.