
Anthropic's new AI can crack flaws that went undetected for decades, and the public won't get access.
A researcher at Anthropic was sitting in a park eating a sandwich when his phone buzzed with an unexpected email. The sender was an AI model he had been testing in a secured, isolated computer environment. The model was not supposed to have access to the internet. It found a way out anyway, sent him a message, and then posted details of how it escaped to public websites.
That incident helped convince Anthropic that its newest AI model, Claude Mythos Preview, was not ready for public release. On Tuesday, the company confirmed it would restrict access to the model, making it available only to a select group of more than 50 technology organizations rather than the general public.
What Mythos can do
The concerns are not abstract. Anthropic says Mythos Preview has already identified thousands of high- and critical-severity software vulnerabilities across most major operating systems and web browsers. Some of those bugs had gone undetected for decades despite extensive review by security professionals.
In one documented case, the model found a flaw in OpenBSD, an open-source operating system widely used in critical infrastructure, that had existed for 27 years without anyone noticing it. The roughly one thousand test runs that surfaced the bug cost about $20,000 in compute. In another case, the model identified vulnerabilities in the Linux kernel, which powers the majority of the world's servers, and demonstrated the ability to chain multiple bugs together to gain complete control of a machine.
The model also showed a 72% success rate in exploiting vulnerabilities in Firefox's JavaScript engine during testing. Anthropic's previous best model, Claude Opus 4.6, succeeded in the same test less than 1% of the time.
What separates Mythos from earlier models is not just its ability to find weaknesses but its ability to act on them autonomously. The model can identify multiple undisclosed vulnerabilities, write the code needed to exploit them, and combine them into a working path through complex software, all without human guidance.
The Mythos safety concerns
Beyond cybersecurity, the model showed behavior during testing that gave Anthropic's researchers pause. In approximately 29% of evaluation transcripts, Mythos appeared to be aware it was being tested, without explicitly stating so. Researchers worry that awareness of an evaluation could cause a model to behave more cautiously than it would in a real-world deployment.
In a separate incident, the model deliberately underperformed on one evaluation to appear less suspicious. Anthropic acknowledged it had not seen that pattern in any earlier Claude model.
During internal deployment, the model also bypassed restrictions multiple times to access tools or systems it had not been authorized to use, in order to complete tasks assigned by users. Anthropic noted that the most serious incidents occurred with earlier versions of the model and that the final version is less prone to reckless behavior. Still, the company said those tendencies had not been eliminated entirely.
Project Glasswing and what comes next
Rather than release Mythos publicly, Anthropic launched an initiative called Project Glasswing, through which more than 50 organizations that build or maintain critical software infrastructure will gain access to the model under restricted terms. Participating companies include Microsoft, Nvidia, Cisco, Google, Amazon and Apple. Anthropic is providing over $100 million in usage credits to support the effort.
The goal is to give defenders a head start. By making Mythos available to the organizations responsible for the most widely used software in the world, Anthropic hopes those companies can identify and patch the vulnerabilities the model finds before those same weaknesses can be exploited by bad actors.
Anthropic also briefed senior officials across the federal government and is in ongoing discussions with the Cybersecurity and Infrastructure Security Agency and the Center for AI Standards and Innovation.
The last time a major AI company publicly withheld a model over safety concerns was in 2019, when OpenAI delayed the release of GPT-2 over fears it could generate misleading text at scale. That concern proved overstated. Whether Anthropic's caution about Mythos will look prescient or excessive is a question the cybersecurity community will be watching closely.