
US-based AI developer Anthropic has unveiled a new language model called Claude Mythos Preview, which, according to the company, is capable of independently finding and exploiting security vulnerabilities in software. The model is said to surpass the capabilities of all but the best human security experts. Due to its threat potential, Anthropic does not plan a general public release.
As previously reported, developments around Mythos became known recently following a leak. Prior to this, Anthropic had already sent shares of cybersecurity companies into a tailspin with the release of Claude Code Security. The news about Mythos -- where companies such as Palo Alto Networks, CrowdStrike, CloudFlare, Cisco, and Broadcom are partners via "Project Glasswing" -- partially boosted their stocks on Tuesday.
Anthropic justifies the decision against a public release with the model's extraordinary capabilities. According to the company, Claude Mythos Preview can identify security vulnerabilities and develop exploits almost entirely autonomously, without human guidance. The concern: should such capabilities fall uncontrolled into the hands of actors who are not committed to responsible use, the consequences for the economy, public safety, and national security could be severe.
In the long term, Anthropic aims to make models of this performance class available safely and at scale. However, appropriate safeguards must first be developed that can detect and block dangerous outputs. These security mechanisms are to be tested initially with a less risky model -- an upcoming Claude Opus model.
As part of internal testing, Anthropic deployed Claude Mythos Preview to identify so-called zero-day vulnerabilities -- security flaws that were previously unknown to the respective developers. According to the company, thousands of critical vulnerabilities were discovered across all major operating systems and web browsers. Three specific examples were made public:
All of the vulnerabilities mentioned were reported to the respective software maintainers and have since been patched. For additional discovered flaws, Anthropic has initially published only a cryptographic hash of the details and intends to disclose the full information only after a fix has been applied.
To deploy the model's capabilities specifically for defensive purposes, Anthropic has launched the initiative Project Glasswing. The goal is to use Claude Mythos Preview in the context of defensive security work and to share the insights gained with the entire industry.
The founding partners include prominent companies from technology, finance, and cybersecurity:
In addition, more than 40 further organizations that develop or operate critical software infrastructure will be granted access to the model. They are intended to use it to audit and secure both their own and open-source systems for vulnerabilities.
Anthropic is making up to $100 million in usage credits for Claude Mythos Preview available for Project Glasswing. An additional $4 million will be awarded as direct grants to open-source security organizations.
"The work of defending the world's cyber infrastructure could take years; the capabilities of frontier AI will likely advance significantly over the coming months. For cyber defenders to maintain the upper hand, we must act now," reads a statement from the company.
Anthropic emphasizes that Project Glasswing is only a starting point. No single organization can solve cybersecurity problems alone. Frontier AI developers, software companies, security researchers, open-source developers, and governments worldwide are called upon to act together.