Anthropic Calls Its New Model Too Dangerous to Release

Anthropic asserted Tuesday that it has created a new era for cybersecurity after developing an artificial intelligence model too dangerous to release to the public.


The AI mainstay - also embroiled in a fight with the U.S. federal government over the deployment of its models for autonomous weapons and surveillance - said its unreleased Claude Mythos Preview model has already found thousands of high-severity vulnerabilities, "including some in every major operating system and web browser."

"Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout - for economies, public safety and national security - could be severe," the company wrote.

A consortium of more than 40 technology companies, including Microsoft, the Linux Foundation, Google and Cisco, will have access to the frontier model and $100 million in usage credits to find and plug holes. Anthropic dubbed the coalition "Project Glasswing."

"While the capabilities now available to defenders are remarkable, they soon will also become available to adversaries, defining the critical inflection point we face today," wrote Cisco CSO Anthoy Grieco.

Mythos Preview isn't just a high-end fuzzer, Anthropic executives wrote. They said it found a 27-year-old vulnerability in OpenBSD, a security-focused operating system used in network appliances and security functions. "The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it," Anthropic wrote.

The frontier model also found and chained vulnerabilities in the Linux kernel that allowed an attacker to gain superuser privileges. The model was able to defeat kernel address space layout randomization, the security technique of randomizing the location of kernel code in memory. The attack combined a flaw that gave the model read access to kernel memory with a second vulnerability that allowed it to write. "We have nearly a dozen examples of Mythos Preview successfully chaining together two, three and sometimes four vulnerabilities in order to construct a functional exploit on the Linux kernel."
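The report doesn't include exploit details, but the arithmetic of why a kernel memory read defeats KASLR is straightforward: once any kernel address leaks, the random offset of the whole image is known. The sketch below is a conceptual illustration only; leak_kernel_pointer() and every address in it are hypothetical stand-ins, not anything from Anthropic's findings.

```c
/*
 * Conceptual sketch only: shows the arithmetic by which an arbitrary
 * kernel-memory read defeats KASLR. leak_kernel_pointer() and all the
 * addresses below are made-up stand-ins, not real interfaces or values.
 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an out-of-bounds read that returns the
 * runtime address of one known kernel symbol. */
static uint64_t leak_kernel_pointer(void) {
    return 0xffffffffa1234567ULL;   /* pretend leaked address */
}

/* The same symbol's address in an unrandomized kernel image, as listed
 * in the build's symbol table (value invented for the example). */
#define SYMBOL_STATIC_ADDR 0xffffffff81234567ULL
#define KERNEL_STATIC_BASE 0xffffffff81000000ULL

int main(void) {
    uint64_t leaked = leak_kernel_pointer();

    /* KASLR slides the entire kernel image by one random offset, so a
     * single leaked symbol address reveals every other address. */
    uint64_t slide = leaked - SYMBOL_STATIC_ADDR;
    uint64_t runtime_base = KERNEL_STATIC_BASE + slide;

    printf("KASLR slide: %#llx\n", (unsigned long long)slide);
    printf("kernel base: %#llx\n", (unsigned long long)runtime_base);

    /* A separate write primitive aimed at addresses computed from
     * runtime_base is what turns the leak into privilege escalation. */
    return 0;
}
```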

In a blog post, Anthropic researchers said the model is able to identify a wide range of vulnerabilities and understand the logic behind the code. "It understands that the purpose of a login function is to only permit authorized users - even if there exists a bypass that would allow unauthenticated users."
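That distinction - code that is syntactically valid but violates the developer's intent - is the kind of flaw a pattern-matching scanner tends to miss. Below is a hypothetical C sketch of such a bug; the function, username and credential are invented for illustration, not taken from the article.

```c
/*
 * Hypothetical example of a "logic" vulnerability: nothing here is
 * malformed, but the comparison violates the function's intent that
 * only authorized users be admitted. All names and values are invented.
 */
#include <stdio.h>
#include <string.h>

static int check_login(const char *user, const char *password) {
    const char *stored = "s3cret";   /* stand-in for a credential store */

    /* Bug: comparing only strlen(password) bytes means an empty
     * password matches any stored secret - an unauthenticated bypass
     * of a function whose purpose is to keep such users out. */
    if (strcmp(user, "admin") == 0 &&
        strncmp(password, stored, strlen(password)) == 0)
        return 1;                    /* access granted */

    return 0;                        /* access denied */
}

int main(void) {
    printf("empty password accepted: %d\n", check_login("admin", ""));
    printf("wrong password accepted: %d\n", check_login("admin", "guess"));
    return 0;
}
```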

Anthropic researchers predict that attackers and defenders will eventually reach an AI equilibrium in which defenders benefit the most from powerful new models. But getting there will involve a tumultuous transitional period, one that would be worse if attackers get hold of the model before defenders are ready, they said.

They promised new safeguards that detect and block malicious outputs and a set of forthcoming recommendations on long-standing cybersecurity issues such as vulnerability disclosure, patching, vulnerability prioritization and secure-by-design practices.

Originally published by DataBreachToday
