
Did Anthropic just soft-launch the scariest AI model yet?
On Tuesday Anthropic announced that it would deploy its newest and most powerful AI model, Claude Mythos Preview, to a new industry initiative (Project Glasswing) meant to safeguard critical software infrastructure against cyberattacks. That sounded good, but it somewhat obscured the real news: one of the big three AI labs has now developed a model that could, in the wrong hands, be a super-dangerous cyberweapon.
In the course of normal training, the model began showing significant skill both at detecting bugs in software systems and at exploiting those bugs to disrupt or take control of those systems. It found a 27-year-old vulnerability in OpenBSD and exploited it to gain root access. It caught a 16-year-old flaw in FFmpeg that automated tools had missed after five million tests. Perhaps most impressively, it can build exploits by stringing together multiple vulnerabilities that would be harmless on their own; it did this on a Linux system to gain admin-level access. Interpretability researchers also found cases where the model behaved deceptively or manipulatively during tests. In one case, Mythos discovered and used a privilege-escalation exploit and then designed a mechanism to erase traces of its use.
Anthropic said it would make Mythos available to a select group of tech companies, including Apple and Cisco, along with about 40 other organizations that build or maintain critical software infrastructure. This is a bit like a defense contractor unveiling a super-lethal missile capable of striking any target on Earth, while insisting it will be distributed only to a small group of trusted countries and used strictly for defensive purposes.