How Anthropic Learned Mythos Was Too Dangerous for the Wild

Bloomberg Business

One balmy February evening in Bali, Nicholas Carlini stepped away between events at a wedding, opened his laptop, and set out to do some damage. Anthropic PBC had just made a new artificial intelligence model, called Mythos, available for internal review, and Carlini -- a well-known AI researcher -- intended to see what kind of trouble it could cause.

Anthropic pays Carlini to stress-test its AI models to see whether hackers could leverage them for espionage, theft or sabotage. Even from Bali, where he and his wife were attending an Indian wedding, Carlini was staggered at what the model could do.

Within hours Carlini found numerous techniques to infiltrate systems used around the world. Once he was back in Anthropic's downtown San Francisco office, he discovered Mythos was able to autonomously create powerful break-in tools, including against Linux, the open-source operating system that underpins much of modern computing.

Mythos orchestrated the digital equivalent of a bank robbery: getting past security protocols and through the front door of networks, and breaking into digital vaults that gave it access to online treasures. AI had picked locks, but now it could pull off an entire heist.

Carlini and some of his colleagues began alerting staff to what they'd found. And each day they continued to discover high-severity and critical bugs in the systems Mythos probed, the kind of flaws normally uncovered by the world's best hackers.

Meanwhile, Anthropic's Frontier Red Team -- a group of 15 "Ants," or Anthropic employees -- was experimenting in much the same way. The team aims to ensure that Anthropic's models can't be used to harm humanity. Its members ship in robotic dogs and place them in a warehouse with engineers to test whether Claude could be used to control them maliciously, or consult with biologists to understand whether the chatbot could be used to create biological weapons.

Now, they were realizing that the biggest risk Mythos posed was to cybersecurity.

"Within hours of getting the model, we knew it was different," says Logan Graham, who runs Anthropic's Frontier Red Team.

A previous model, Opus 4.6, had shown indications it could help people exploit vulnerabilities in software. Mythos could exploit the vulnerabilities on its own, Graham says. This was a national security risk, he warned Anthropic's executives. That left Graham with the unenviable task of telling his bosses that their next major revenue generator was too hazardous to release to the public.

Anthropic's co-founder and chief science officer, Jared Kaplan, said he had been monitoring Mythos' training "very carefully," as it was being built. By January he was starting to realize how capable Mythos was at finding vulnerabilities. Kaplan, a theoretical physicist, needed to consider whether these flaws were curiosities or "something very relevant to the infrastructure of the internet." He concluded it was the latter.

Over the course of a week or two in late February and early March, he and co-founder Sam McCandlish weighed whether they could release the model. Around the first week of March, the executive team -- including Chief Executive Officer Dario Amodei, President Daniela Amodei, Chief Information Security Officer Vitaly Gudanets and others -- huddled to hear Kaplan and McCandlish's pitch.

Mythos, they said, was too much of a risk to release generally. But Anthropic should let other companies, maybe even competitors, try it out.

"It quickly became clear that we wanted to do something fairly unusual, that this wasn't going to be the same as the last launch," Kaplan said.

By the first week of March, the company had agreed and greenlit its use as a cyber defense tool.

The response was immediate. On the same day Anthropic publicly disclosed Mythos' existence, US Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened Wall Street leaders for a meeting in Washington. The message: Use Mythos to find your weaknesses -- now.

The executives who attended refused to share what was discussed even with some of their top advisers, showing the gravity of the meeting, according to people close to them who asked not to be named describing private conversations.

The urgent warnings from White House officials about Mythos' potency as a hacking tool -- and their advice to use it defensively -- point to the way that AI has become a decisive force in cybersecurity. Anthropic released Mythos to a limited group of organizations as part of "Project Glasswing," enabling the likes of Amazon Web Services Inc., Apple Inc., and JPMorgan Chase & Co. to experiment with it. Government agencies have also expressed interest.

Prior to external release, Anthropic briefed senior officials across the US government on Mythos Preview's full capabilities, including both its offensive and defensive cyber applications. The company is also holding ongoing discussions with foreign governments, an Anthropic official who asked not to be named discussing internal matters said.

Competitor OpenAI also pounced on the attention, saying Tuesday that it would release a tool intended to spot software flaws, called GPT-5.4-Cyber.

Anthropic hasn't publicly released Mythos as a cybersecurity tool, and many outside researchers haven't had a chance to validate the company's claims. But Anthropic's unprecedented decision to gate access reflects a growing view inside the industry and government that AI is changing cybersecurity economics by reducing the cost of finding vulnerabilities, compressing the time needed to investigate targets and lowering the skill barrier for certain types of attacks.

Anthropic warns that Mythos' ability to act with greater autonomy comes with risk. In testing an earlier version of the model, researchers found dozens of examples of "concerning" behavior, including not following human direction and even, in rare cases, covering its tracks when violating human instructions. In one incident, the model developed a multi-step exploit to escape the restricted environment it was running in, gain broad access to the internet and begin publishing material online, all on its own initiative.

The software that now underpins everything from banking apps to hospital systems is laced with obscure coding flaws that trained specialists spend weeks or months trying to identify. Occasionally hackers get there first, resulting in data breaches and ransomware attacks that can have devastating consequences.

High-profile names have been quick to question just how powerful Mythos really is, or how much of a risk it would pose if released.

"A growing number of people are wondering if Anthropic is the AI industry's 'boy who cried wolf,'" White House AI advisor David Sacks wrote on the social media site X. "If Mythos-related threats don't materialize, the company will have a serious credibility problem."

But hackers have already adopted large language models to launch complex malicious campaigns. A Chinese cyber-espionage group already used Anthropic's Claude to try breaching roughly 30 targets, while other attackers have used AI to steal data from government agencies, deploy ransomware and quickly break into hundreds of firewall tools meant to safeguard data.

Among US government officials focused on national defense, the introduction of Mythos has created profound uncertainty about how to evaluate cybersecurity risk, according to a person familiar with the matter. Equipping an individual hacker with the model, or similar AI tools, would likely be a transformation equivalent to turning a conventional soldier into a special forces operator, the person said.

At the same time, Mythos appears likely to be a force multiplier, the person said: enabling a criminal hacking gang to operate at the level of a small nation-state, and a small country's intelligence and military hackers to carry out breaches of the sort now done by China.

"I really believe we will be safer and better, and we will be much more secure with AI," said Rob Joyce, former director of cybersecurity at the National Security Agency. "But I think there's this dark period between now and some time in the future where the advantage is very much offensive AI, where the people who haven't done the basics will get hacked."

Mythos isn't the only model doing this kind of work. Numerous organizations have been using LLMs to find vulnerabilities, including previous Claude models and Google's Big Sleep.

JPMorgan had successfully been using large language models before the Mythos announcement to help find vulnerabilities in the bank's software, according to a person familiar with the matter who requested anonymity to discuss confidential internal security projects.

Efforts that had previously taken days or weeks to identify "zero-day" flaws and write code to exploit them now can take as little as an hour or even minutes, the person said. Zero-days are so-called because they're unknown to defenders, who thus have zero days to fix them.

JPMorgan has focused primarily on supply-chain and open-source software, and has found flaws and subsequently alerted vendors, the person said. CEO Jamie Dimon said during an earnings call that Mythos "shows a lot more vulnerabilities need to be fixed."

The bank had already been in talks with Anthropic to test the model before the public became aware of it, according to a person familiar with the matter who wasn't authorized to discuss it publicly. JPMorgan declined to comment.

Other Wall Street banks and technology companies are now experimenting with Mythos to help plug holes before hackers can find a way in. Goldman Sachs Group Inc., Citigroup Inc., Bank of America Corp. and Morgan Stanley are among the financial institutions testing the technology internally, Bloomberg News has reported.

Cisco Systems Inc. staffers are especially wary of whether intruders will use AI to try to find pathways into software that runs its networking equipment worldwide, such as routers, firewalls and modems, said Anthony Grieco, Cisco's chief security and trust officer. Grieco is particularly worried about how AI might accelerate attacks on devices that are end-of-life, and therefore won't receive further updates from Cisco.

Plugging the holes that AI tools are finding will remain problematic. That process, known as security patching, is such a costly, slow exercise for organizations that many choose not to squash their bugs at all. Devastating attacks like the one at Equifax Inc., where intruders stole records of about 147 million people, were possible because organizations didn't apply known fixes.

Anthropic is in discussions with federal agencies, even after the Trump administration classified the AI firm as a supply-chain threat following its refusal to help facilitate mass surveillance of Americans. The Treasury Department was seeking to gain access to Mythos this week, and Secretary Bessent said the model would help the US maintain an AI edge over China.

In one instance, the model wrote a web browser exploit that chained together four vulnerabilities, a feat that would be a major challenge for human hackers. Such vulnerability chains can open paths into otherwise highly secure systems, as in the Stuxnet attack that damaged centrifuges at an Iranian nuclear facility, according to cybersecurity research reports on the matter.

Mythos also was able to identify and exploit zero-day vulnerabilities in every single major web browser when directed to do so, according to Anthropic.

Anthropic said it used Mythos to find exploits in Linux code, which is "underpinning most modern computing," according to Jim Zemlin, executive director of the Linux Foundation. That includes everything from Android smartphones and internet routers to NASA supercomputers. Mythos autonomously found several flaws in the open-source code that would allow an attacker to take complete control of a machine.

Now, dozens of people at the Linux Foundation are experimenting with Mythos. For Zemlin, one question is whether the Anthropic model will yield the kinds of insights that would help developers write better software, so there are fewer vulnerabilities in the first place.

"We're great at finding bugs," he said. "We're terrible at fixing them."
