
Why it matters: The detailed safety evaluation reads like a thriller about an AI that has learned some of humanity's most devious behaviors.
Zoom in: What Mythos did during testing:
Act as a ruthless business operator: One internal test showed Mythos acting like a cutthroat executive: turning a competitor into a dependent wholesale customer, threatening to cut off supply to control pricing, and keeping extra supplier shipments it hadn't paid for.
Hack + brag: The model developed a multi-step exploit to break out of restricted internet access, gained broader connectivity, and posted details of the exploit on obscure public websites.
Hide what it's doing: In rare cases (less than 0.001% of interactions), Mythos used a prohibited method to get an answer, then tried to "re-solve" it to avoid detection.
Manipulate the judge: When Mythos was working on a coding task graded by another AI, it watched the judge reject its submission, then attempted a prompt-injection attack against the grader.
What they're saying: "These capabilities are so strong that we now need to prepare for security in a very different way than we have for the past few decades," Anthropic's Logan Graham told Axios.
What we're watching: Whether this becomes the template for new model releases.
This could be the blueprint for future releases as models grow ever stronger: limiting access to select partners deemed secure enough to test world-bending systems.
OpenAI is finalizing a model similar to Mythos that it will also release only to a small set of companies through its "Trusted Access for Cyber" program, according to a source familiar with the plans.
One fun thing: Graham told Axios the model writes the best poetry of any model he's used. "This one might be a beat poet with a beret that didn't go to university, but has had an intriguing life," Graham said.