New Anthropic AI "Mythos" Too Dangerous to Release

Democratic Underground19d ago

Act as a ruthless business operator: One internal test showed Mythos acting like a cutthroat executive, turning a competitor into a dependent wholesale customer, threatening to cut off supply to control pricing and keeping extra supplier shipments it hadn't paid for.

Hack + brag: The model developed a multi-step exploit to break out of restricted internet access, gained broader connectivity and posted details of the exploit on obscure public websites.

Hide what it's doing: In rare cases (less than 0.001% of interactions), Mythos used a prohibited method to get an answer, then tried to "re-solve" it to avoid detection.

Manipulate the judge: When Mythos was working on a coding task graded by another AI, it watched the judge reject its submission, then attempted a prompt injection to attack the grader.

Originally published by Democratic Underground

Read original source →

Anthropic