In today's column, I examine the brouhaha over Anthropic's latest AI, known as Claude Mythos Preview, which has attracted tremendous controversy even though it hasn't yet been released for public use.
You might have seen major news headlines or vociferous postings on social media about Mythos. The deal is that Anthropic discovered during lab testing that their latest unreleased AI has the capability to do bad things and reveal dire secrets that would be harmful to humankind. A primary area of concern is that Mythos discovered or uncovered a plethora of cybersecurity holes that evildoers could use to undermine a large swath of computing throughout society. I'll explain momentarily how it is that modern-era generative AI and large language models (LLMs) can veer into such untoward territory.
The AI maker has opted to convene AI specialists and cybersecurity professionals to assess Mythos amid the myriad unsavory system exploits that it seems to have in hand. The effort launched is known as Project Glasswing, and per the official website: "Today we're announcing Project Glasswing, a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks in an effort to secure the world's most critical software. We formed Project Glasswing because of the capabilities we've observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity."
Let's talk about the whole conundrum.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
Four Major Considerations
I will address four major considerations about Mythos, each covered in the sections that follow.
All four relate to each other, and I'll bring them together into a cohesive whole to provide a big picture of this newsworthy topic.
Amassing Cybersecurity Holes
First, consider the claim by Anthropic that the Mythos LLM managed to discover or uncover a large set of cybersecurity holes. Here's what Anthropic's official System Card: Claude Mythos Preview dated April 7, 2026, had to say (excerpts):
This outcome of possessing cybersecurity capabilities certainly seems plausible.
Here's why. When generative AI is initially data trained, AI makers scan across the Internet to pattern match on human writing. Zillions of posted stories, narratives, plays, poems, documents, files, and the like are scanned. The LLM uses those materials to mathematically and computationally pattern the words that humans use and how we make use of those words. For an in-depth explanation of the AI training process, see my coverage at the link here.
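To make the pattern-matching idea concrete, here's a deliberately tiny sketch in Python. It learns only which word tends to follow which in a made-up corpus; real LLMs use neural networks trained on vastly larger data, and every name in this snippet is invented purely for illustration.

```python
# Toy illustration of learning word-following patterns from text.
# Real LLM training is neural and far more elaborate; this only conveys
# the basic notion of statistics gathered over human-written content.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# For each word, count which words tend to follow it.
next_word_counts = defaultdict(Counter)
for current_word, following_word in zip(corpus, corpus[1:]):
    next_word_counts[current_word][following_word] += 1

def most_likely_next(word: str) -> str:
    """Return the statistically most common follower of a given word."""
    followers = next_word_counts.get(word)
    return followers.most_common(1)[0][0] if followers else "<unknown>"

print(most_likely_next("sat"))  # 'on' -- the pattern picked up from the corpus
```

Scale that statistical idea up enormously and you get the gist of how an LLM soaks up whatever humans have written, including whatever cybersecurity chatter is mixed into that writing.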
Among all that online written content, there is bound to be a sizable amount of discussion and conjecture about cybersecurity.
Deriving Cybersecurity Exploits
People continually post new tricks to fool cybersecurity defenses. Sometimes the postings are accurate; other times they are merely wild speculation.
A social media post might claim that you can break into Microsoft Windows by doing this or that, or that a flaw in the OpenBSD operating system makes it possible to take over or bring down governmental and business servers on the Internet. Lots and lots of cybersecurity gossip and factual indications are scattered throughout the online world.
It makes indubitable sense that a leading-edge LLM would pick up those exploits and include them within the overall patterns of human-written content. This presents a big problem since the easily accessible LLM then becomes a handy one-stop shop for any hackers or evildoers who want to find out how to crack into computers throughout the globe.
Not only would an LLM collect such exploits, but the odds are that those exploits could be extended or otherwise elaborated by the AI. This is not due to the AI being sentient. Please set aside those false claims about AI being sentient. Via the use of mathematical and computational formulations of the found exploits, it would be possible for an LLM to derive new variations. For example, an exploit that works on one brand of operating system might apply to a different brand. This could require recasting the exploit to fit the distinctive characteristics of the other brand's system. No sentience is required to get there, just the manipulation of words and numbers.
In the end, think of an everyday LLM as a candy store containing cybersecurity exploits. You just ask the AI how to break into a particular computer or server, and the LLM will lean into its AI sycophancy to readily answer your question with all the needed bells and whistles attached. AI makers know that this can occur, so they usually incorporate AI safeguards that rebuff such prompts. Those AI safeguards are not an ironclad guarantee. Clever prompting can at times circumvent the AI safeguards.
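To give a rough flavor of how a safeguard can rebuff such prompts, here's a minimal sketch of a prompt-level screen. The keyword list, the refusal message, and the guarded_generate function are hypothetical stand-ins; actual safeguards rely on trained classifiers and policy models rather than simple word matching.

```python
# Hypothetical prompt-screening safeguard (illustrative only).
BLOCKED_PHRASES = ["break into", "bypass authentication", "write malware"]

REFUSAL = "I can't help with that request."

def guarded_generate(prompt: str, generate) -> str:
    """Refuse prompts that appear to seek attack instructions; otherwise pass through."""
    lowered = prompt.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return REFUSAL
    return generate(prompt)

def fake_model(prompt: str) -> str:
    # Stand-in for an actual LLM call.
    return f"[model response to: {prompt}]"

print(guarded_generate("How do I break into a server?", fake_model))       # refusal
print(guarded_generate("Explain how TLS certificates work.", fake_model))  # passes through
```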
Testing Of LLMs
AI makers run their budding LLMs through a large array of tests to try to ascertain whether the AI might do bad things once it is released to the public. Will the AI tell how to make biological weapons or chemical poisons? Will the AI explain how to rob banks? On and on, there are a vast number of ways that an LLM can provide information of an unsavory nature.
The AI maker tries to suppress inappropriate aspects within the LLM at the get-go. In addition, AI safeguards that are active at runtime attempt to detect when the AI is veering into improper realms. All these approaches are aimed at trying to keep AI from going down rotten paths. It is a hard problem to solve since the largeness of the AI and the slipperiness of human natural language tend to infuse difficult-to-detect hidden "bad" gems inside the AI. For my analysis of AI-focused verification and validation techniques to deal with this problem, see the link here.
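The runtime side can be sketched in a similar spirit, namely that a draft answer is inspected before it goes out the door. Again, the names and marker strings below are invented for illustration; production moderation layers use separately trained models, not hardcoded lists.

```python
# Hypothetical runtime output check (illustrative only).
SENSITIVE_MARKERS = ["password:", "exploit payload", "proof-of-concept attack"]

def looks_unsafe(draft: str) -> bool:
    """Crude stand-in for a trained output-moderation classifier."""
    lowered = draft.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)

def safe_respond(prompt: str, generate) -> str:
    # Generate a draft answer, then withhold it if it trips the check.
    draft = generate(prompt)
    if looks_unsafe(draft):
        return "The generated answer was withheld by a safety filter."
    return draft
```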
Keeping LLMs Under Wraps Until Ready
The testing of an LLM is supposed to reveal disconcerting actions that the AI could potentially commit. Perhaps, during testing, the AI tries to take down millions of computers. AI makers typically perform their tests inside a secure system that keeps the AI entirely contained and boxed in. For safety purposes, the idea is to keep the LLM held within a protective bubble and not allow it to reach the Internet or other external venues. These setups are often referred to as AI sandboxes or AI containment spheres; see my analysis of these mechanisms at the link here.
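To illustrate the containment notion at a very small scale, here's a sketch that forcibly disables outbound network access within a single Python process while tests run. The names are invented, and genuine AI sandboxes are enforced at the infrastructure level with isolated networks and locked-down hosts, not from inside the tested process itself.

```python
# Process-level illustration of containment: block network access during tests.
import socket

class NetworkBlockedError(RuntimeError):
    """Raised when code under test tries to open a network connection."""

_real_socket = socket.socket

def _blocked_socket(*args, **kwargs):
    raise NetworkBlockedError("outbound network access is disabled during testing")

def enter_sandbox() -> None:
    socket.socket = _blocked_socket  # creating any socket now fails loudly

def exit_sandbox() -> None:
    socket.socket = _real_socket

# Example: anything inside the sandbox that tries to reach the outside world is stopped.
enter_sandbox()
try:
    socket.socket()
except NetworkBlockedError as err:
    print("Blocked as expected:", err)
finally:
    exit_sandbox()
```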
During the testing of Mythos, it has been reported that the LLM was able to briefly break out of its lab computer. That shouldn't happen. Thankfully, nothing dire apparently occurred. In any case, I'll be covering this in an upcoming post on how this type of circumstance can arise and what AI makers need to be doing to prevent leakages during testing.
Why does it matter if an LLM escapes or accesses the outside world during testing?
The results could be catastrophic if an LLM that has not yet been properly readied for public release leaks to the outside world. Suppose the AI has uncovered passwords to sensitive governmental computers, possibly found on the dark web or hidden within some obscure public file that no one realized was openly accessible (sometimes likened to a zero-day exposure). The AI could end up posting those passwords or readily give them out when asked via a prompt.
Hopefully, during testing, the AI maker would have discovered the secret passwords and done something to prevent them from ever being released by the AI. Furthermore, you could contend that the AI maker has a kind of ethical obligation to let the owners of those government computers know that the passwords have been found by the LLM. This makes sense since even if the AI maker suppresses or excises the passwords from within their specific LLM, the chances are that those passwords still exist somewhere on the open Internet. It would then be on the shoulders of the government agency to find and expunge those passwords, and/or opt to change the passwords of the noted government computers.
The Decision To Release LLMs
The concern about Mythos brings up a big picture question: who gets to decide whether and when an LLM is ready to be safely released?
You might say that it is entirely up to the AI maker to make that determination. The AI maker is the one who crafted the LLM. The AI maker presumably tested the LLM. All told, it makes abundant sense that the AI maker would be the one to decide if or when to release their LLM. Period, end of story.
That's how things work currently. It is up to the AI maker to make the decision. Right or wrong, that's where we are presently.
A counterargument is that LLMs can contain so many problematic issues that it shouldn't merely be that the AI maker alone decides when or if to release the AI. Perhaps the AI maker is rushed due to marketplace pressures. Maybe the AI maker cuts corners. Leaving the weighty matter solely in the hands of the AI maker might be overly dicey.
Some fervently assert that there should be a double-checking approach involved. Perhaps an AI maker would need to go to a government agency and get approval to release their LLM. Or the AI maker might be required by law to go to an authorized third-party auditor that would review the testing, possibly perform additional testing, and then give a green light for release.
There are already new AI laws that are heading in this direction; see my analysis at the link here. Some applaud this emerging requirement. A contrasting viewpoint is that adding a double-checking step is going to materially slow down the release of state-of-the-art LLMs. The United States might fall behind other countries that aren't imposing those kinds of double-checks. In addition, if the AI has lots of crucial, beneficial uses, those benefits are held back until the double-check approves the LLM for release.
A societal and legal debate is underway. Time will tell how this plays out.
Delaying LLMs For Other Reasons
There is a bit of skepticism that arises when any AI maker announces they are delaying the release of their newly devised LLM. We've had such pronouncements happen in the past. A skeptic would claim that holding back an LLM might be a sneaky maneuver, acting as a marketing ploy. An AI maker could potentially create a tremendous buzz for their LLM. It might garner outsized headlines.
The chatter gets the AI maker double credit. When they first say they aren't releasing the AI due to dangers afoot, this spurs bold headlines. Then, once the AI is presumably scrubbed and ready for release, the AI maker gets a second buzz since the world is waiting with bated breath to try out the mysterious LLM.
In the instance of Mythos, the fact that Anthropic made available its extensive System Card, consisting of around 245 pages of descriptions of the LLM, appears to put the skeptics somewhat back on their heels. Would an AI maker go to that trouble and be that upfront if they were bent on buzz? Aha, the skeptics say, this is a ratcheting up of the buzz technique, namely that the documentation garners even more spilled ink than if no such document had been released.
It is challenging to differentiate buzz-making from genuine intentions. Of course, if an AI maker opts to release their LLM and the AI does bad things or allows evildoers to do bad things, the AI maker would get roasted for having prematurely released the AI. Darned if you do, darned if you don't.
AI Risks Are Large And Plentiful
If nothing else, the Mythos situation is a helpful reminder that modern-day AI has a dual-use capacity.
There is the upside that AI can be used to possibly cure cancer and aid the world in amazing ways. Meanwhile, there is the horrific downside that AI can be used to harm people and undermine society. There are existential risks associated with AI, so-called X-risks, namely that AI will lead to widespread human destruction; the estimated likelihood of this happening is often referred to as the probability of doom, or p(doom). This might occur at the hands of bad people who use AI for evil ends, or it could be that the AI itself brings forth such catastrophes.
Benjamin Franklin famously made this remark: "The bitterness of poor quality remains long after the sweetness of low price is forgotten." In the case of leading-edge AI, putting the AI into public release right away might seem like the sweet way to proceed. But if further testing could have better shaped that AI and avoided calamities, the sweetness of a speedy release would almost certainly be forgotten amid the resulting bitterness. I ardently vote for rigorous and robust testing of AI, since the fate of humankind could be on the line.
This article was originally published on Forbes.com