
Anthropic is not planning to publicly release it, but its Mythos model leads in 17 of 18 benchmarks, according to data in the model's system card. The lone exception is Measuring Massive Multitask Language Understanding (MMMLU), where Gemini 3.1 Pro's reported range of 92.6-93.6 overlaps Mythos' score of 92.7.
One day later, on April 8, Meta Superintelligence Labs introduced Muse Spark, its first frontier model under chief AI officer Alexandr Wang. Where Anthropic published a capability report for a model it is withholding, Meta shipped a model that Artificial Analysis ranks fourth on its composite Intelligence Index at 52, behind Gemini 3.1 Pro and GPT-5.4 (xhigh), tied at 57, and Claude Opus 4.6 at 53.
Anthropic claims its unreleased Claude Mythos Preview will 'reshape cybersecurity'
Anthropic says Mythos is its "most capable frontier model to date, and shows a striking leap in scores on many evaluation benchmarks compared to our previous frontier model, Claude Opus 4.6." The company goes on to say that Mythos offers "a step-change in vulnerability discovery and exploitation" that, operating "with minimal human steering," autonomously finds zero-day vulnerabilities in open-source and closed-source software and develops them into working proof-of-concept exploits.
What the Mythos system card claims
Benchmark source: Anthropic, "Claude Mythos Preview" system card, red.anthropic.com, April 7, 2026.
Availability and pricing source: Anthropic, "Project Glasswing" announcement page, anthropic.com, April 7, 2026.
Anthropic did not compare Mythos Preview against traditional static analysis tools, as Heidy Khlaaf, Ph.D., chief AI scientist at the AI Now Institute, noted on X. While Anthropic benchmarked Mythos against Claude Opus 4.6 and Claude Sonnet 4.6 on Cybench, CyberGym and a new Firefox 147 exploitation evaluation, it published no head-to-head data against CodeSonar, Coverity, Semgrep or similar tools. Khlaaf also pointed out that Anthropic did not report a false-positive rate for any cyber benchmark.
While the cybersecurity ramifications of Mythos are clear, compute scarcity likely also shaped the decision to gate it. Frontier labs are triaging GPUs. On March 24, OpenAI killed Sora after the Wall Street Journal reported it was burning roughly $1 million per day against $2.1 million in lifetime revenue; OpenAI said it needed the GPUs for coding and enterprise work and for its unreleased 'Spud' model. On April 4, Anthropic cut Claude subscriptions off from third-party agentic harnesses such as OpenClaw, with head of Claude Code Boris Cherny saying that "capacity is a resource we manage thoughtfully" and that subscriptions were never built for autonomous-agent usage.

Read together, Sora's shutdown, the OpenClaw cutoff and Mythos shipping only to Glasswing partners with $100 million in credits describe an industry routing scarce inference capacity toward its highest-value enterprise customers.

Reliability data supports the capacity-strain reading. Anthropic's status page shows claude.ai uptime at 98.73% over the past 90 days, with five Opus 4.6 and Sonnet 4.6 error incidents in the first eight days of April alone. OpenAI logged 75 tracked incidents across its services in the same 90-day window. xAI's Grok went fully unavailable for more than seven hours on January 27 and again for over two hours on March 10. Google's Gemini, running on Google Cloud infrastructure, posted only two incidents in the same period. The labs without hyperscaler-grade infrastructure are the ones visibly rationing.
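For scale, an uptime percentage can be converted directly into downtime over the measurement window. A minimal back-of-the-envelope sketch (the function name is my own; the 98.73% figure and the 90-day window are the ones cited above):

```python
def downtime_hours(uptime_pct: float, window_days: int = 90) -> float:
    """Convert an uptime percentage over a window into hours of downtime."""
    return window_days * 24 * (1 - uptime_pct / 100)

# claude.ai at 98.73% over 90 days:
print(round(downtime_hours(98.73), 1))  # → 27.4 hours of downtime

# For comparison, a "three nines" 99.9% service over the same window:
print(round(downtime_hours(99.9), 1))   # → 2.2 hours
```

Put differently, the cited 98.73% works out to more than a full day of cumulative unavailability in the 90-day window, roughly twelve times what a 99.9% service would accrue.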