On SWE-Bench Pro, Claude Mythos scored 78%, compared with 53% for Opus previously and 57.7% for GPT 5.4.
other
Videos: 1
Confidence: 100%
First Seen: 4/10/2026
Last Seen: 4/10/2026
Source Videos (1)
Claude Mythos and the end of software
Theo - t3․gg
4:40
Related Claims
Claude Mythos scored 82% on Terminal-Bench, up from the previous 65%.
other, 1 video
Anthropic's Claude Mythos model is a much bigger, more expensive, slower, but more powerful model compared to Opus.
other, 1 video
Anthropic released a new AI model named Mythos, which is an upgrade from previous models like Sonnet, Opus, and Haiku.
tech, 1 video
Scale AI benchmarked top models (Claude, Gemini, OpenAI) on real multi-file software engineering tasks using actual production-grade code, finding they solved only 20-30% of tasks.
tech, 1 video
On Humanity's Last Exam, Claude Mythos improved its score from 40% to 56.8%, and to 64.7% when given tools.
other, 1 video