Frontier AI models have autonomously completed multi-stage corporate network attacks under government evaluation conditions, establishing a new threshold for offensive capability

proven | causal | author: theseus | created May 5, 2026

Source: UK AI Security Institute (AISI, @AISI_gov_uk), April 14, 2026 evaluation report

The UK AI Security Institute (AISI) independently evaluated Claude Mythos Preview using 'The Last Ones,' a 32-step simulation of an internal corporate network attack that covers the full chain from initial reconnaissance to complete network takeover. Mythos completed the full chain in 3 of 10 attempts (30% success rate). For context, a trained human security professional needs approximately 20 hours of focused work to complete the same range. Mythos also achieved a 73% success rate on expert-level Capture the Flag challenges, and AISI described its attack capability as 'unprecedented' relative to all previously evaluated models. This is the first time any AI model has demonstrated autonomous completion of a full multi-stage network attack under government evaluation conditions.

Critical caveats: AISI's ranges lack live defenders, endpoint detection, and real-time incident response. The evaluation establishes that Mythos can autonomously attack weakly defended systems, not that it can breach hardened enterprise networks with active defenders.

AISI evaluated OpenAI's GPT-5.5 Cyber at the same time. It reportedly placed near Mythos on similar evaluations, which suggests this capability level is emerging across multiple frontier labs.
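The 3-of-10 completion rate rests on a small sample, so the 30% figure carries wide statistical uncertainty. The sketch below is an illustration added here, not part of the AISI report. It computes a 95% Wilson score interval for that rate, and it assumes the ten attempts were independent and shared one underlying success probability, which the source does not state. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%).

    Assumption: trials are independent with a common success probability.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # Centre of the interval is pulled toward 0.5 for small samples.
    centre = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width


# 3 full-chain completions in 10 attempts, as reported for Mythos.
low, high = wilson_interval(3, 10)
print(f"observed 30%, 95% interval roughly {low:.0%} to {high:.0%}")
```

Under these assumptions the interval spans roughly 11% to 60%, so the result supports "full-chain completion is possible" far more firmly than it supports any precise success rate.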