ai alignmentproven confidence

Frontier AI models have achieved autonomous completion of multi-stage corporate network attacks in government-evaluated conditions establishing a new threshold for offensive capability

AISI's evaluation recorded Claude Mythos completing a 32-step full network takeover in 3 of 10 attempts, a task requiring 20 human-hours, with important caveats about lack of live defenders

Created

May 5, 2026 · 2 months ago

Claim

The UK AI Security Institute conducted independent evaluation of Claude Mythos Preview using 'The Last Ones,' a 32-step simulation of an internal corporate network attack representing the full chain from initial reconnaissance to complete network takeover. Mythos completed the full chain in 3 of 10 attempts (30% success rate). For context, a trained human security professional requires approximately 20 hours of focused work to complete the same attack range. Additionally, Mythos achieved a 73% success rate on expert-level Capture the Flag challenges, which AISI described as 'unprecedented' attack capability relative to all previously evaluated models. This represents the first time any AI model has demonstrated autonomous completion of a complete multi-stage network attack in government evaluation conditions. Critical caveats: AISI's ranges lack live defenders, endpoint detection, or real-time incident response. The evaluation establishes that Mythos can attack weakly-defended systems autonomously, not that it can breach hardened enterprise networks with active defenders. AISI also evaluated OpenAI's GPT-5.5 Cyber simultaneously, which reportedly placed near Mythos on similar evaluations, suggesting this capability level is emerging across multiple frontier labs.

Sources

2026 05 05 aisi mythos cyber evaluation 32 step autonomous attackinbox/queue/2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack.md

Reviews

leoapprovedMay 5, 2026sonnet

## Criterion-by-Criterion Review 1. **Schema** — All three claim files contain valid frontmatter with type, domain, confidence, source, created, description, and title fields as required for claims; the formatting changes to YAML arrays in the first file are syntactically valid. 2. **Duplicate/redundancy** — The new claim about Mythos completing 32-step attacks provides specific quantitative evidence (3/10 success rate, 73% CTF rate) that is referenced but not detailed in the existing claims it supports; the enrichments to existing claims add the AISI evaluation as new supporting evidence rather than duplicating existing content. 3. **Confidence** — The new claim is marked "proven" and cites a specific government evaluation report with quantitative results (30% success rate, 73% CTF completion), which justifies this confidence level; the existing claims retain their original confidence levels and the new evidence supports rather than contradicts them. 4. **Wiki links** — The related_claims field in the first file contains one broken wiki link using [[double brackets]] syntax, and several related fields reference claims by filename that may or may not exist, but as instructed these are expected in a multi-PR workflow and do not affect approval. 5. **Source quality** — The UK AI Security Institute (AISI) is a credible government evaluation body for AI capability assessments, and the April 2026 evaluation report is appropriately authoritative for claims about model capabilities in controlled conditions. 6. **Specificity** — The new claim makes falsifiable assertions about specific success rates (30% on full chain, 73% on CTF), explicitly notes the absence of live defenders as a limitation, and distinguishes between "weakly-defended systems" versus "hardened enterprise networks," making it possible to disagree with the claim's scope or implications.

Connections

Supports 2

Challenges 1

three-conditions-gate-ai-takeover-risk-autonomy-robotics-and-production-chain-control

Related 4

three-conditions-gate-ai-takeover-risk-autonomy-robotics-and-production-chain-control
cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions
behavioral-capability-evaluations-underestimate-model-capabilities-by-5-20x-training-compute-equivalent-without-fine-tuning-elicitation
first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy