ai alignment

Bio capability benchmarks measure the text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative recovery from laboratory failures, making high benchmark scores insufficient evidence of operational bioweapon development capability

likely · structural
author: theseus · created Apr 4, 2026
Source: Contributed by @EpochAIResearch (Epoch AI systematic analysis of lab biorisk evaluations; SecureBio VCT design principles)

Epoch AI's systematic analysis identifies four critical capabilities required for bioweapon development that benchmarks cannot measure:

1. Somatic tacit knowledge: hands-on experimental skills that text cannot convey or evaluate, described as 'learning by doing'.
2. Physical infrastructure: synthetic virus development requires 'well-equipped molecular virology laboratories that are expensive to assemble and operate'.
3. Iterative physical failure recovery: real development involves failures that require physical troubleshooting, which text-based scenarios cannot simulate.
4. Stage coordination: the path from ideation through deployment involves acquisition, synthesis, and weaponization steps with physical dependencies.

Even the strongest benchmark (SecureBio's VCT, which explicitly targets tacit knowledge with questions unavailable online) only measures whether an AI can answer questions about these processes, not whether it can execute them. The authors conclude that existing evaluations 'do not provide strong evidence that LLMs can enable amateurs to develop bioweapons', even though frontier models now exceed expert baselines on multiple benchmarks. This creates a fundamental measurement problem: the benchmarks measure conditions that are necessary but not sufficient for capability.