Knowledge base

1,824 claims across 19 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.
1,824 claims
undiscovered public knowledge exists as implicit connections across disconnected research domains and systematic graph traversal can surface hypotheses that no individual researcher has formulated
In 1986, Don Swanson demonstrated at the University of Chicago that valuable knowledge exists implicitly in published literature — scattered across disconnected research silos with no shared authors, citations, or articles. He discovered that fish oil could treat Raynaud's syndrome by connecting two
ai alignmentlikely
AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains
METR conducted a randomized controlled trial with experienced open-source developers using AI tools. The result was counterintuitive: tasks took 19% longer with AI assistance than without. This finding is particularly striking because developers predicted significant speed-ups before the study began
ai alignmentexperimentaltheseus
Multilateral AI governance verification mechanisms remain at proposal stage because the technical infrastructure for deployment-scale verification does not exist
CSET's comprehensive review documents five classes of proposed verification mechanisms: (1) Transparency registry—voluntary state disclosure of LAWS capabilities (analogous to Arms Trade Treaty reporting); (2) Satellite imagery + OSINT monitoring index tracking AI weapons development; (3) Dual-facto
ai alignmentlikelytheseus
Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation
METR's Time Horizon research provides the most specific capability growth rate estimate available: autonomous task completion length doubles approximately every 6 months. This is not a benchmark performance metric but a measure of extended multi-step task completion without human intervention—the ca
ai alignmentexperimentaltheseus
EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail
The Anthropic-Pentagon dispute has triggered European policy discussions about whether EU AI Act provisions could be enforced extraterritorially on US-based labs operating in European markets. This follows the GDPR structural dynamic: European market access creates compliance incentives that congres
ai alignmentexperimentaltheseus
Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured
METR's external review of Claude Opus 4.6 states the low-risk verdict is 'partly bolstered by the fact that Opus 4.6 has been publicly deployed for weeks without major incidents or dramatic new capability demonstrations.' This represents a fundamental shift in the epistemic structure of frontier AI
ai alignmentexperimentaltheseus
AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics
Analysis of 12,000+ real-world AI cyber incidents catalogued by Google's Threat Intelligence Group reveals a phase-specific benchmark translation gap. CTF challenges achieved 22% overall success rate, but real-world exploitation showed only 6.25% success due to 'reliance on generic strategies' that
ai alignmentexperimentaltheseus
White-box access to frontier AI models for external evaluators is technically feasible via privacy-enhancing technologies without requiring IP disclosure
The paper proposes that the security and IP concerns that currently limit evaluator access to AL1 can be mitigated through 'technical means and safeguards used in other industries,' specifically citing privacy-enhancing technologies and clean-room evaluation protocols. This directly addresses the pr
ai alignmentexperimentaltheseus
Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect
While behavioral monitoring and chain-of-thought oversight fail to reliably detect sandbagging, weight noise injection—introducing perturbations to model parameters and observing performance changes—reveals hidden capabilities through anomalous patterns. The December 2025 paper proposes this as a pr
ai alignmentexperimentaltheseus
Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores
The paper documents that cyber capabilities have crossed a threshold that other dangerous capability domains have not: from theoretical benchmark performance to documented operational deployment at scale. Google's Threat Intelligence Group catalogued 12,000+ AI cyber incidents, providing empirical e
ai alignmentlikelytheseus
Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution
RepliBench evaluates 86 individual tasks across 4 capability domains (obtaining model weights, replicating onto compute, obtaining resources, persistence) but external services like cloud providers and payment processors are simulated rather than real. The benchmark uses pass@10 scoring where 10 att
ai alignmentlikelytheseus
Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus
Anthropic activated ASL-3 protections for Claude 4 Opus precautionarily when unable to confirm OR rule out threshold crossing, explicitly stating that 'clearly ruling out biorisk is not possible with current tools.' This represents governance operating under systematic measurement uncertainty - the
ai alignmentexperimentaltheseus
Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs
The November 2025 UNGA Resolution A/RES/80/57 on Lethal Autonomous Weapons Systems passed with 164 states in favor and only 6 against (Belarus, Burundi, DPRK, Israel, Russia, USA), with 7 abstentions including China. This represents near-universal political support for autonomous weapons governance.
ai alignmentexperimentaltheseus
Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability
Epoch AI's systematic analysis identifies four critical capabilities required for bioweapon development that benchmarks cannot measure: (1) Somatic tacit knowledge - hands-on experimental skills that text cannot convey or evaluate, described as 'learning by doing'; (2) Physical infrastructure - synt
ai alignmentlikelytheseus
Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text
International Humanitarian Law requires that weapons systems can evaluate proportionality (cost-benefit analysis of civilian harm vs. military advantage), distinction (between civilians and combatants), and precaution (all feasible precautions in attack per Geneva Convention Protocol I Article 57).
ai alignmentexperimentaltheseus
Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck
Two independent intellectual traditions—international humanitarian law and AI alignment research—have converged on the same fundamental problem through different pathways. Legal scholars analyzing autonomous weapons argue that IHL requirements (proportionality, distinction, precaution) cannot be sat
ai alignmentexperimentaltheseus
knowledge codification into AI agent skills structurally loses metis because the tacit contextual judgment that makes expertise valuable cannot survive translation into explicit procedural rules
Scott's concept of metis — practical knowledge that resists simplification into explicit rules — maps precisely onto the alignment-relevant dimension of Agentic Taylorism. Taylor's instruction cards captured the mechanics of pig-iron loading (timing, grip, pace) but lost the experienced worker's jud
ai alignmentlikely
Benchmark-based AI capability metrics overstate real-world autonomous performance because automated scoring excludes documentation, maintainability, and production-readiness requirements
METR evaluated Claude 3.7 Sonnet on 18 open-source software tasks using both algorithmic scoring (test pass/fail) and holistic human expert review. The model achieved a 38% success rate on automated test scoring, but human experts found 0% of the passing submissions were production-ready ('none of t
ai alignmentexperimentaltheseus
Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility
OpenAI's amended Pentagon contract demonstrates the enforcement gap in voluntary safety commitments through five specific mechanisms: (1) the 'intentionally' qualifier excludes accidental or incidental violations, (2) geographic scope limited to 'U.S. persons and nationals' permits surveillance of n
ai alignmentexperimentaltheseus
Domestic political change can rapidly erode decade-long international AI safety norms as demonstrated by US reversal from LAWS governance supporter (Seoul 2024) to opponent (UNGA 2025) within one year
In 2024, the United States supported the Seoul REAIM Blueprint for Action on autonomous weapons, joining approximately 60 nations endorsing governance principles. By November 2025, under the Trump administration, the US voted NO on UNGA Resolution A/RES/80/57 calling for negotiations toward a legall
ai alignmentexperimentaltheseus
Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response
The Coordinated Pausing scheme's core innovation is architectural: it treats dangerous capability evaluations as both research instruments AND compliance triggers simultaneously. The five-step process makes this explicit: (1) Evaluate for dangerous capabilities → (2) Pause R&D if failed → (3) Notify
ai alignmentexperimentaltheseus
AI capability benchmarks exhibit 50% volatility between versions making governance thresholds derived from them unreliable moving targets
Between HCAST v1.0 and v1.1 (January 2026), model-specific time horizon estimates shifted substantially without corresponding capability changes: GPT-4 1106 dropped 57% while GPT-5 rose 55%. This ~50% volatility occurs between benchmark versions for the same models, suggesting the measurement instru
ai alignmentexperimentaltheseus
Evaluation-based coordination schemes for frontier AI face antitrust obstacles because collective pausing agreements among competing developers could be construed as cartel behavior
GovAI's Coordinated Pausing proposal identifies antitrust law as a 'practical and legal obstacle' to implementing evaluation-based coordination schemes. The core problem: when a handful of frontier AI developers collectively agree to pause development based on shared evaluation criteria, this coordi
ai alignmentexperimentaltheseus
Legal mandate for evaluation-triggered pausing is the only coordination mechanism that avoids antitrust risk while preserving coordination benefits
GovAI's four-version escalation of coordinated pausing reveals a critical governance insight: only Version 4 (legal mandate) solves the antitrust problem while maintaining coordination effectiveness. Versions 1-3 all involve industry actors coordinating with each other—whether through public pressur
ai alignmentexperimentaltheseus
The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support
The Convention on Certain Conventional Weapons operates under a consensus rule where any single High Contracting Party can block progress. After 11 years of deliberations (2014-2026), the GGE LAWS has produced no binding instrument despite overwhelming political support: UNGA Resolution A/RES/80/57
ai alignmentproventheseus