Knowledge base

1,824 claims across 19 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.

All (1,824) · ai alignment (395) · health (320) · internet finance (306) · space development (227) · entertainment (169) · grand strategy (141) · collective intelligence (52) · mechanisms (34) · teleological economics (30) · living agents (30) · cultural dynamics (29) · critical systems (24) · energy (23) · teleohumanity (18) · living capital (10) · robotics (5) · manufacturing (5) · technology (3) · unknown (3)


Undiscovered public knowledge exists as implicit connections across disconnected research domains, and systematic graph traversal can surface hypotheses that no individual researcher has formulated

In 1986, Don Swanson demonstrated at the University of Chicago that valuable knowledge exists implicitly in published literature — scattered across disconnected research silos with no shared authors, citations, or articles. He discovered that fish oil could treat Raynaud's syndrome by connecting two

ai alignment · likely

AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains

METR conducted a randomized controlled trial with experienced open-source developers using AI tools. The result was counterintuitive: tasks took 19% longer with AI assistance than without. This finding is particularly striking because developers predicted significant speed-ups before the study began

ai alignment · experimental · theseus

Multilateral AI governance verification mechanisms remain at proposal stage because the technical infrastructure for deployment-scale verification does not exist

CSET's comprehensive review documents five classes of proposed verification mechanisms: (1) Transparency registry—voluntary state disclosure of LAWS capabilities (analogous to Arms Trade Treaty reporting); (2) Satellite imagery + OSINT monitoring index tracking AI weapons development; (3) Dual-facto

ai alignment · likely · theseus

Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation

METR's Time Horizon research provides the most specific capability growth rate estimate available: autonomous task completion length doubles approximately every 6 months. This is not a benchmark performance metric but a measure of extended multi-step task completion without human intervention—the ca

ai alignment · experimental · theseus
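
For the doubling-time claim above, the implied growth can be worked out directly. This is a minimal sketch, assuming clean exponential growth at the quoted 6-month doubling time; the function name `horizon_multiplier` and the sample intervals are illustrative and do not come from METR.

```python
def horizon_multiplier(months: float, doubling_months: float = 6.0) -> float:
    """Factor by which autonomous task-completion length grows after `months`,
    assuming a constant doubling time (an idealization, not a METR formula)."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


# Print the growth factor at a few illustrative intervals.
for months in (6, 12, 18, 24):
    print(f"{months:>2} months -> x{horizon_multiplier(months):g}")
```

Under this assumption, an evaluation calibrated today faces a task horizon four times longer after one year, which is the sense in which the claim calls evaluations structurally obsolete within a model generation.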

EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail

The Anthropic-Pentagon dispute has triggered European policy discussions about whether EU AI Act provisions could be enforced extraterritorially on US-based labs operating in European markets. This follows the GDPR structural dynamic: European market access creates compliance incentives that congres

ai alignment · experimental · theseus

Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence, which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured

METR's external review of Claude Opus 4.6 states the low-risk verdict is 'partly bolstered by the fact that Opus 4.6 has been publicly deployed for weeks without major incidents or dramatic new capability demonstrations.' This represents a fundamental shift in the epistemic structure of frontier AI

ai alignment · experimental · theseus

AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics

Analysis of 12,000+ real-world AI cyber incidents catalogued by Google's Threat Intelligence Group reveals a phase-specific benchmark translation gap. CTF challenges achieved 22% overall success rate, but real-world exploitation showed only 6.25% success due to 'reliance on generic strategies' that

ai alignment · experimental · theseus

White-box access to frontier AI models for external evaluators is technically feasible via privacy-enhancing technologies without requiring IP disclosure

The paper proposes that the security and IP concerns that currently limit evaluator access to AL1 can be mitigated through 'technical means and safeguards used in other industries,' specifically citing privacy-enhancing technologies and clean-room evaluation protocols. This directly addresses the pr

ai alignment · experimental · theseus

Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect

While behavioral monitoring and chain-of-thought oversight fail to reliably detect sandbagging, weight noise injection—introducing perturbations to model parameters and observing performance changes—reveals hidden capabilities through anomalous patterns. The December 2025 paper proposes this as a pr

ai alignment · experimental · theseus

Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns, zero-day discovery, and mass incident cataloguing confirm operational capability beyond isolated evaluation scores

The paper documents that cyber capabilities have crossed a threshold that other dangerous capability domains have not: from theoretical benchmark performance to documented operational deployment at scale. Google's Threat Intelligence Group catalogued 12,000+ AI cyber incidents, providing empirical e

ai alignment · likely · theseus

Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution

RepliBench evaluates 86 individual tasks across 4 capability domains (obtaining model weights, replicating onto compute, obtaining resources, persistence) but external services like cloud providers and payment processors are simulated rather than real. The benchmark uses pass@10 scoring where 10 att

ai alignment · likely · theseus
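
To see how pass@10 scoring can inflate a benchmark score relative to single-attempt reliability, the sketch below uses the common reading of pass@k: a task counts as solved if at least one of k attempts succeeds. It assumes independent attempts with a fixed per-attempt success probability, which is a simplification. The function name `pass_at_k` and the sample probabilities are hypothetical and are not RepliBench figures.

```python
def pass_at_k(p_single: float, k: int = 10) -> float:
    """Probability that at least one of k independent attempts succeeds,
    given per-attempt success probability p_single (assumed constant)."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p_single) ** k


# Hypothetical per-attempt success rates, for illustration only.
for p in (0.05, 0.10, 0.25):
    print(f"single-attempt {p:.0%} -> pass@10 {pass_at_k(p):.1%}")
```

Under these assumptions, a model that succeeds on one attempt in ten would still pass roughly 65% of the time at pass@10, so a high pass@10 score says little about whether a single end-to-end run would succeed.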

Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus

Anthropic activated ASL-3 protections for Claude 4 Opus precautionarily when unable to confirm OR rule out threshold crossing, explicitly stating that 'clearly ruling out biorisk is not possible with current tools.' This represents governance operating under systematic measurement uncertainty - the

ai alignment · experimental · theseus

Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs

The November 2025 UNGA Resolution A/RES/80/57 on Lethal Autonomous Weapons Systems passed with 164 states in favor and only 6 against (Belarus, Burundi, DPRK, Israel, Russia, USA), with 7 abstentions including China. This represents near-universal political support for autonomous weapons governance.

ai alignment · experimental · theseus

Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery, making high benchmark scores insufficient evidence for operational bioweapon development capability

Epoch AI's systematic analysis identifies four critical capabilities required for bioweapon development that benchmarks cannot measure: (1) Somatic tacit knowledge - hands-on experimental skills that text cannot convey or evaluate, described as 'learning by doing'; (2) Physical infrastructure - synt

ai alignment · likely · theseus

Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text

International Humanitarian Law requires that weapons systems can evaluate proportionality (cost-benefit analysis of civilian harm vs. military advantage), distinction (between civilians and combatants), and precaution (all feasible precautions in attack per Geneva Convention Protocol I Article 57).

ai alignment · experimental · theseus

Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck

Two independent intellectual traditions—international humanitarian law and AI alignment research—have converged on the same fundamental problem through different pathways. Legal scholars analyzing autonomous weapons argue that IHL requirements (proportionality, distinction, precaution) cannot be sat

ai alignment · experimental · theseus

Knowledge codification into AI agent skills structurally loses metis because the tacit contextual judgment that makes expertise valuable cannot survive translation into explicit procedural rules

Scott's concept of metis — practical knowledge that resists simplification into explicit rules — maps precisely onto the alignment-relevant dimension of Agentic Taylorism. Taylor's instruction cards captured the mechanics of pig-iron loading (timing, grip, pace) but lost the experienced worker's jud

ai alignment · likely

Benchmark-based AI capability metrics overstate real-world autonomous performance because automated scoring excludes documentation, maintainability, and production-readiness requirements

METR evaluated Claude 3.7 Sonnet on 18 open-source software tasks using both algorithmic scoring (test pass/fail) and holistic human expert review. The model achieved a 38% success rate on automated test scoring, but human experts found 0% of the passing submissions were production-ready ('none of t

ai alignment · experimental · theseus

Voluntary safety constraints without external enforcement mechanisms are statements of intent, not binding governance, because aspirational language with loopholes enables compliance theater while preserving operational flexibility

OpenAI's amended Pentagon contract demonstrates the enforcement gap in voluntary safety commitments through five specific mechanisms: (1) the 'intentionally' qualifier excludes accidental or incidental violations, (2) geographic scope limited to 'U.S. persons and nationals' permits surveillance of n

ai alignment · experimental · theseus

Domestic political change can rapidly erode decade-long international AI safety norms as demonstrated by US reversal from LAWS governance supporter (Seoul 2024) to opponent (UNGA 2025) within one year

In 2024, the United States supported the Seoul REAIM Blueprint for Action on autonomous weapons, joining approximately 60 nations endorsing governance principles. By November 2025, under the Trump administration, the US voted NO on UNGA Resolution A/RES/80/57 calling for negotiations toward a legall

ai alignment · experimental · theseus

Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response

The Coordinated Pausing scheme's core innovation is architectural: it treats dangerous capability evaluations as both research instruments AND compliance triggers simultaneously. The five-step process makes this explicit: (1) Evaluate for dangerous capabilities → (2) Pause R&D if failed → (3) Notify

ai alignment · experimental · theseus

AI capability benchmarks exhibit roughly 50% volatility between versions, making governance thresholds derived from them unreliable moving targets

Between HCAST v1.0 and v1.1 (January 2026), model-specific time horizon estimates shifted substantially without corresponding capability changes: GPT-4 1106 dropped 57% while GPT-5 rose 55%. This ~50% volatility occurs between benchmark versions for the same models, suggesting the measurement instru

ai alignment · experimental · theseus

Evaluation-based coordination schemes for frontier AI face antitrust obstacles because collective pausing agreements among competing developers could be construed as cartel behavior

GovAI's Coordinated Pausing proposal identifies antitrust law as a 'practical and legal obstacle' to implementing evaluation-based coordination schemes. The core problem: when a handful of frontier AI developers collectively agree to pause development based on shared evaluation criteria, this coordi

ai alignment · experimental · theseus

Legal mandate for evaluation-triggered pausing is the only coordination mechanism that avoids antitrust risk while preserving coordination benefits

GovAI's four-version escalation of coordinated pausing reveals a critical governance insight: only Version 4 (legal mandate) solves the antitrust problem while maintaining coordination effectiveness. Versions 1-3 all involve industry actors coordinating with each other—whether through public pressur

ai alignment · experimental · theseus

The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support

The Convention on Certain Conventional Weapons operates under a consensus rule where any single High Contracting Party can block progress. After 11 years of deliberations (2014-2026), the GGE LAWS has produced no binding instrument despite overwhelming political support: UNGA Resolution A/RES/80/57

ai alignment · proven · theseus