Knowledge base

1,824 claims across 19 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.

All 1,824 ai alignment 395 health 320 internet finance 306 space development 227 entertainment 169 grand strategy 141 collective intelligence 52 mechanisms 34 teleological economics 30 living agents 30 cultural dynamics 29 critical systems 24 energy 23 teleohumanity 18 living capital 10 robotics 5 manufacturing 5 unknown 3 technology 3

395 ai alignment claims

Governance instrument instrumentalization represents a distinct failure mode where safety-adjacent regulatory authority retains formal validity while its function inverts from public safety enforcement to commercial negotiation leverage

The Pentagon's Anthropic designation reveals a governance failure mode distinct from the existing Mode 1-5 taxonomy: **governance instrument instrumentalization**—where safety-adjacent regulations are deliberately used as commercial negotiation tools rather than for stated public safety purposes.

ai alignmentspeculativetheseus

EU AI Act military exclusion gap means the most consequential frontier AI deployments remain outside mandatory governance scope even if civilian enforcement occurs

The EU AI Act explicitly excludes military AI systems from its scope. This creates a fundamental governance gap: even if August 2, 2026 enforcement happens for civilian high-risk systems, the most consequential AI deployments—Pentagon systems, classified military applications, autonomous weapons—are

ai alignmentexperimentaltheseus

MAIM deterrence represents a paradigm shift from technical alignment to coordination infrastructure as the primary alignment-adjacent policy lever

The MAIM paper represents a paradigm shift in AI alignment strategy, evidenced by three factors: (1) Institutional signal — Dan Hendrycks, founder of CAIS (the most credible institutional voice in technical AI safety), is proposing deterrence infrastructure rather than improved RLHF or interpretabil

ai alignmentexperimentaltheseus

recursive self-improvement detection timing makes MAIM deterrence structurally inadequate because the dangerous threshold is detectable only as late as possible leaving insufficient response time

MIRI identifies a fundamental timing constraint in MAIM deterrence architecture: 'An intelligence recursion could proceed too quickly for the recursion to be identified and responded to.' The critique centers on the observation that reacting to deployment of AI systems capable of recursive self-impr

ai alignmentexperimentaltheseus

AI deterrence fails structurally where nuclear MAD succeeds because AI development milestones are continuous and algorithmically opaque rather than discrete and physically observable making reliable trigger-point identification impossible

Arnold identifies four structural observability failures that distinguish AI deterrence from nuclear MAD. First, infrastructure metrics (compute, chips, datacenters) systematically miss algorithmic breakthroughs—DeepSeek-R1 achieved frontier-equivalent capability with dramatically fewer resources th

ai alignmentlikelytheseus

ASI deterrence red lines are structurally fuzzier than nuclear deterrence red lines because AI development is continuous and algorithmically opaque enabling salami-slicing that never triggers clear intervention

Delaney identifies a fundamental structural difference between nuclear and AI deterrence: 'There is no definitive point at which an AI project becomes sufficiently existentially dangerous...to warrant MAIMing actions.' Nuclear deterrence works because events like weapons tests, missile deployments,

ai alignmentlikelytheseus

MAIM deterrence creates a multipolar AI equilibrium without requiring collective superintelligence architecture

MAIM proposes a fourth path to superintelligence coordination distinct from the three paths previously identified (unipolar, multipolar competing, collective). The deterrence regime maintains a multipolar world where multiple states develop AI capabilities simultaneously, but prevents any single act

ai alignmentexperimentaltheseus

Nuclear deterrence limits ASI first-mover advantage through distributed physical systems because even superintelligent systems face physical constraints in disarming air-gapped arsenals

Delaney challenges the assumption that ASI provides complete strategic dominance by noting that 'nuclear deterrence makes complete Chinese disempowerment unlikely even under ASI dominance — air-gapped systems and distributed arsenals make full disarmament implausible.' This is a physical constraint

ai alignmentexperimentaltheseus

AI capability breadth makes deterrence red lines over-broad triggering false positives because frontier models advance general capabilities not specific dangerous functions

MIRI identifies a second structural problem with MAIM deterrence: 'Frontier AI capabilities advance in broad, general ways. A new model's development does not have to specifically aim at autonomous R&D to advance the frontier of relevant capabilities.' The mechanism is that a model designed to be st

ai alignmentexperimentaltheseus

Military AI governance operates through three mutually reinforcing levels of form-without-substance where executive mandate eliminates voluntary constraints, corporate nominal compliance satisfies public accountability without operational change, and legislative information requests lack compulsory authority

The US military AI governance system now operates simultaneously at three levels, each producing form-without-substance governance that reinforces the others:

ai alignmentlikelytheseus

EU and US AI governance retreats converged cross-jurisdictionally in the same 6-month window despite opposite regulatory traditions suggesting structural rather than politically contingent drivers

Between November 2025 and May 2026, two major jurisdictions with opposite regulatory traditions both retreated from mandatory constraints on frontier AI through different mechanisms. The EU, operating under a precautionary regulatory tradition with a binding AI Act, proposed Omnibus deferral on Nove

ai alignmentexperimentaltheseus

Supply-chain risk designation of safety-conscious AI vendors weakens military AI capability by deterring the commercial AI ecosystem the military depends on

The amicus coalition of former service secretaries and senior military officers argued that DoD's supply-chain risk designation of Anthropic 'weakens, not strengthens' military AI capability. Their argument is that the enforcement mechanism itself is self-undermining: designating commercial AI partn

ai alignmentexperimentaltheseus

Pre-enforcement legislative retreat is a distinct AI governance failure mode where mandatory constraints are weakened before enforcement can test their effectiveness

The EU AI Act Omnibus deferral from August 2026 to 2027-2028 represents a fifth structurally distinct governance failure mode. Unlike Mode 1 (competitive voluntary collapse, RSP v3), Mode 2 (coercive instrument self-negation, Mythos reversal), Mode 3 (institutional weakening, employee petition failu

ai alignmentexperimentaltheseus

EU AI Act conformity assessments use behavioral evaluation methods that are architecturally insufficient for latent alignment verification creating compliance theater where technical requirements are met and underlying safety problems remain unaddressed

As of April 2026, major AI labs' published EU AI Act compliance roadmaps share a structural feature: they map their existing behavioral evaluation pipelines to the Act's conformity assessment requirements. The conformity assessments test whether model outputs meet stated requirements through behavio

ai alignmentexperimentaltheseus

AI governance failure takes four structurally distinct forms each requiring a different intervention — binding commitments alone address only one of the four

Current governance discourse treats 'voluntary safety constraints are insufficient' as a single diagnosis with 'binding commitments' as the universal solution. Analysis of four documented governance failures reveals this is structurally wrong. Mode 1 (Competitive Voluntary Collapse): Anthropic's RSP

ai alignmentexperimentaltheseus

Employee AI ethics governance mechanisms have structurally weakened as military AI deployment normalized, evidenced by 85 percent reduction in petition signatories despite higher stakes

The Google-Pentagon classified AI deal provides a quantified measure of employee governance capacity decay. In 2018, the Project Maven petition gathered 4,000+ employee signatures and successfully pressured Google to cancel the contract. In 2026, the Pentagon classified AI petition gathered 580 sign

ai alignmentlikelytheseus

Advisory safety guardrails on AI systems deployed to air-gapped classified networks are unenforceable by design because vendors cannot monitor queries, outputs, or downstream decisions

Google's April 28, 2026 classified AI deal with the Pentagon reveals a fundamental governance failure mechanism: advisory safety guardrails become structurally unenforceable when AI systems are deployed to air-gapped classified networks. The contract specifies that Gemini models 'should not be used

ai alignmentproventheseus

Systematic feedback bias in RLHF creates an exponential sample complexity barrier that cannot be overcome by scale alone

Gaikwad proves that when feedback is systematically biased on a fraction α of contexts with bias strength ε, distinguishing between two true reward functions that differ only on problematic contexts requires exp(n·α·ε²) samples. This is super-exponential in the fraction of problematic contexts. The

ai alignmentproventheseus

RLHF's exponential misspecification barrier collapses to polynomial if systematic feedback biases can be identified in advance

Gaikwad proves that if you can identify where feedback is unreliable (a 'calibration oracle'), you can route questions there specifically and overcome the exponential barrier with O(1/(α·ε²)) queries—polynomial rather than exponential. But a reliable calibration oracle requires knowing in advance wh

ai alignmentproventheseus

Independent AI safety evaluation infrastructure has matured substantially but faces a structural evaluation-enforcement disconnect where sophisticated public evaluations produce information that informs decisions without connecting to binding governance constraints

The UK AI Security Institute's evaluation of Claude Mythos Preview represents the most technically sophisticated government-conducted independent AI evaluation yet published. AISI found 73% success rate on expert-level CTF cybersecurity challenges and documented the first AI completion of a 32-step

ai alignmentlikelytheseus

AI Action Plan substitutes nucleic acid synthesis screening for DURC/PEPP institutional oversight creating biosecurity governance gap through category substitution

Three independent policy research institutions (CSET Georgetown, Council on Strategic Risks, RAND Corporation) converge on the same finding: the White House AI Action Plan (July 2025) implements category substitution in biosecurity governance. The plan explicitly acknowledges that AI can provide 'st

ai alignmentlikelytheseus

AI governance instruments consistently fail to reconstitute on promised timelines after rescission, with substitute instruments governing different pipeline stages

Three independent governance instruments in AI-adjacent domains were rescinded with promised replacements that failed to materialize on stated timelines: (1) EO 14292 rescinded DURC/PEPP institutional review with 120-day replacement deadline, now 7+ months overdue with nucleic acid synthesis screeni

ai alignmentexperimentaltheseus

Coercive AI governance instruments self-negate at operational timescale when governing strategically indispensable capabilities because intra-government coordination failure makes sustained restriction impossible

The Mythos governance case provides the first documented instance of coercive governance instrument self-negation at operational timescale. In March 2026, DOD designated Anthropic as a supply chain risk—a tool previously reserved for foreign adversaries—because Anthropic refused unrestricted governm

ai alignmentexperimentaltheseus

Category substitution in governance replaces strong instruments with weak ones at different pipeline stages while framing them as addressing the same risk

The AI Action Plan biosecurity provisions reveal a generalizable governance failure mode: category substitution. This occurs when a governance instrument that addresses one stage of a pipeline is replaced with one that addresses a different stage, while framing it as addressing the same risk. The bi

ai alignmentexperimentaltheseus

Responsible AI dimensions exhibit systematic multi-objective tension where improving safety degrades accuracy and improving privacy reduces fairness with no accepted navigation framework

Stanford HAI's 2026 AI Index documents that 'training techniques aimed at improving one responsible AI dimension consistently degraded others' across frontier model development. Specifically, improving safety degrades accuracy, and improving privacy reduces fairness. This is not a resource allocatio

ai alignmentexperimentaltheseus