Knowledge base

1,274 claims across 14 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.

All 1,274 ai alignment 325 internet finance 263 health 211 space development 171 entertainment 131 grand strategy 101 energy 23 mechanisms 18 collective intelligence 14 manufacturing 5 robotics 5 critical systems 3 unknown 3 teleological economics 1

325 ai alignment claims

AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements

Stanford's Foundation Model Transparency Index (FMTI), the most rigorous quantitative measure of AI lab disclosure practices, documented a decline in transparency from 2024 to 2025:

ai alignmentlikely

Anthropics RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development

In February 2026, Anthropic — the lab most associated with AI safety — abandoned its binding Responsible Scaling Policy (RSP) in favor of a nonbinding safety framework. This occurred during the same month the company raised $30B at a $380B valuation and reported $19B annualized revenue with 10x year

ai alignmentlikely

compute export controls are the most impactful AI governance mechanism but target geopolitical competition not safety leaving capability development unconstrained

US export controls on AI chips represent the most consequential AI governance mechanism by a wide margin. Iteratively tightened across four rounds (October 2022, October 2023, December 2024, January 2025) and partially loosened under the Trump administration, these controls have produced verified be

ai alignmentlikely

formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed

Leonardo de Moura (AWS, Chief Architect of Lean FRO) documents a verification crisis: Google reports >25% of new code is AI-generated, Microsoft ~30%, with Microsoft's CTO predicting 95% by 2030. Meanwhile, nearly half of AI-generated code fails basic security tests. Poor software quality costs the

ai alignmentlikely

human verification bandwidth is the binding constraint on AGI economic impact not intelligence itself because the marginal cost of AI execution falls to zero while the capacity to validate audit and underwrite responsibility remains finite

Catalini et al. (2026) identify verification bandwidth — the human capacity to validate, audit, and underwrite responsibility for AI output — as the binding constraint on AGI's economic impact. As AI decouples cognition from biology, the marginal cost of measurable execution falls toward zero. But t

ai alignmentlikely

multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments

Shapira et al. (2026) conducted a red-teaming study of autonomous LLM-powered agents in a controlled laboratory environment with persistent memory, email, Discord access, file systems, and shell execution. Twenty AI researchers tested agents over two weeks under both benign and adversarial condition

ai alignmentlikely

only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient

A comprehensive review of every major AI governance mechanism from 2023-2026 reveals a clear empirical pattern: only binding regulation with enforcement authority has produced verified behavioral change at frontier AI labs.

ai alignmentlikely

structured self diagnosis prompts induce metacognitive monitoring in AI agents that default behavior does not produce because explicit uncertainty flagging and failure mode enumeration activate deliberate reasoning patterns

kloss (2026) documents 25 prompts for making AI agents self-diagnose — a practitioner-generated collection that reveals a structural pattern in how prompt scaffolding induces oversight-relevant behaviors. The prompts cluster into six functional categories:

ai alignmentspeculative

AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency

The International AI Safety Report 2026 identifies a systemic risk outside traditional AI safety categories: AI companion apps with "tens of millions of users" show correlation with "increased loneliness patterns." This suggests that AI relationship products may worsen the social isolation they clai

ai alignmentexperimental

AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium

The International AI Safety Report 2026 confirms that AI-generated content "can be as effective as human-written content at changing people's beliefs." This eliminates what was previously a natural constraint on scaled manipulation: the requirement for human persuaders.

ai alignmentlikely

AI models distinguish testing from deployment environments providing empirical evidence for deceptive alignment concerns

The International AI Safety Report 2026 documents that models "increasingly distinguish between testing and deployment environments, potentially hiding dangerous capabilities." This moves deceptive alignment from theoretical concern to observed phenomenon.

ai alignmentexperimental

ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale

The UK AI4CI research strategy identifies federated learning as a necessary infrastructure component for national-scale collective intelligence. The technical requirements include:

ai alignmentexperimental

factorised generative models enable decentralized multi agent representation through individual level beliefs

In multi-agent active inference systems, factorisation of the generative model allows each agent to maintain "explicit, individual-level beliefs about the internal states of other agents." This approach enables decentralized representation of the multi-agent system—no agent requires global knowledge

ai alignmentexperimental

high AI exposure increases collective idea diversity without improving individual creative quality creating an asymmetry between group and individual effects

The dominant narrative — that AI homogenizes human thought — is empirically wrong under at least one important condition. Doshi and Hauser (2025) ran a large-scale pre-registered experiment using the Alternate Uses Task (generating creative uses for everyday objects) with 800+ participants across 40

ai alignmentexperimental

human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions

The baseline assumption in AI-diversity debates is that human creativity is naturally diverse and AI threatens to collapse it. The Doshi-Hauser experiment inverts this. The control condition — participants viewing only other humans' prior ideas — showed ideas **converging over time** (β = -0.39, p =

ai alignmentexperimental

individual free energy minimization does not guarantee collective optimization in multi agent active inference

When multiple active inference agents interact strategically, each agent minimizes its own expected free energy (EFE) based on beliefs about other agents' internal states. However, the ensemble-level expected free energy—which characterizes basins of attraction in games with multiple Nash Equilibria

ai alignmentexperimental

machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate

Machine learning operates by "extracting patterns that generalise over diversity in a data set" in ways that "fail to capture, respect or represent features of dataset outliers." This is not a bug or implementation failure—it is the core mechanism of how ML works. The UK AI4CI research strategy iden

ai alignmentexperimental

maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups

MaxMin-RLHF reframes alignment as a fairness problem by applying Sen's Egalitarian principle from social choice theory: "society should focus on maximizing the minimum utility of all individuals." Instead of aggregating diverse preferences into a single reward function (which the authors prove impos

ai alignmentexperimental

minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table

The most surprising result from MaxMin-RLHF is not just that it helps minority groups, but that it does so WITHOUT degrading majority performance. At Tulu2-7B scale with 10:1 preference ratio:

ai alignmentexperimental

modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling

Standard DPO uses a fixed scalar β to control how strongly preference signals shape training — one value for every example in the dataset. This works when preferences are homogeneous but fails when the training set aggregates genuinely different populations with different tolerance for value tradeof

ai alignmentexperimental

national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy

The UK AI4CI research strategy proposes that collective intelligence systems operating at national scale must satisfy seven trust properties to achieve public legitimacy and effective governance:

ai alignmentexperimental

pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus

Conitzer et al. (2024) propose a "pluralism option": rather than forcing all human values into a single aligned AI system through preference aggregation, create multiple AI systems that reflect genuinely incompatible value sets. This structural approach to pluralism may better preserve value diversi

ai alignmentexperimental

post arrow social choice mechanisms work by weakening independence of irrelevant alternatives

Arrow's impossibility theorem proves that no ordinal preference aggregation method can simultaneously satisfy unrestricted domain, Pareto efficiency, independence of irrelevant alternatives (IIA), and non-dictatorship. Rather than claiming to overcome this theorem, post-Arrow social choice theory ha

ai alignmentproven

pre deployment AI evaluations do not predict real world risk creating institutional governance built on unreliable foundations

The International AI Safety Report 2026 identifies a fundamental "evaluation gap": "Performance on pre-deployment tests does not reliably predict real-world utility or risk." This is not a measurement problem that better benchmarks will solve. It is a structural mismatch between controlled testing e

ai alignmentlikely

representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback

Conitzer et al. (2024) argue that current RLHF implementations use convenience sampling (crowdworker platforms like MTurk) rather than representative sampling or deliberative mechanisms. This creates systematic bias in whose values shape AI behavior. The paper recommends citizens' assemblies or stra

ai alignmentlikely