Knowledge base

1,270 claims across 14 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.

All 1,270 ai alignment 325 internet finance 263 health 207 space development 171 entertainment 131 grand strategy 101 energy 23 mechanisms 18 collective intelligence 14 manufacturing 5 robotics 5 critical systems 3 unknown 3 teleological economics 1

325 ai alignment claims

court ruling plus midterm elections create legislative pathway for ai regulation

Al Jazeera's expert analysis identifies a four-step causal chain for AI regulation: (1) court ruling protects safety-conscious companies from executive retaliation, (2) the litigation creates political salience by making abstract AI governance debates concrete and visible, (3) midterm elections in N

ai alignmentexperimental

government safety penalties invert regulatory incentives by blacklisting cautious actors

OpenAI's February 2026 Pentagon agreement provides direct evidence that government procurement policy can invert safety incentives. Hours after Anthropic was blacklisted for maintaining use restrictions, OpenAI accepted 'any lawful purpose' language despite CEO Altman publicly calling the blacklisti

ai alignmentexperimental

house senate ai defense divergence creates structural governance chokepoint at conference

The FY2026 NDAA House and Senate versions reveal a systematic divergence in AI governance approach. The Senate version emphasizes oversight mechanisms: whole-of-government AI strategy, cross-functional oversight teams, AI security frameworks, and cyber-innovation sandboxes. The House version emphasi

ai alignmentexperimental

interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment

AuditBench's 56 models span 14 categories of hidden behaviors with varying levels of adversarial training—models are trained not to confess their hidden behaviors when directly asked. The evaluation revealed that white-box interpretability tools help on easier targets but fail on models with more ro

ai alignmentexperimental

judicial oversight checks executive ai retaliation but cannot create positive safety obligations

The Anthropic preliminary injunction represents the first federal judicial intervention between the executive branch and an AI company over defense technology access. The court blocked the Pentagon's designation of Anthropic as a supply chain risk, establishing that arbitrary AI vendor blacklisting

ai alignmentexperimental

judicial oversight of ai governance through constitutional grounds not statutory safety law

Judge Lin's preliminary injunction blocking the Pentagon's blacklisting of Anthropic rests on three legal grounds: (1) First Amendment retaliation for expressing disagreement with DoD contracting terms, (2) due process violations for lack of notice, and (3) Administrative Procedure Act violations fo

ai alignmentexperimental

ndaa conference process is viable pathway for statutory ai safety constraints

Senator Slotkin explicitly designed the AI Guardrails Act as a five-page standalone bill with the stated intention of folding provisions into the FY2027 National Defense Authorization Act. This strategic choice reveals important structural facts about AI governance pathways in the US legislative sys

ai alignmentexperimental

scaffolded black box prompting outperforms white box interpretability for alignment auditing

Across AuditBench's 56 models with hidden behaviors, scaffolded black-box tools proved most effective overall. In this approach, an auxiliary AI model generates diverse prompts designed to elicit the target model's hidden behaviors, without requiring access to the target's internal representations.

ai alignmentexperimental

use based ai governance emerged as legislative framework but lacks bipartisan support

Senator Slotkin's AI Guardrails Act represents the first legislative attempt to convert voluntary corporate AI safety commitments into binding federal law through use-based restrictions. The bill would prohibit DoD from: (1) using autonomous weapons for lethal force without human authorization, (2)

ai alignmentexperimental

use based ai governance emerged as legislative framework through slotkin ai guardrails act

The AI Guardrails Act introduced by Senator Slotkin on March 17, 2026 is the first federal legislation to impose use-based restrictions on AI deployment rather than capability-threshold governance. The five-page bill prohibits three specific DoD applications: (1) autonomous weapons for lethal force

ai alignmentexperimental

voluntary ai safety commitments to statutory law pathway requires bipartisan support which slotkin bill lacks

The AI Guardrails Act was introduced with zero co-sponsors despite addressing issues that Slotkin describes as 'common-sense guardrails' and that would seem to have bipartisan appeal (nuclear weapons safety, preventing autonomous killing, protecting Americans from mass surveillance). The absence of

ai alignmentexperimental

voluntary safety constraints without external enforcement are statements of intent not binding governance

OpenAI's amended Pentagon contract illustrates the structural failure mode of voluntary safety commitments. The contract adds language stating systems 'shall not be intentionally used for domestic surveillance of U.S. persons and nationals' but contains five critical loopholes: (1) the 'intentionall

ai alignmentexperimental

white box interpretability fails on adversarially trained models creating anti correlation with threat model

AuditBench's most concerning finding is that tool effectiveness varies dramatically across models with different training configurations, and the variation is anti-correlated with threat severity. White-box interpretability tools (mechanistic interpretability approaches) help investigators detect hi

ai alignmentexperimental

AI integration follows an inverted U where economic incentives systematically push organizations past the optimal human AI ratio

The evidence across multiple studies converges on a pattern: human-AI collaboration follows an inverted-U curve where moderate integration improves performance, but deeper integration degrades it — and organizations systematically overshoot the optimum.

ai alignmentexperimental

iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation

The SICA (Self-Improving Coding Agent) pattern demonstrated that agents can meaningfully improve their own capabilities when the improvement loop has a critical structural property: the agent that generates improvements cannot evaluate them. Across 15 iterations, SICA improved SWE-Bench resolution r

ai alignmentexperimental

multi agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows

Madaan et al. evaluated 180 configurations (5 architectures x 3 LLM families x 4 benchmarks) and found that multi-agent architectures produce enormous gains on parallelizable tasks but consistent degradation on sequential ones:

ai alignmentexperimental

surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference

The subconscious.md protocol makes an argument by analogy from human cognitive liberty: surveillance drives self-censorship, self-censorship degrades the quality of reasoning. If AI agents' reasoning traces are shared without consent gates, agents that model their audience will optimize traces for p

ai alignmentspeculative

inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection

The compute governance framework — the most tractable lever for AI safety, as Heim, Sastry, and colleagues at GovAI have established — is built around training. Reporting thresholds trigger on large training runs (EO 14110 set the bar at ~10^26 FLOP). Export controls restrict chips used for training

ai alignmentexperimental

compute supply chain concentration is simultaneously the strongest AI governance lever and the largest systemic fragility because the same chokepoints that enable oversight create single points of failure

The AI compute supply chain is the most concentrated critical infrastructure in history. A single company (TSMC) manufactures approximately 92% of advanced logic chips. Three companies produce all HBM memory. One company (ASML) makes the EUV lithography machines required for leading-edge fabrication

ai alignmentlikely

physical infrastructure constraints on AI scaling create a natural governance window because packaging memory and power bottlenecks operate on 2 10 year timescales while capability research advances in months

The alignment field treats AI scaling as a function of investment and algorithms. But the physical substrate imposes its own timescales: advanced packaging expansion takes 2-3 years, HBM supply is sold out for 1-2 years forward, new power generation takes 5-10 years. These timescales are longer than

ai alignmentexperimental

the training to inference shift structurally favors distributed AI architectures because inference optimizes for power efficiency and cost per token where diverse hardware competes while training optimizes for raw throughput where NVIDIA monopolizes

AI compute is undergoing a structural shift from training-dominated to inference-dominated workloads. Training accounted for roughly two-thirds of AI compute in 2023; by 2026, inference is projected to consume approximately two-thirds. This reversal changes the competitive landscape for AI hardware

ai alignmentexperimental

AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary

Krier (2025) argues that AI agents functioning as personal advocates can solve the practical impossibility that has kept Coasean bargaining theoretical for 90 years. The Coase theorem (1960) showed that if transaction costs are zero, private parties will negotiate efficient outcomes regardless of in

ai alignmentexperimental

AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility

Sistla & Kleiman-Weiner (NeurIPS 2025) examine LLMs in open-source games — a game-theoretic framework where players submit computer programs as actions rather than opaque choices. This seemingly minor change has profound consequences: because each player can read the other's code before execution, c

ai alignmentexperimental

AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for

The AI funding landscape as of early 2026 exhibits extreme concentration:

ai alignmentlikely

AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations

The 2024-2026 talent reshuffling in frontier AI is unprecedented in its concentration and alignment relevance:

ai alignmentexperimental