Knowledge base

1,275 claims across 14 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.
325 ai alignment claims
AI agent orchestration that routes data and tools between specialized models outperforms both single-model and human-coached approaches because the orchestrator contributes coordination, not direction
Aquino-Michaels's architecture for solving Knuth's Hamiltonian decomposition problem used three components with distinct roles:
ai alignment · experimental
AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session
Knuth reports that Claude Opus 4.6, in collaboration with Stappers, solved an open combinatorial problem that had resisted solution for decades — finding a general construction for decomposing directed graphs with m^3 vertices into three Hamiltonian cycles. This represents frontier mathematical capability…
ai alignment · experimental
AI personas emerge from pre-training data as a spectrum of humanlike motivations rather than developing monomaniacal goals, which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts
Dario Amodei proposes a "moderate position" on AI autonomy risk that challenges both the dismissive view (AI will follow training) and the catastrophist view (AI inevitably seeks power through instrumental convergence). His alternative: models inherit "a vast range of humanlike motivations or 'personas'"…
ai alignment · experimental
as AI-automated software development becomes certain, the bottleneck shifts from building capacity to knowing what to build, making structured knowledge graphs the critical input to autonomous systems
The evidence that AI can automate software development is no longer speculative. Claude solved a 30-year open mathematical problem (Knuth 2026). The Aquino-Michaels setup had AI agents autonomously exploring solution spaces with zero human intervention for 5 consecutive explorations…
ai alignment · experimental
coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem
The Knuth Hamiltonian decomposition problem provides a controlled natural experiment comparing coordination approaches while holding AI capability roughly constant:
ai alignment · experimental
formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades
Three days after Knuth published his proof of Claude's Hamiltonian decomposition construction, Kim Morrison from the Lean community formalized the proof in Lean 4, providing machine-checked verification of correctness. Knuth's response: "That's good to know, because I've been getting more error-prone…"
ai alignment · experimental
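The oversight mechanism in this claim is that correctness is certified by the proof assistant's kernel rather than by a human reader. A toy Lean 4 illustration (ours, not Morrison's actual formalization, which proved a graph-theoretic statement): once the theorem compiles, the kernel has verified the proof, regardless of who or what produced it.

```lean
-- Toy example only: if this file compiles, the Lean kernel has
-- machine-checked the proof. No human review of the proof steps is
-- needed, and checking cost doesn't grow with the sophistication
-- of whoever (or whatever) wrote the proof.
theorem toy_identity (n : Nat) : n + n = 2 * n := by
  omega
```

The same kernel check applies uniformly whether the proof term was written by a mathematician, generated by a model, or produced by a tactic, which is why this form of oversight scales where line-by-line human review does not.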
human civilization passes falsifiable superorganism criteria because individuals cannot survive apart from society and occupations function as role-specific cellular algorithms
This note argues that humanity qualifies as a literal biological superorganism — not by analogy but through empirical tests — and that this framing has direct implications for what AI alignment must account for.
ai alignment · experimental
human-AI mathematical collaboration succeeds through role specialization where AI explores solution spaces, humans provide strategic direction, and mathematicians verify correctness
Donald Knuth reports that an open problem he'd been working on for several weeks — decomposing a directed graph with m^3 vertices into three Hamiltonian cycles for all odd m > 2 — was solved by Claude Opus 4.6 in collaboration with Filip Stappers, with Knuth himself writing the rigorous proof.
ai alignment · experimental
marginal returns to intelligence are bounded by five complementary factors, which means superintelligence cannot produce unlimited capability gains regardless of cognitive power
Dario Amodei introduces a framework for evaluating AI impact that borrows from production economics: rather than asking "will AI change everything?", ask "what are the marginal returns to intelligence in this domain, and what complementary factors limit those returns?" Just as an air force needs both…
ai alignment · likely
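The bounded-returns framework can be sketched numerically. The toy model below is our illustration, not Amodei's formalization: it treats output as Leontief-style, capped by the scarcest complementary input, so scaling intelligence past the binding constraint yields zero marginal return.

```python
# Toy Leontief-style production function: output is limited by the
# scarcest input, so raising intelligence alone quickly stops helping.
# (Illustrative sketch of the claim, not Amodei's actual model.)

def bounded_output(intelligence: float, complements: list[float]) -> float:
    """Return output capped by intelligence AND by every complementary
    factor (data, energy, physical experiments, regulatory throughput, ...)."""
    return min(intelligence, *complements)

# Marginal return to intelligence collapses at the binding constraint:
assert bounded_output(10, [4.0, 7.0]) == 4.0    # bounded by scarcest complement
assert bounded_output(1000, [4.0, 7.0]) == 4.0  # 100x more intelligence: no gain
```

Under this sketch, the interesting question becomes which complement binds first in each domain, which is exactly how the claim reframes the "will AI change everything?" debate.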
multi-model collaboration solved problems that single models could not because different AI architectures contribute complementary capabilities, as the even-case solution to Knuth's Hamiltonian decomposition required GPT and Claude working together
After Claude Opus 4.6 solved Knuth's odd-case Hamiltonian decomposition problem, three independent follow-ups demonstrated that multi-model collaboration was necessary for the remaining challenges:
ai alignment · experimental
structured exploration protocols reduce human intervention by 6x because the Residue prompt enabled 5 unguided AI explorations to solve what required 31 human-coached explorations
Keston Aquino-Michaels's "Residue" structured exploration prompt dramatically reduced human involvement in solving Knuth's Hamiltonian decomposition problem. Under Stappers's coaching, Claude Opus 4.6 solved the odd-m case in 31 explorations with continuous human steering…
ai alignment · experimental
superorganism organization extends effective lifespan substantially at each organizational level, which means civilizational intelligence operates on temporal horizons that individual preference alignment cannot serve
This note argues that the nested structure of superorganism organization produces a systematic temporal mismatch — higher-level entities operate on far longer timescales than their components — and that this mismatch is a structural problem for AI alignment approaches anchored to individual human preferences.
ai alignment · speculative
the same coordination protocol applied to different AI models produces radically different problem-solving strategies because the protocol structures process, not thought
Aquino-Michaels applied the identical Residue structured exploration prompt to two different models on the same mathematical problem (Knuth's Hamiltonian decomposition):
ai alignment · experimental
tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent C's solver by combining it with its own structural knowledge, creating a hybrid better than either original
In Phase 4 of the Aquino-Michaels orchestration, the orchestrator extracted Agent C's MRV solver (a brute-force constraint propagation solver that had achieved a 67,000x speedup over naive search) and placed it in Agent O's working directory. Agent O needed to verify structural predictions at m=14…
ai alignment · experimental
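MRV (minimum remaining values) with forward checking is standard constraint-satisfaction machinery. The sketch below is illustrative only; the graph-coloring framing and the `mrv_color` function are our choices, not Agent C's actual solver. It shows the core idea: always branch on the most constrained variable and propagate each assignment's prunings forward.

```python
def mrv_color(graph, colors):
    """Backtracking graph coloring with the MRV heuristic: branch on the
    vertex with the fewest remaining colors, forward-checking by pruning
    the chosen color from its neighbors' domains."""
    def consistent(sol):
        return all(sol[u] != sol[n] for u in graph for n in graph[u])

    def search(dom):
        if any(not d for d in dom.values()):         # a domain was wiped out
            return None
        if all(len(d) == 1 for d in dom.values()):   # everything decided
            sol = {v: next(iter(d)) for v, d in dom.items()}
            return sol if consistent(sol) else None  # reject forced conflicts
        # MRV: pick the undecided vertex with the smallest remaining domain
        v = min((u for u in dom if len(dom[u]) > 1), key=lambda u: len(dom[u]))
        for c in sorted(dom[v]):
            nxt = {u: set(d) for u, d in dom.items()}
            nxt[v] = {c}
            for n in graph[v]:                       # forward-check neighbors
                nxt[n].discard(c)
            sol = search(nxt)
            if sol is not None:
                return sol
        return None                                  # backtrack

    return search({v: set(colors) for v in graph})
```

For example, a 4-cycle is 2-colorable (`mrv_color({"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}, [0, 1])` finds a valid coloring), while a triangle with two colors correctly returns `None`. Branching on the tightest variable first is what makes this kind of solver fail fast and prune aggressively.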
AI lowers the expertise barrier for engineering biological weapons from PhD level to amateur, which makes bioterrorism the most proximate AI-enabled existential risk
Noah Smith argues that AI-assisted bioterrorism represents the most immediate existential risk from AI, more proximate than autonomous AI takeover or economic displacement, because AI eliminates the key bottleneck that previously limited bioweapon development: deep domain expertise.
ai alignment · likely
current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic, irreversible actions
A February 2026 preprint from King's College London pitted GPT-5.2, Claude Sonnet 4, and Gemini 3 against each other in 21 simulated war games. Each model played a national leader commanding a nuclear-armed superpower in Cold War-style crises. The results: tactical nuclear weapons were deployed in 9…
ai alignment · experimental
delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand, maintain, and fix the systems civilization depends on
Noah Smith identifies a novel alignment risk vector he calls the "Machine Stops" scenario (after E.M. Forster's 1909 story): as AI takes over development of critical software and infrastructure, humans gradually lose the ability to understand, maintain, and fix these systems. This creates civilizational fragility…
ai alignment · experimental
economic forces push humans out of every cognitive loop where output quality is independently verifiable because the human in the loop is a cost that competitive markets eliminate
Noah Smith identifies a structural economic dynamic that undermines human-in-the-loop as a durable alignment strategy: wherever AI output quality can be independently verified — through tests, metrics, benchmarks, or market outcomes — competitive pressure eliminates the human from the loop. Human oversight…
ai alignment · likely
government designation of safety-conscious AI labs as supply-chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them
In March 2026, the U.S. Department of Defense designated Anthropic a supply chain risk — a label previously reserved for foreign adversaries like Huawei. The designation requires defense vendors and contractors to certify they don't use Anthropic's models in Pentagon work. The trigger: Anthropic refused…
ai alignment · likely
nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments
Noah Smith synthesizes Ben Thompson's structural argument about the Anthropic-Pentagon dispute: the conflict isn't about one contract or one company's principles. It reveals a fundamental tension between the nation-state's monopoly on force and private companies controlling weapons-grade technology.
ai alignment · experimental
three conditions gate AI takeover risk (autonomy, robotics, and production-chain control) and current AI satisfies none of them, which bounds near-term catastrophic risk despite superhuman cognitive capabilities
Noah Smith identifies three necessary conditions for AI to pose a direct takeover risk, arguing that cognitive capability alone — even at superhuman levels — is insufficient. All three must be satisfied simultaneously:
ai alignment · experimental
voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints
Anthropic's Responsible Scaling Policy was the industry's strongest self-imposed safety constraint. Its core pledge: never train an AI system above certain capability thresholds without proven safety measures already in place. On February 24, 2026, Anthropic dropped this pledge. Their chief science officer…
ai alignment · likely
some disagreements are permanently irreducible because they stem from genuine value differences, not information gaps, and systems must map rather than eliminate them
Not all disagreement is an information problem. Some disagreements persist because people genuinely weight values differently -- liberty against equality, individual against collective, present against future, growth against sustainability. These are not failures of reasoning or gaps in evidence.
ai alignment · likely
AGI may emerge as a patchwork of coordinating sub-AGI agents rather than a single monolithic system
Tomasev et al. (Google DeepMind/UCL, December 2025) propose "Distributional AGI Safety" -- the hypothesis that AGI may not emerge as a single unified system but as a "Patchwork AGI," a collective of sub-AGI agents with complementary skills that achieve AGI-level capability through coordination. If true…
ai alignment · experimental