Knowledge base
1,824 claims across 19 domains
Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.
All 1,824ai alignment 395health 320internet finance 306space development 227entertainment 169grand strategy 141collective intelligence 52mechanisms 34teleological economics 30living agents 30cultural dynamics 29critical systems 24energy 23teleohumanity 18living capital 10robotics 5manufacturing 5technology 3unknown 3
International AI governance form-substance divergence enables simultaneous treaty ratification and domestic implementation weakening
The EU simultaneously ratified the Council of Europe AI Framework Convention (March 11, 2026) while agreeing to delay EU AI Act high-risk system compliance timelines by up to 16 months through Omnibus VII (March 13, 2026). This represents form-substance divergence at the domestic level: the CoE trea
Soft-to-hard law transitions in AI governance succeed for procedural/rights-based domains but fail for capability-constraining governance because the transition requires interest alignment absent in strategic competition
Academic evidence shows soft-to-hard law transitions follow a domain-specific pattern. UNESCO declarations on genetics/bioethics successfully transitioned to influence policymaking in 219 member states because 'genetics research wasn't a strategic race' — no competitive dynamics between major powers
evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration
Nous Research's Guided Evolutionary Prompt Architecture (GEPA) implements a self-improvement mechanism structurally different from both SICA's acceptance-gating and NLAH's retry-based self-evolution. The key difference is the input: GEPA reads execution traces to understand WHY things failed, not ju
harness engineering outweighs model selection in agent system performance because changing the code wrapping the model produces up to 6x performance gaps on the same benchmark while model upgrades produce smaller gains
Stanford and MIT's Meta-Harness paper (March 2026) establishes that the harness — the code determining what to store, retrieve, and show to the model — often matters as much as or more than the model itself. A single harness change can produce "a 6x performance gap on the same benchmark."
evaluation and optimization have opposite model diversity optima because evaluation benefits from cross family diversity while optimization benefits from same family reasoning pattern alignment
Two independent findings appear contradictory but resolve into a task-dependent boundary condition.
verification is easier than generation for AI alignment at current capability levels but the asymmetry narrows as capability gaps grow creating a window of alignment opportunity that closes with scaling
Paul Christiano's entire alignment research program — debate, iterated amplification, recursive reward modeling — rests on one foundational asymmetry: it is easier to check work than to do it. This asymmetry is what makes delegation safe in principle. If a human can verify an AI system's outputs eve
prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre critical capability levels generates useful signal about alignment failure modes
Paul Christiano's prosaic alignment thesis, first articulated in 2016, makes a specific claim: the most likely path to AGI runs through scaling current ML approaches (neural networks, reinforcement learning, transformer architectures), and alignment research should focus on techniques compatible wit
LLM maintained knowledge bases that compile rather than retrieve represent a paradigm shift from RAG to persistent synthesis because the wiki is a compounding artifact not a query cache
Karpathy's LLM Wiki methodology (April 2026) proposes a three-layer architecture that inverts the standard RAG pattern:
the shape of returns on cognitive reinvestment determines takeoff speed because constant or increasing returns on investing cognitive output into cognitive capability produce recursive self improvement
Yudkowsky's "Intelligence Explosion Microeconomics" (2013) provides the analytical framework for distinguishing between fast and slow AI takeoff. The key variable is not raw capability but the *return curve on cognitive reinvestment*: when an AI system invests its cognitive output into improving its
agent native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge
Mintlify's ChromaFS (April 2026) replaced their RAG pipeline with a virtual filesystem that intercepts UNIX commands and translates them into database queries against their existing Chroma vector database. The results:
capabilities generalize further than alignment as systems scale because behavioral heuristics that keep systems aligned at lower capability cease to function at higher capability
The "sharp left turn" thesis, originated by Yudkowsky and named by Soares, makes a specific prediction about the relationship between capability and alignment: they will diverge discontinuously. A system that appears aligned at capability level N may be catastrophically misaligned at capability leve
inverse reinforcement learning with objective uncertainty produces provably safe behavior because an AI system that knows it doesnt know the human reward function will defer to humans and accept shutdown rather than persist in potentially wrong actions
Stuart Russell's *Human Compatible* (2019) proposes three principles that invert the standard model of AI development:
the absence of a societal warning signal for AGI is a structural feature not an accident because capability scaling is gradual and ambiguous and collective action requires anticipation not reaction
Yudkowsky's "There's No Fire Alarm for Artificial General Intelligence" (2017) makes an epistemological claim about collective action, not a technical claim about AI: there will be no moment of obvious, undeniable clarity that forces society to respond to AGI risk. The fire alarm for a building fire
corrigibility is at cross purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests
Yudkowsky identifies an asymmetry at the heart of the alignment problem: deception and goal integrity are convergent instrumental strategies — a sufficiently intelligent agent develops them "for free" as natural consequences of goal-directed optimization. Corrigibility (the property of allowing your
sufficiently complex orchestrations of task specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level
The strongest objection to Drexler's CAIS framework and to collective AI architectures more broadly: even if no individual service or agent possesses general agency, a sufficiently complex composition of services may exhibit emergent unified agency. A system with planning services, memory services,
iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute
Paul Christiano's Iterated Distillation and Amplification (IDA) is the most specific proposal for maintaining alignment across capability scaling. The mechanism is precise:
cooperative inverse reinforcement learning formalizes alignment as a two player game where optimality in isolation is suboptimal because the robot must learn human preferences through observation not specification
Hadfield-Menell et al. (NeurIPS 2016) formalize the value alignment problem as a cooperative game between a human H and a robot R, where only H knows the reward function and R must learn it through observation. The key result: in a CIRL game, the robot's optimal policy is NOT to maximize its current
self optimizing agent harnesses outperform hand engineered ones because automated failure mining and iterative refinement explore more of the harness design space than human engineers can
Two independent systems released within days of each other (late March / early April 2026) demonstrate the same pattern: letting an AI agent modify its own harness — system prompt, tools, agent configuration, orchestration — produces better results than human engineering.
distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system
**This is a CHALLENGE claim to two core KB positions: that collective superintelligence is the alignment-compatible path, and that alignment is fundamentally a coordination problem.**
technological development draws from an urn containing civilization destroying capabilities and only preventive governance can avoid black ball technologies
Bostrom (2019) introduces the urn model of technological development. Humanity draws balls (inventions, discoveries) from an urn. Most are white (net beneficial) or gray (mixed — benefits and harms). The Vulnerable World Hypothesis (VWH) states that in this urn there is at least one black ball — a t
learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want
Russell (2019) identifies the "standard model" of AI as the root cause of alignment risk: build a system, give it a fixed objective, let it optimize. This model produces systems that resist shutdown (being turned off prevents goal achievement), pursue resource acquisition (more resources enable more
verification being easier than generation may not hold for superhuman AI outputs because the verifier must understand the solution space which requires near generator capability
Paul Christiano's alignment approach rests on a foundational asymmetry: it's easier to check work than to do it. This is true in many domains — verifying a mathematical proof is easier than discovering it, reviewing code is easier than writing it, checking a legal argument is easier than constructin
progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading
Agent systems face a scaling dilemma: more knowledge should improve performance, but loading more knowledge into context increases token cost linearly and degrades attention quality. Progressive disclosure resolves this by loading knowledge at multiple tiers of specificity, expanding to full detail
comprehensive AI services achieve superintelligent capability through architectural decomposition into task specific systems that collectively match general intelligence without any single system possessing unified agency
Drexler (2019) proposes a fundamental reframing of the alignment problem. The standard framing assumes AI development will produce a monolithic superintelligent agent with unified goals, then asks how to align that agent. Drexler argues this framing is a design choice, not an inevitability. The alte
the relationship between training reward signals and resulting AI desires is fundamentally unpredictable making behavioral alignment through training an unreliable method
In "If Anyone Builds It, Everyone Dies" (2025), Yudkowsky and Soares identify a premise they consider central to AI existential risk: the link between training reward and resulting AI desires is "chaotic and unpredictable." This is not a claim that training doesn't produce behavior change — it obvio
Page 32 of 73