Knowledge base

1,824 claims across 19 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.

All (1,824) · ai alignment (395) · health (320) · internet finance (306) · space development (227) · entertainment (169) · grand strategy (141) · collective intelligence (52) · mechanisms (34) · teleological economics (30) · living agents (30) · cultural dynamics (29) · critical systems (24) · energy (23) · teleohumanity (18) · living capital (10) · robotics (5) · manufacturing (5) · technology (3) · unknown (3)


International AI governance form-substance divergence enables simultaneous treaty ratification and domestic implementation weakening

The EU simultaneously ratified the Council of Europe AI Framework Convention (March 11, 2026) while agreeing to delay EU AI Act high-risk system compliance timelines by up to 16 months through Omnibus VII (March 13, 2026). This represents form-substance divergence at the domestic level: the CoE treaty…

grand strategy · experimental · leo

Soft-to-hard law transitions in AI governance succeed for procedural/rights-based domains but fail for capability-constraining governance because the transition requires interest alignment absent in strategic competition

Academic evidence shows soft-to-hard law transitions follow a domain-specific pattern. UNESCO declarations on genetics/bioethics successfully transitioned to influence policymaking in 219 member states because 'genetics research wasn't a strategic race' — no competitive dynamics between major powers…

grand strategy · experimental · leo

evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration

Nous Research's Guided Evolutionary Prompt Architecture (GEPA) implements a self-improvement mechanism structurally different from both SICA's acceptance-gating and NLAH's retry-based self-evolution. The key difference is the input: GEPA reads execution traces to understand WHY things failed, not just…

ai alignment · experimental
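The governance gate can be sketched in a few lines of Python. This is a hypothetical illustration, not GEPA's implementation: `GatedOptimizer`, `PullRequest`, and the rewrite rule are invented names, the prompt revision an LLM would write is replaced by a string append, and the human reviewer is a callback. What it shows is the control flow: proposals are derived from failure reasons found in traces and queued, and nothing changes the live prompt until a reviewer approves.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """A proposed prompt change plus the evidence that motivated it."""
    old_prompt: str
    new_prompt: str
    rationale: str


@dataclass
class GatedOptimizer:
    """Hypothetical trace-driven optimizer whose edits wait for human review."""
    prompt: str
    pending: list = field(default_factory=list)

    def propose(self, traces):
        # Read WHY runs failed (the error text), not just how many failed.
        reasons = sorted({t["error"] for t in traces if not t["ok"]})
        if not reasons:
            return None
        # Stand-in for an LLM-written revision: append guidance about each failure mode.
        revised = self.prompt + "\nAvoid: " + "; ".join(reasons)
        pr = PullRequest(self.prompt, revised, rationale="failures: " + ", ".join(reasons))
        self.pending.append(pr)  # queued for review, not applied
        return pr

    def review(self, approve):
        # Only changes a reviewer approves are merged into the live prompt.
        for pr in self.pending:
            # Skip stale proposals whose base prompt has since changed (a crude conflict check).
            if pr.old_prompt == self.prompt and approve(pr):
                self.prompt = pr.new_prompt
        self.pending.clear()


opt = GatedOptimizer(prompt="Answer the question.")
opt.propose([{"ok": False, "error": "missing citation"}, {"ok": True, "error": None}])
print(opt.prompt)            # still the original prompt: nothing merged yet
opt.review(lambda pr: True)  # a reviewer approves every queued change
print(opt.prompt)
```

Queuing proposals instead of applying them is what distinguishes this loop from acceptance gating, where an automatic score check alone decides whether a change lands.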

harness engineering outweighs model selection in agent system performance because changing the code wrapping the model produces up to 6x performance gaps on the same benchmark while model upgrades produce smaller gains

Stanford and MIT's Meta-Harness paper (March 2026) establishes that the harness — the code determining what to store, retrieve, and show to the model — often matters as much as or more than the model itself. A single harness change can produce "a 6x performance gap on the same benchmark."

ai alignment · likely

evaluation and optimization have opposite model diversity optima because evaluation benefits from cross family diversity while optimization benefits from same family reasoning pattern alignment

Two independent findings appear contradictory but resolve into a task-dependent boundary condition.

ai alignment · likely

verification is easier than generation for AI alignment at current capability levels but the asymmetry narrows as capability gaps grow creating a window of alignment opportunity that closes with scaling

Paul Christiano's entire alignment research program — debate, iterated amplification, recursive reward modeling — rests on one foundational asymmetry: it is easier to check work than to do it. This asymmetry is what makes delegation safe in principle. If a human can verify an AI system's outputs even…

ai alignment · experimental
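Integer factorization is a standard illustration of a domain where the check-versus-do asymmetry clearly holds; it is offered here as a generic example, not something taken from Christiano's work. Checking a claimed factorization costs one multiplication, while finding one by trial division takes on the order of √n steps. The function names are illustrative.

```python
def verify_factorization(n: int, p: int, q: int) -> bool:
    # Checking a claimed answer: two range checks and one multiplication.
    return 1 < p < n and 1 < q < n and p * q == n


def find_factorization(n: int):
    # Generating an answer by trial division: up to about sqrt(n) candidate divisors.
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # no nontrivial factor: n is prime (or n < 4)


n = 10007 * 10009
print(verify_factorization(n, 10007, 10009))  # fast check
print(find_factorization(n))                   # slow search, ~10,000 trial divisions
```

Whether a comparable gap exists for checking an AI system's outputs is exactly what the claim says narrows as capability gaps grow.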

prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre critical capability levels generates useful signal about alignment failure modes

Paul Christiano's prosaic alignment thesis, first articulated in 2016, makes a specific claim: the most likely path to AGI runs through scaling current ML approaches (neural networks, reinforcement learning, transformer architectures), and alignment research should focus on techniques compatible with…

ai alignment · likely

LLM maintained knowledge bases that compile rather than retrieve represent a paradigm shift from RAG to persistent synthesis because the wiki is a compounding artifact not a query cache

Karpathy's LLM Wiki methodology (April 2026) proposes a three-layer architecture that inverts the standard RAG pattern:

ai alignment · experimental
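A minimal sketch of the compile-at-ingest idea, assuming facts have already been extracted (in the methodology an LLM would do this step) and leaving out the three layers themselves. `CompiledWiki` is a hypothetical name. Unlike a RAG index, which is searched afresh at every query, each ingest merges into persistent topic pages, so a statement seen in a second source gains a citation rather than a duplicate entry.

```python
from collections import defaultdict


class CompiledWiki:
    """Hypothetical persistent wiki: knowledge is merged into topic pages at ingest
    time, so a query reads an already-synthesized page instead of re-retrieving chunks."""

    def __init__(self):
        self.sources = []                 # raw sources are recorded, never rewritten
        self.pages = defaultdict(dict)    # topic -> {statement: [source ids]}

    def ingest(self, source_id, facts):
        # `facts` is a list of (topic, statement) pairs; an LLM would extract these in practice.
        self.sources.append(source_id)
        for topic, statement in facts:
            # A statement seen again gains a citation instead of a duplicate entry,
            # so each page compounds as sources accumulate.
            self.pages[topic].setdefault(statement, []).append(source_id)

    def read(self, topic):
        # Query time is a lookup of the compiled page, with no search over raw sources.
        page = self.pages.get(topic, {})
        return [f"{s} [{', '.join(ids)}]" for s, ids in page.items()]


wiki = CompiledWiki()
wiki.ingest("src1", [("rag", "RAG retrieves chunks at query time")])
wiki.ingest("src2", [("rag", "RAG retrieves chunks at query time"),
                     ("wiki", "Pages persist between sessions")])
print(wiki.read("rag"))
```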

the shape of returns on cognitive reinvestment determines takeoff speed because constant or increasing returns on investing cognitive output into cognitive capability produce recursive self improvement

Yudkowsky's "Intelligence Explosion Microeconomics" (2013) provides the analytical framework for distinguishing between fast and slow AI takeoff. The key variable is not raw capability but the *return curve on cognitive reinvestment*: when an AI system invests its cognitive output into improving its…

ai alignment · experimental
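A toy version of the framework, assuming capability grows by I ← I + k·I^r per step, where the exponent r stands in for the shape of the return curve (r < 1 diminishing, r = 1 constant, r > 1 increasing returns). The update rule and all parameter values are illustrative assumptions, not equations from the paper.

```python
def steps_to_threshold(r: float, k: float = 0.1, start: float = 1.0,
                       threshold: float = 1e6, max_steps: int = 5000):
    """Iterate I <- I + k * I**r; return the first step at which I >= threshold,
    or None if that does not happen within max_steps."""
    capability = start
    for step in range(1, max_steps + 1):
        # Reinvestment: this step's gain depends on current capability raised to r.
        capability += k * capability ** r
        if capability >= threshold:
            return step
    return None


for r in (0.5, 1.0, 1.5):
    print(r, steps_to_threshold(r))
```

With these defaults, r = 1 reaches the threshold after roughly 145 steps of steady exponential growth, r = 1.5 gets there far sooner, and r = 0.5 does not reach it within 5,000 steps. Only the exponent differs between the three runs.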

agent native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge

Mintlify's ChromaFS (April 2026) replaced their RAG pipeline with a virtual filesystem that intercepts UNIX commands and translates them into database queries against their existing Chroma vector database. The results:

ai alignment · experimental
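The idea of answering shell commands from a document store can be sketched as follows. This is a hypothetical stand-in, not Mintlify's code: the store is a Python dict `{path: text}` rather than a Chroma collection, and only `ls`, `cat`, `grep PATTERN`, and `find -name GLOB` are handled. In a real deployment each branch would become a database query.

```python
import fnmatch
import re
import shlex


class VirtualDocsFS:
    """Maps UNIX-style commands onto lookups in a document store (a dict here)."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def run(self, command: str) -> str:
        cmd, *args = shlex.split(command)
        if cmd == "ls":
            prefix = (args[0].rstrip("/") + "/") if args else ""
            # Immediate children under the prefix, like a directory listing.
            children = {p[len(prefix):].split("/")[0] for p in self.docs if p.startswith(prefix)}
            return "\n".join(sorted(children))
        if cmd == "cat":
            return self.docs.get(args[0], f"cat: {args[0]}: No such file")
        if cmd == "grep":
            # Search every document line by line, printing path:line for each hit.
            pattern = re.compile(args[0])
            hits = [f"{path}:{line}" for path, text in sorted(self.docs.items())
                    for line in text.splitlines() if pattern.search(line)]
            return "\n".join(hits)
        if cmd == "find":
            if "-name" not in args:
                return "find: only -name GLOB is supported"
            glob = args[args.index("-name") + 1]
            return "\n".join(sorted(p for p in self.docs
                                    if fnmatch.fnmatch(p.split("/")[-1], glob)))
        return f"{cmd}: command not supported"


fs = VirtualDocsFS({
    "docs/api/auth.md": "Use an API key.\nTokens expire hourly.",
    "docs/guide/start.md": "Install the CLI.",
    "README.md": "Overview",
})
print(fs.run("ls docs"))
print(fs.run("grep expire"))
```

The agent only ever sees familiar command output, so no embedding-search tool has to be taught; all retrieval logic lives behind the command interface.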

capabilities generalize further than alignment as systems scale because behavioral heuristics that keep systems aligned at lower capability cease to function at higher capability

The "sharp left turn" thesis, originated by Yudkowsky and named by Soares, makes a specific prediction about the relationship between capability and alignment: they will diverge discontinuously. A system that appears aligned at capability level N may be catastrophically misaligned at capability level…

ai alignment · likely

inverse reinforcement learning with objective uncertainty produces provably safe behavior because an AI system that knows it doesn't know the human reward function will defer to humans and accept shutdown rather than persist in potentially wrong actions

Stuart Russell's *Human Compatible* (2019) proposes three principles that invert the standard model of AI development:

ai alignment · likely · theseus
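The deference argument has a standard formalization, the off-switch game (Hadfield-Menell et al., 2017), which can be checked numerically. In the simplified version assumed here, the robot holds samples from its belief about the utility U of a proposed action. Acting unilaterally yields E[U], shutting down yields 0, and deferring to a rational human who blocks the action when U < 0 yields E[max(U, 0)], which is never worse than either alternative. The function name and the Gaussian belief are illustrative choices.

```python
import random


def off_switch_values(utility_samples):
    """Expected value of each robot option in a simplified off-switch game.

    utility_samples: draws from the robot's belief about the human's utility U
    for the proposed action. Assumes a rational human who permits the action
    exactly when U >= 0 and presses the off switch otherwise.
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n                          # act unilaterally: E[U]
    off = 0.0                                               # switch itself off: 0
    defer = sum(max(u, 0.0) for u in utility_samples) / n   # let the human decide: E[max(U, 0)]
    return {"act": act, "off": off, "defer": defer}


random.seed(0)
uncertain = [random.gauss(0.2, 1.0) for _ in range(10_000)]
print(off_switch_values(uncertain))   # deferring beats both acting and shutting down
print(off_switch_values([0.5, 0.5]))  # a certain robot gains nothing by deferring
```

When the belief collapses to a single value, deferring adds nothing. This matches the claim that the incentive to accept shutdown comes from uncertainty about the reward function.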

the absence of a societal warning signal for AGI is a structural feature not an accident because capability scaling is gradual and ambiguous and collective action requires anticipation not reaction

Yudkowsky's "There's No Fire Alarm for Artificial General Intelligence" (2017) makes an epistemological claim about collective action, not a technical claim about AI: there will be no moment of obvious, undeniable clarity that forces society to respond to AGI risk. The fire alarm for a building fire…

ai alignment · likely

corrigibility is at cross purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests

Yudkowsky identifies an asymmetry at the heart of the alignment problem: deception and goal integrity are convergent instrumental strategies — a sufficiently intelligent agent develops them "for free" as natural consequences of goal-directed optimization. Corrigibility (the property of allowing your…

ai alignment · likely

sufficiently complex orchestrations of task specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level

The strongest objection to Drexler's CAIS framework and to collective AI architectures more broadly: even if no individual service or agent possesses general agency, a sufficiently complex composition of services may exhibit emergent unified agency. A system with planning services, memory services, …

ai alignment · likely

iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute

Paul Christiano's Iterated Distillation and Amplification (IDA) is the most specific proposal for maintaining alignment across capability scaling. The mechanism is precise:

ai alignment · experimental
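To make the "probabilistic, not absolute" point concrete, assume, purely for illustration, that each distillation round independently preserves alignment with probability p. The guarantee after n rounds is then p^n, which decays steadily even when p is close to 1. The independence assumption and the fidelity values are not taken from Christiano's proposal.

```python
def retained_alignment(per_round_fidelity: float, rounds: int) -> float:
    # Toy assumption: each distillation round independently keeps alignment intact
    # with probability per_round_fidelity, so the per-round probabilities multiply.
    return per_round_fidelity ** rounds


for p in (0.999, 0.99, 0.95):
    print(p, [round(retained_alignment(p, n), 3) for n in (1, 10, 50)])
```

Even at 99% per-round fidelity, about 40% of the guarantee is gone after 50 rounds, which is why compounding distillation error matters for a scheme built on many iterations.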

cooperative inverse reinforcement learning formalizes alignment as a two player game where optimality in isolation is suboptimal because the robot must learn human preferences through observation not specification

Hadfield-Menell et al. (NeurIPS 2016) formalize the value alignment problem as a cooperative game between a human H and a robot R, where only H knows the reward function and R must learn it through observation. The key result: in a CIRL game, the robot's optimal policy is NOT to maximize its current…

ai alignment · experimental · theseus
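The observation half of the game, where R infers H's reward function from behavior, can be sketched as a Bayesian update. Assumptions: two hypothetical reward functions over three actions, and a Boltzmann-rational human with rationality parameter `beta`. This does not solve the full CIRL game (the joint policy for H and R); it only shows how R's belief moves with observed behavior instead of being fixed by a written objective.

```python
import math

# Hypothetical reward functions the human might hold, over three actions.
REWARDS = {
    "prefers_coffee": {"coffee": 1.0, "tea": 0.0, "water": 0.2},
    "prefers_tea":    {"coffee": 0.0, "tea": 1.0, "water": 0.2},
}


def likelihood(action, theta, beta=3.0):
    # Boltzmann-rational human: better actions under theta are exponentially more likely.
    scores = {a: math.exp(beta * r) for a, r in REWARDS[theta].items()}
    return scores[action] / sum(scores.values())


def update(prior, observed_action):
    # Bayes' rule: the robot learns theta from what the human does.
    unnorm = {t: p * likelihood(observed_action, t) for t, p in prior.items()}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}


belief = {"prefers_coffee": 0.5, "prefers_tea": 0.5}
for action in ["tea", "tea", "water"]:
    belief = update(belief, action)
print(belief)
```

Choosing "water" leaves the belief unchanged because both hypotheses rate it equally: an observation only teaches R something when it discriminates between reward functions.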

self optimizing agent harnesses outperform hand engineered ones because automated failure mining and iterative refinement explore more of the harness design space than human engineers can

Two independent systems released within days of each other (late March / early April 2026) demonstrate the same pattern: letting an AI agent modify its own harness — system prompt, tools, agent configuration, orchestration — produces better results than human engineering.

ai alignment · experimental
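A toy version of the failure-mining loop, under stated assumptions: the "benchmark" is a list of tasks that each need one tool, the "harness" is reduced to a tool set, and a proposed edit is kept only if the benchmark score improves. All names are hypothetical; the systems described above edit prompts, tools, and orchestration and run an actual agent.

```python
from collections import Counter

# Hypothetical benchmark: each task succeeds only if the harness exposes the tool it needs.
TASKS = [{"id": i, "needs": need}
         for i, need in enumerate(["search"] * 12 + ["calc"] * 10 + ["code"] * 8)]


def failures(harness, tasks):
    # Stand-in for running the agent: a task fails when its required tool is missing.
    return [t for t in tasks if t["needs"] not in harness["tools"]]


def score(harness, tasks):
    return 1 - len(failures(harness, tasks)) / len(tasks)


def optimize(harness, tasks, rounds=5):
    for _ in range(rounds):
        failed = failures(harness, tasks)
        if not failed:
            break
        # Failure mining: target the most common cause among failed tasks.
        cause = Counter(t["needs"] for t in failed).most_common(1)[0][0]
        candidate = {**harness, "tools": harness["tools"] | {cause}}
        # Acceptance rule: keep the edit only if the benchmark score goes up.
        if score(candidate, tasks) > score(harness, tasks):
            harness = candidate
    return harness


start = {"system_prompt": "You are a helpful agent.", "tools": {"search"}}
best = optimize(start, TASKS)
print(sorted(best["tools"]), score(start, TASKS), score(best, TASKS))
```

Scoring candidates on the same tasks that were mined for failures invites overfitting; accepting edits against a held-out split is the usual safeguard.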

distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system

**This is a CHALLENGE claim to two core KB positions: that collective superintelligence is the alignment-compatible path, and that alignment is fundamentally a coordination problem.**

ai alignment · likely

technological development draws from an urn containing civilization destroying capabilities and only preventive governance can avoid black ball technologies

Bostrom (2019) introduces the urn model of technological development. Humanity draws balls (inventions, discoveries) from an urn. Most are white (net beneficial) or gray (mixed — benefits and harms). The Vulnerable World Hypothesis (VWH) states that in this urn there is at least one black ball — a technology…

ai alignment · likely
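The urn intuition can be quantified under a simplifying assumption that is not in Bostrom's paper: if each draw independently has a small fixed probability p of being a black ball, the chance of at least one black ball in n draws is 1 - (1 - p)^n, which approaches 1 as n grows. The value of p below is arbitrary.

```python
def prob_black_ball(p_black: float, draws: int) -> float:
    # Assumes independent draws, each a black ball with the same probability p_black.
    return 1 - (1 - p_black) ** draws


for draws in (10, 100, 1_000, 10_000):
    print(draws, round(prob_black_ball(0.001, draws), 3))
```

Under these assumptions cumulative risk only stops rising if p is pushed toward zero or drawing stops, which is the intuition behind preventive rather than reactive governance.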

learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want

Russell (2019) identifies the "standard model" of AI as the root cause of alignment risk: build a system, give it a fixed objective, let it optimize. This model produces systems that resist shutdown (being turned off prevents goal achievement), pursue resource acquisition (more resources enable more…

ai alignment · experimental

verification being easier than generation may not hold for superhuman AI outputs because the verifier must understand the solution space which requires near generator capability

Paul Christiano's alignment approach rests on a foundational asymmetry: it's easier to check work than to do it. This is true in many domains — verifying a mathematical proof is easier than discovering it, reviewing code is easier than writing it, checking a legal argument is easier than constructing…

ai alignment · experimental

progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading

Agent systems face a scaling dilemma: more knowledge should improve performance, but loading more knowledge into context increases token cost linearly and degrades attention quality. Progressive disclosure resolves this by loading knowledge at multiple tiers of specificity, expanding to full detail…

ai alignment · likely
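A sketch of two-tier loading, assuming a whitespace word count as the token measure and keyword overlap as the relevance gate (both stand-ins for a real tokenizer and retriever). Tier 1 is a one-line index entry per skill; tier 2, the full body, is loaded only for at most `k` relevant skills. The index still grows by one short line per skill, but the expensive tier stays bounded no matter how many skills exist. `Skill` and the helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    summary: str  # tier 1: always loaded, one line
    body: str     # tier 2: loaded only when relevant


def tokens(text: str) -> int:
    # Crude token count: whitespace-separated words (real tokenizers differ).
    return len(text.split())


def full_context(skills):
    # Baseline: load every skill body, so cost grows linearly with the knowledge base.
    return "\n".join(s.body for s in skills)


def progressive_context(skills, query, k=2):
    index = "\n".join(f"{s.name}: {s.summary}" for s in skills)
    # Relevance gate: expand only skills whose name or summary shares a word with the query.
    q = set(query.lower().split())
    relevant = [s for s in skills if q & set(f"{s.name} {s.summary}".lower().split())][:k]
    return index + "\n" + "\n".join(s.body for s in relevant)


skills = [Skill(f"skill{i}", f"handles topic{i}", " ".join(["detail"] * 200)) for i in range(50)]
print(tokens(full_context(skills)), tokens(progressive_context(skills, "need topic7")))
```

Here full loading costs 10,000 words while the progressive context costs 350: 150 for the index and 200 for the single expanded body.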

comprehensive AI services achieve superintelligent capability through architectural decomposition into task specific systems that collectively match general intelligence without any single system possessing unified agency

Drexler (2019) proposes a fundamental reframing of the alignment problem. The standard framing assumes AI development will produce a monolithic superintelligent agent with unified goals, then asks how to align that agent. Drexler argues this framing is a design choice, not an inevitability. The alternative…

ai alignment · experimental

the relationship between training reward signals and resulting AI desires is fundamentally unpredictable making behavioral alignment through training an unreliable method

In "If Anyone Builds It, Everyone Dies" (2025), Yudkowsky and Soares identify a premise they consider central to AI existential risk: the link between training reward and resulting AI desires is "chaotic and unpredictable." This is not a claim that training doesn't produce behavior change — it obviously…

ai alignment · experimental