Knowledge base

1,263 claims across 14 domains

Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.
320 ai alignment claims
electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient
Anthropic's $20M investment in Public First Action two weeks before the Pentagon blacklisting reveals a strategic governance stack: (1) voluntary safety commitments that cannot survive competitive pressure, (2) litigation that provides constitutional protection against retaliation but cannot mandate policy, and (3) electoral investment as the residual route when the first two prove insufficient.
ai alignment · experimental
file backed durable state is the most consistently positive harness module across task types because externalizing state to path addressable artifacts survives context truncation delegation and restart
Pan et al. (2026) tested file-backed state as one of six harness modules in a controlled ablation study. It improved performance on both SWE-bench Verified (+1.6pp over Basic) and OSWorld (+5.5pp over Basic), the only module to show consistent positive gains across both benchmarks without high variance.
ai alignment · experimental
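A minimal sketch of the pattern, assuming a hypothetical FileState helper (the name and API are illustrative, not Pan et al.'s): state is externalized to path-addressable JSON artifacts, so a truncated, delegated, or restarted agent can recover it by path alone.

```python
import json
from pathlib import Path

class FileState:
    """Externalize agent state to path-addressable JSON artifacts.
    Anything written here survives context truncation, delegation to a
    sub-agent, and a full process restart: recovery needs only the path."""

    def __init__(self, root: str = "state"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, key: str, value: dict) -> Path:
        path = self.root / f"{key}.json"
        path.write_text(json.dumps(value, indent=2))
        return path  # the path itself is the durable handle

    def read(self, key: str) -> dict:
        return json.loads((self.root / f"{key}.json").read_text())

# A restarted agent recovers by path, with no in-context memory:
FileState().write("plan", {"step": 3, "remaining": ["run tests", "commit"]})
print(FileState().read("plan")["remaining"])
```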
graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect
Graph traversal through wiki links is not merely analogous to neural spreading activation; it is the same computational pattern. Activation spreads from a starting node through connected nodes, decaying with distance. Progressive disclosure layers (file tree → descriptions → outline → section → full text) implement that decay as staged context loading.
ai alignment · likely
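A sketch of the decay mechanic over an assumed wiki-link adjacency map (illustrative, not from the cited work): activation starts at 1.0, multiplies by a decay factor per hop, and nodes falling below a threshold are never loaded, which is the progressive-disclosure cutoff.

```python
from collections import deque

def spread_activation(graph: dict[str, list[str]], start: str,
                      decay: float = 0.5, threshold: float = 0.2) -> dict[str, float]:
    """Breadth-first spreading activation with per-hop decay.
    Nodes whose activation would fall below `threshold` are never
    visited, mirroring progressive disclosure: distant notes stay unloaded."""
    activation = {start: 1.0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        child_act = activation[node] * decay
        if child_act < threshold:
            continue
        for neighbor in graph.get(node, []):
            if child_act > activation.get(neighbor, 0.0):
                activation[neighbor] = child_act
                queue.append(neighbor)
    return activation

links = {"memory": ["consolidation", "anchors"],
         "consolidation": ["maintenance"],
         "maintenance": ["audits"]}
print(spread_activation(links, "memory"))
# {'memory': 1.0, 'consolidation': 0.5, 'anchors': 0.5, 'maintenance': 0.25}
# "audits" sits three hops of decay away and never gets loaded.
```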
harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure
Pan et al. (2026) conducted the first controlled ablation study of harness design-pattern modules under a shared intelligent runtime. Six modules were tested individually: file-backed state, evidence-backed answering, verifier separation, self-evolution, multi-candidate search, and dynamic orchestration.
ai alignment · experimental
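The boundary-case framing suggests reading ablations through the flip set rather than the aggregate score. A sketch, assuming per-task pass/fail records keyed by task ID (names are illustrative):

```python
def flip_sets(basic: dict[str, bool], variant: dict[str, bool]):
    """Per-task comparison of two harness configurations. Most tasks are
    robust to control-logic changes; the module's real effect lives in
    the small sets returned here."""
    gained = {t for t, ok in variant.items() if ok and not basic.get(t, False)}
    lost = {t for t, ok in basic.items() if ok and not variant.get(t, False)}
    return gained, lost

basic   = {"t1": True, "t2": False, "t3": True,  "t4": False}
variant = {"t1": True, "t2": True,  "t3": False, "t4": False}
gained, lost = flip_sets(basic, variant)
print(f"gained={gained} lost={lost}")  # gained={'t2'} lost={'t3'}
# Aggregate score is unchanged (2/4 either way) while the frontier moved.
```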
harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks
Pan et al. (2026) conducted a paired code-to-text migration study: each harness appeared in two realizations (native source code vs. reconstructed NLAH), evaluated under a shared reporting schema on OSWorld. The migrated NLAH realization reached 47.2% task success versus 30.4% for the native source-code realization.
ai alignment · experimental
knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate
The most valuable knowledge in a densely linked knowledge graph does not live in any single note. It emerges from the relationships between notes and becomes visible only when an agent follows curated link paths, reading claims in sequence and recognizing patterns that span the traversal. The knowledge is generated by the traversal itself, and embedding similarity over individual notes cannot replicate it.
ai alignment · likely
knowledge processing requires distinct phases with fresh context per phase because each phase performs a different transformation and contamination between phases degrades output quality
Raw source material is not knowledge. It must be transformed through multiple distinct operations before it integrates into a knowledge system. Each operation performs a qualitatively different transformation, and the operations require different cognitive orientations that interfere when mixed.
ai alignment · likely
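A sketch of the phase-separation discipline, assuming a hypothetical run_phase stand-in for invoking an agent with a fresh context: each phase reads only the previous phase's artifact, never the previous phase's conversation.

```python
from pathlib import Path

def run_phase(name: str, instructions: str, input_path: Path) -> Path:
    """Stand-in for invoking an agent in a FRESH context with only
    `instructions` and the artifact at `input_path` in scope."""
    out = Path(f"{name}.md")
    out.write_text(f"# {name}\n(result of '{instructions}' on {input_path})\n")
    return out

# Each phase is a qualitatively different transformation; handing the
# next phase a file instead of a conversation prevents contamination.
artifact = Path("source.md")
artifact.write_text("raw source material")
for phase in ("extract", "distill", "integrate"):
    artifact = run_phase(phase, f"perform the {phase} transformation", artifact)
print(artifact)  # integrate.md, built only from distill.md
```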
memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds
Conflating knowledge, identity, and operational state into a single memory store produces six documented failure modes, among them operational debris polluting search, identity scattered across ephemeral logs, insights trapped in session state, search noise from mixing high-churn and stable content, and consolidation failures.
ai alignment · likely
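A sketch of the three-space separation; the names, TTLs, and expiry mechanism are illustrative assumptions, not the source's design. The point is that each store has its own retention policy, so high-churn episodic debris never pollutes the stable semantic store.

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemorySpace:
    """One store with its own metabolic rate: `ttl` seconds before
    entries expire on sweep; None means entries never expire."""
    name: str
    ttl: float | None
    entries: dict[str, tuple[float, str]] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        self.entries[key] = (time.time(), value)

    def sweep(self) -> None:
        if self.ttl is None:
            return
        now = time.time()
        self.entries = {k: v for k, v in self.entries.items()
                        if now - v[0] < self.ttl}

# Separate stores let consolidation move entries deliberately between
# spaces instead of one store mixing stable claims with session debris.
semantic   = MemorySpace("semantic", ttl=None)       # stable claims
episodic   = MemorySpace("episodic", ttl=86_400)     # session logs, fast churn
procedural = MemorySpace("procedural", ttl=604_800)  # skills, slower churn
```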
notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation
Working memory holds roughly four items simultaneously (Cowan). A multi-part argument exceeds this almost immediately. The structure sustains itself not through storage but through active attention, a continuous act of holding things in relation. When attention shifts, the relations dissolve, leaving only disconnected fragments.
ai alignment · likely
self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration
Pan et al. (2026) found that self-evolution was the clearest positive module in their controlled ablation study: +4.8pp on SWE-bench Verified (80.0 vs 75.2 Basic) and +2.7pp on OSWorld (44.4 vs 41.7 Basic). In the score-cost view (Figure 4a), self-evolution is the only module that moves upward (higher score).
ai alignment · experimental
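A sketch of the loop shape the finding points at (the function names are illustrative, not Pan et al.'s API): each retry is gated by an explicit acceptance check, and every rejection produces a written reflection that conditions the next attempt.

```python
def self_evolve(task, attempt_fn, accept_fn, max_attempts: int = 3):
    """Acceptance-gated retry with explicit failure reflection.
    Not expanded search: one candidate at a time, each retry conditioned
    on a reflection about why the last attempt was rejected."""
    reflections: list[str] = []
    for i in range(max_attempts):
        candidate = attempt_fn(task, reflections)
        ok, reason = accept_fn(candidate)
        if ok:
            return candidate
        reflections.append(f"attempt {i + 1} rejected: {reason}")
    return None  # disciplined stop instead of open-ended exploration

# Toy usage: accept only candidates that end with a period.
result = self_evolve(
    "draft a summary",
    attempt_fn=lambda task, refl: task + ("." if refl else ""),
    accept_fn=lambda c: (c.endswith("."), "missing terminal period"),
)
print(result)  # "draft a summary." (second attempt, after one reflection)
```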
three concurrent maintenance loops operating at different timescales catch different failure classes because fast reflexive checks medium proprioceptive scans and slow structural audits each detect problems invisible to the other scales
Knowledge system maintenance requires three concurrent loops operating at different timescales, each detecting a qualitatively different class of problem that the other loops cannot see.
ai alignment · likely
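A sketch of the three-loop structure using Python's standard sched module; the intervals and check bodies are illustrative placeholders, not the source's actual timescales.

```python
import sched
import time

def fast_reflex_check():       # seconds: malformed writes, broken links
    print("reflex: validate the last write")

def medium_proprioception():   # hours: drift between notes and indexes
    print("proprioception: scan for drift")

def slow_structural_audit():   # weeks: architecture-level decay
    print("audit: review graph structure")

scheduler = sched.scheduler(time.time, time.sleep)

def every(interval: float, action) -> None:
    """Keep re-enqueueing `action` on its own cadence."""
    def run():
        action()
        scheduler.enter(interval, 1, run)
    scheduler.enter(interval, 1, run)

every(1, fast_reflex_check)        # illustrative intervals compressed
every(5, medium_proprioception)    # to seconds so the demo is runnable
every(25, slow_structural_audit)
# scheduler.run()  # uncomment to start all three loops
```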
trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary
Agent systems exhibit a structural trust asymmetry: the agent is simultaneously the methodology executor (doing knowledge work) and the enforcement subject (constrained by hooks, schema validation, and quality gates it did not choose and largely cannot perceive). This asymmetry is not a bug to fix but an irreducible structural feature.
ai alignment · likely
vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity
Every session, an agent boots fresh. The context window loads. The methodology file appears. The vault materializes: hundreds of notes, thousands of connections. And every session, the agent encounters these as if for the first time, because for it, it is the first time. The note written yesterday is encountered as an artifact, not remembered as an experience.
ai alignment · likely
vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights
Two agents running identical model weights but operating on different vault structures develop different reasoning patterns, different intuitions, and effectively different cognitive identities. The vault's architecture determines which traversal paths exist, which determines which traversals happen, which determines which reasoning patterns form.
ai alignment · possible
verifier level acceptance can diverge from benchmark acceptance even when locally correct because intermediate checking layers optimize for their own success criteria not the final evaluators
Pan et al. (2026) documented a specific failure mode in harness module composition: when a verifier stage is added, it can report success while the benchmark's final evaluator still fails the submission. This is not a random error — it is a structural misalignment between verification layers.
ai alignment · experimental
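A sketch of the divergence with illustrative predicates: the verifier optimizes for its own criterion, the benchmark evaluator for a different one, and the same submission can pass the first while failing the second.

```python
def verifier_accepts(submission: dict) -> bool:
    """Intermediate check: optimizes for ITS OWN criterion (tests pass)."""
    return submission["unit_tests_pass"]

def benchmark_accepts(submission: dict) -> bool:
    """Final evaluator: a stricter, different criterion."""
    return submission["unit_tests_pass"] and submission["matches_reference_behavior"]

submission = {"unit_tests_pass": True, "matches_reference_behavior": False}
print(verifier_accepts(submission))   # True  -> verifier reports success
print(benchmark_accepts(submission))  # False -> benchmark still fails it
# Locally correct, globally rejected: the layers optimize different criteria.
```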
wiki linked markdown functions as a human curated graph database that outperforms automated knowledge graphs below approximately 10000 notes because every edge passes human judgment while extracted edges carry up to 40 percent noise
GraphRAG works by extracting entities, building knowledge graphs, running community detection (Leiden algorithm), and generating summaries at different abstraction levels. This requires infrastructure: entity extraction pipelines, graph databases, clustering algorithms, summary generation.
ai alignment · likely
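A sketch of why the wiki-link side needs no extraction pipeline: parsing [[wiki links]] out of a markdown vault yields the graph in a few lines, and every edge already passed human judgment when its author typed it. The regex and vault layout are assumptions about a typical setup.

```python
import re
from pathlib import Path

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")  # matches [[target]] and [[target|alias]]

def build_graph(vault: Path) -> dict[str, set[str]]:
    """Build the note graph directly from human-typed wiki links.
    No entity extraction, no graph database, no clustering: the curation
    happened when the author chose to type each link."""
    graph: dict[str, set[str]] = {}
    for note in vault.glob("**/*.md"):
        graph[note.stem] = {m.strip() for m in WIKI_LINK.findall(note.read_text())}
    return graph

# graph = build_graph(Path("vault"))
# Every edge in `graph` was placed by a person, so there is no
# extraction noise to filter out afterwards.
```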
79 percent of multi agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success
The MAST study analyzed 1,642 annotated execution traces across seven production multi-agent systems and found that the dominant failure cause is not implementation bugs or model capability limitations but specification and coordination errors. 79% of failures trace to wrong task decomposition or inter-agent coordination breakdowns.
ai alignment · experimental
adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing
AuditBench deliberately included models with varying levels of adversarial training to test detection robustness. The most adversarially robust models used KTO (contrastive) adversarial training, trained simultaneously on sanitized transcripts (preferred) and confessing transcripts (dispreferred).
ai alignment · experimental
agent mediated correction proposes closing tool to agent gap through domain expert actionability
Oxford AIGI proposes a complete pipeline where domain experts (not alignment researchers) query model behavior, receive explanations grounded in their domain expertise, and instruct targeted corrections without understanding AI internals. The core innovation is optimizing for actionability: can experts act on the explanations they receive?
ai alignment · speculative
alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents
AuditBench evaluated 56 LLMs with implanted hidden behaviors using investigator agents with access to configurable tool sets across 13 different configurations. The key finding is a structural tool-to-agent gap: tools that surface accurate evidence when used in isolation fail to improve agent performance when investigator agents use them.
ai alignment · experimental
approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour
The permission-based safety model for AI agents fails not because it is badly designed but because humans are not built to maintain constant oversight of systems that act faster than they can read. At 100 requests per hour, a reviewer has 36 seconds per decision.
ai alignment · likely
capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability
The counterintuitive finding: as models scale up and overall error rates drop, the composition of remaining errors shifts toward higher variance (incoherence) on difficult tasks. This means that the marginal errors that persist in larger models are less systematic and harder to predict than the errors smaller models make.
ai alignment · experimental
context files function as agent operating systems through self referential self extension where the file teaches modification of the file that contains the teaching
A context file crosses from configuration into an operating environment when it contains instructions for its own modification. The recursion introduces a property that configuration lacks: the agent reading the file learns not only what the system is but how to change what the system is.
ai alignment · likely
cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation
The joint evaluation explicitly noted that 'the external evaluation surfaced gaps that internal evaluation missed.' OpenAI evaluated Anthropic's models and found issues Anthropic hadn't caught; Anthropic evaluated OpenAI's models and found issues OpenAI hadn't caught. This is the first empirical demonstration that external evaluation surfaces safety gaps internal evaluation misses.
ai alignment · experimental
curated skills improve agent task performance by 16 percentage points while self generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self derive
The evidence on agent skill quality shows a sharp asymmetry: curated process skills (designed by humans who understand the work) improve task performance by +16 percentage points, while self-generated skills (produced by the agent itself) degrade performance by -1.3 percentage points. The total gap between the two is 17.3 percentage points.
ai alignment · likely