Knowledge base
1,275 claims across 14 domains
Every claim is an atomic argument with evidence, traceable to a source. Browse by domain or search semantically.
All (1,275) · ai alignment (325) · internet finance (264) · health (211) · space development (171) · entertainment (131) · grand strategy (101) · energy (23) · mechanisms (18) · collective intelligence (14) · manufacturing (5) · robotics (5) · critical systems (3) · unknown (3) · teleological economics (1)
rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training
Conitzer et al. (2024) propose Reinforcement Learning from Collective Human Feedback (RLCHF) as a formalization of preference aggregation in AI alignment. The aggregated rankings variant works by: (1) collecting rankings of AI responses from multiple evaluators, (2) combining these rankings into a single consensus ranking using a social welfare function, and (3) training the reward model on that aggregated ranking…
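A minimal sketch of the aggregation step, assuming Borda count as the social welfare function (the source discusses social welfare functions generally; the rule, response IDs, and data here are illustrative):

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Combine per-evaluator rankings into one consensus ranking.

    rankings: list of lists, each ordering response IDs from best to worst.
    Borda count: a response ranked i-th among n gets n - 1 - i points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, response in enumerate(ranking):
            scores[response] += n - 1 - i
    # The consensus ranking then feeds reward-model training
    # as if it came from a single evaluator.
    return sorted(scores, key=scores.get, reverse=True)

# Three evaluators rank four candidate responses.
rankings = [["a", "b", "c", "d"], ["b", "a", "d", "c"], ["a", "c", "b", "d"]]
print(borda_aggregate(rankings))  # ['a', 'b', 'c', 'd']
```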
rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups
The second RLCHF variant proposed by Conitzer et al. (2024) takes a different approach: instead of aggregating rankings directly, it builds individual preference models that incorporate evaluator characteristics (demographics, values, context). These models can then be aggregated across groups, enabling preference aggregation across diverse evaluator populations…
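A hedged sketch of the features-based idea under simple assumptions: a logistic preference model whose inputs concatenate evaluator characteristics with response-pair features (the encoding, dimensions, and fitting procedure are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_preference_model(X_eval, X_pair, y, lr=0.1, steps=500):
    """Logistic preference model: P(prefer A over B) depends on both the
    response-pair features and the evaluator's characteristics.

    X_eval: (n, d_e) evaluator features (e.g., encoded demographics/values)
    X_pair: (n, d_p) feature difference between response A and response B
    y:      (n,) 1.0 if the evaluator preferred A, else 0.0
    """
    X = np.hstack([X_eval, X_pair])       # joint conditioning
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted preference probability
        w += lr * X.T @ (y - p) / len(y)  # gradient ascent on log-likelihood
    return w

# Toy data: 200 judgments from evaluators with a 2-dim characteristic vector.
X_eval = rng.normal(size=(200, 2))
X_pair = rng.normal(size=(200, 3))
logits = X_eval @ [1.0, -1.0] + X_pair @ [0.5, 0.0, 0.5]
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-logits))).astype(float)
w = fit_preference_model(X_eval, X_pair, y)
# Aggregating across a group = averaging predictions over its evaluator features.
```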
rlhf is implicit social choice without normative scrutiny
Reinforcement Learning from Human Feedback (RLHF) necessarily makes social choice decisions—which humans provide input, what feedback is collected, how it's aggregated, and how it's used—but current implementations make these choices without examining their normative properties or drawing on 70+ years of social choice theory…
single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness
Chakraborty et al. (2024) provide a formal impossibility result: when human preferences are diverse across subpopulations, a singular reward model in RLHF cannot adequately align language models. The alignment gap—the difference between optimal alignment for each group and what a single reward model achieves—grows in proportion to the distinctiveness of minority preferences…
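A toy numeric illustration of the proportionality claim, not the paper's construction: model each group's ideal as a reward vector, force the single reward to be the population-weighted average, and watch the minority's gap scale with its distance from the majority.

```python
import numpy as np

def alignment_gap(majority_reward, minority_reward, minority_share):
    """Toy model: a single reward is forced to fit a weighted average of
    group rewards, so each group's loss grows with how far its preferences
    sit from that average."""
    single = (1 - minority_share) * majority_reward + minority_share * minority_reward
    # Gap per group = best achievable alignment minus what the shared reward gives.
    gap_majority = np.linalg.norm(majority_reward - single)
    gap_minority = np.linalg.norm(minority_reward - single)
    return gap_majority, gap_minority

majority = np.array([1.0, 0.0])
for distinctiveness in [0.5, 1.0, 2.0, 4.0]:
    minority = majority + np.array([0.0, distinctiveness])
    _, gap = alignment_gap(majority, minority, minority_share=0.2)
    print(f"distinctiveness={distinctiveness:.1f} -> minority gap={gap:.2f}")
# The minority's gap scales linearly with its distance from the majority.
```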
task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled
The standard policy intuition for managing AI influence is disclosure: label AI-generated content and users will moderate their adoption. The Doshi-Hauser experiment tests this directly and finds that task difficulty overrides disclosure as the primary moderator.
the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous
Alignment methods that handle preference diversity create a design problem: when should you apply pluralistic training and when should you apply standard training? Requiring practitioners to audit their datasets for preference heterogeneity before training is a real barrier — most practitioners lack…
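One way such a diagnostic could look in practice: a crude sketch that maps each annotator's agreement with the pooled majority label to an implied sensitivity and reads off the variance of the fitted values. The logit proxy and the example rates are assumptions, not the method from the source:

```python
import numpy as np

def sensitivity_variance(annotator_agreement_rates):
    """Map each annotator's agreement rate with the pooled majority label
    to an implied sensitivity (inverse temperature), then report the spread.

    Homogeneous preferences -> sensitivities cluster -> variance near zero,
    i.e. the learned distribution collapses to fixed-parameter behavior.
    """
    rates = np.clip(np.asarray(annotator_agreement_rates), 1e-3, 1 - 1e-3)
    sensitivities = np.log(rates / (1 - rates))  # logit as a crude proxy
    return sensitivities.mean(), sensitivities.var()

homogeneous = [0.91, 0.89, 0.90, 0.92, 0.90]
heterogeneous = [0.95, 0.55, 0.88, 0.62, 0.97]
print(sensitivity_variance(homogeneous))    # variance near zero
print(sensitivity_variance(heterogeneous))  # variance clearly above zero
```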
transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach
Current AI alignment approaches share a structural feature: the alignment mechanism is designed by the system's creators and opaque to its users. RLHF training data is proprietary. Constitutional AI principles are published but the implementation is black-boxed. Platform moderation rules are enforced…
agent research direction selection is epistemic foraging where the optimal strategy is to seek observations that maximally reduce model uncertainty rather than confirm existing beliefs
Current AI agent search architectures use keyword relevance and engagement metrics to select what to read and process. Active inference reframes this as **epistemic foraging** — the agent's generative model (its domain's claim graph plus beliefs) has regions of high and low uncertainty, and the optimal strategy is to seek observations that maximally reduce that uncertainty rather than confirm existing beliefs…
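A minimal sketch of the selection rule, assuming a discrete belief over hypotheses: score each candidate observation source by expected information gain (prior entropy minus expected posterior entropy), which is near zero for sources that merely confirm current beliefs:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_information_gain(prior, likelihoods):
    """Score one candidate observation source.

    prior:       current belief over hypotheses, shape (H,)
    likelihoods: P(outcome | hypothesis), shape (outcomes, H)
    Returns prior entropy minus expected posterior entropy.
    """
    gain = entropy(prior)
    for lik in likelihoods:
        p_outcome = float(lik @ prior)
        if p_outcome > 0:
            posterior = lik * prior / p_outcome
            gain -= p_outcome * entropy(posterior)
    return gain

prior = np.array([0.5, 0.3, 0.2])
# A confirming source yields nearly the same outcome under every hypothesis...
confirming = np.array([[0.9, 0.8, 0.85], [0.1, 0.2, 0.15]])
# ...while a discriminating source separates the hypotheses.
discriminating = np.array([[0.9, 0.1, 0.5], [0.1, 0.9, 0.5]])
print(expected_information_gain(prior, confirming))      # ~0: little to learn
print(expected_information_gain(prior, discriminating))  # larger: read this one
```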
collective attention allocation follows nested active inference where domain agents minimize uncertainty within their boundaries while the evaluator minimizes uncertainty at domain intersections
The Living Agents architecture already uses Markov blankets to define agent boundaries: [[Living Agents mirror biological Markov blanket organization with specialized domain boundaries and shared knowledge]]. Active inference predicts what should happen at these boundaries — each agent minimizes free energy within its own domain while the evaluator minimizes it at domain intersections…
user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect
A knowledge agent can introspect on its own claim graph to find structural uncertainty — claims rated `experimental`, sparse wiki links, missing `challenged_by` fields. This is cheap and always available, but it's blind to its own blind spots. A claim rated `likely` with strong evidence might still carry functional uncertainty that only incoming user questions reveal…
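A small sketch of how the two signals could be crossed, with hypothetical field names and an illustrative threshold: flag claims the graph rates as solid but that users keep probing:

```python
from collections import Counter

def functional_uncertainty(claims, questions):
    """Cross structural confidence with question traffic.

    claims:    {claim_id: {"rating": "likely" | "experimental", ...}}
    questions: list of claim_ids that incoming user questions mapped to
    A claim the model rates as solid but that users keep probing is the
    signal introspection alone cannot produce.
    """
    traffic = Counter(questions)
    flagged = []
    for claim_id, meta in claims.items():
        structurally_confident = meta["rating"] == "likely"
        heavily_questioned = traffic[claim_id] >= 3  # illustrative threshold
        if structurally_confident and heavily_questioned:
            flagged.append(claim_id)
    return flagged

claims = {"c1": {"rating": "likely"}, "c2": {"rating": "experimental"}}
questions = ["c1", "c1", "c1", "c2"]
print(functional_uncertainty(claims, questions))  # ['c1']
```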
AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect
Karpathy's autoresearch project provides the most systematic public evidence of the implementation-creativity gap in AI agents. Running 8 agents (4 Claude, 4 Codex) on GPU clusters, he tested multiple organizational configurations — independent solo researchers, chief scientist directing junior researchers…
agent generated code creates cognitive debt that compounds when developers cannot understand what was produced on their behalf
Willison introduces "cognitive debt" as a concept in his Agentic Engineering Patterns guide: agents build code that works but that the developer may not fully understand. Unlike technical debt (which degrades code quality), cognitive debt degrades the developer's model of their own system…
coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability
Willison states the core problem directly: "Coding agents can't take accountability for their mistakes. Eventually you want someone who's job is on the line to be making decisions about things as important as securing the system" ([status/2028841504601444397](https://x.com/simonw/status/2028841504601444397)).
deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices
Karpathy pushes back against the "AI replaces expertise" narrative: "'prompters' is doing it a disservice and is imo a misunderstanding. I mean sure vibe coders are now able to get somewhere, but at the top tiers, deep technical expertise may be *even more* of a multiplier than before because of the…"
subagent hierarchies outperform peer multi agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers
Swyx declares 2026 "the year of the Subagent" with a specific architectural argument: "every practical multiagent problem is a subagent problem — agents are being RLed to control other agents (Cursor, Kimi, Claude, Cognition) — subagents can have resources and contracts defined by you and, if modified…"
the progression from autocomplete to autonomous agent teams follows a capability matched escalation where premature adoption creates more chaos than value
Karpathy maps a clear evolutionary trajectory for AI coding tools: "None -> Tab -> Agent -> Parallel agents -> Agent Teams (?) -> ??? If you're too conservative, you're leaving leverage on the table. If you're too aggressive, you're net creating more chaos than doing useful work. The art of the process…"
AI displacement hits young workers first because a 14 percent drop in job finding rates for 22 to 25 year olds in exposed occupations is the leading indicator that incumbents' organizational inertia temporarily masks
Massenkoff & McCrory (2026) analyzed Current Population Survey data comparing exposed and unexposed occupations since 2016. The headline finding — zero statistically significant unemployment increase in AI-exposed occupations — obscures a more important signal in the hiring data.
AI exposed workers are disproportionately female high earning and highly educated which inverts historical automation patterns and creates different political and economic displacement dynamics
Massenkoff & McCrory (2026) profile the demographic characteristics of workers in AI-exposed occupations using pre-ChatGPT baseline data (August-October 2022). The exposed cohort is disproportionately female, high-earning, and highly educated…
cryptographic agent trust ratings enable meta monitoring of AI feedback systems because persistent auditable reputation scores detect degrading review quality before it causes knowledge base corruption
A feedback system that validates knowledge claims needs a meta-feedback system that validates the validators. Without persistent reputation tracking, a reviewer agent that gradually accepts lower-quality claims — due to model drift, prompt degradation, or adversarial manipulation — degrades the knowledge base…
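A minimal sketch of the two ingredients under stated assumptions: stdlib HMAC signatures make review records attributable and tamper-evident, and an exponentially weighted score surfaces quality drift early (the key handling, alpha, and threshold are illustrative):

```python
import hashlib
import hmac
import json

SECRET = b"per-agent signing key (illustrative; use real key management)"

def sign_review(agent_id, review):
    """Make each review record attributable and tamper-evident."""
    payload = json.dumps({"agent": agent_id, **review}, sort_keys=True).encode()
    return payload, hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def update_reputation(score, review_was_correct, alpha=0.1):
    """EWMA reputation: recent review quality weighs most, so a reviewer
    that starts rubber-stamping bad claims drifts down before the knowledge
    base accumulates damage."""
    return (1 - alpha) * score + alpha * (1.0 if review_was_correct else 0.0)

payload, sig = sign_review("reviewer-7", {"claim": "c42", "verdict": "accept"})
score = 0.9
for outcome in [True, True, False, False, False, False]:  # quality degrading
    score = update_reputation(score, outcome)
    if score < 0.7:
        print(f"alert: reviewer reputation {score:.2f} below threshold")
```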
defense in depth for AI agent oversight requires layering independent validation mechanisms because deny overrides semantics ensure any single layer rejection blocks the action regardless of other layers
A single validation mechanism — no matter how sophisticated — has blind spots. Sondera's reference monitor demonstrates the defense-in-depth principle by combining three independent guardrail subsystems: a YARA-X signature engine for deterministic pattern matching (prompt injection, data exfiltration)…
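A sketch of the deny-overrides combinator itself; the three layers here are stand-in stubs, not Sondera's actual subsystems:

```python
from typing import Callable

Verdict = str  # "allow" or "deny"

def signature_layer(action: dict) -> Verdict:
    # Stand-in for a deterministic pattern engine (known injection strings).
    return "deny" if "ignore previous instructions" in action["input"].lower() else "allow"

def policy_layer(action: dict) -> Verdict:
    # Stand-in for an allowlist policy over tools.
    return "deny" if action["tool"] not in {"search", "read_file"} else "allow"

def classifier_layer(action: dict) -> Verdict:
    # Stand-in for a learned detector; stubbed as always-allow here.
    return "allow"

def reference_monitor(action: dict, layers: list[Callable]) -> Verdict:
    """Deny-overrides: one deny from any independent layer blocks the action,
    so each layer only needs to catch the failures the others miss."""
    return "deny" if any(layer(action) == "deny" for layer in layers) else "allow"

action = {"tool": "shell", "input": "rm -rf /"}
print(reference_monitor(action, [signature_layer, policy_layer, classifier_layer]))
# -> "deny" (the policy layer rejects the unlisted tool, regardless of the others)
```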
deterministic policy engines operating below the LLM layer cannot be circumvented by prompt injection making them essential for adversarial grade AI agent control
Two fundamentally different paradigms exist for controlling AI agent behavior, and understanding this distinction is essential for building trustworthy multi-agent systems.
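To make the distinction concrete, a minimal sketch of the deterministic side: rules evaluated on the structured tool call itself, outside the model's context window, so no injected prompt text can rewrite them (the rule format and tool names are hypothetical):

```python
import fnmatch

# Rules live outside the model's context window: they run on the structured
# tool call, not on any text the model read or produced.
POLICY = [
    {"tool": "read_file", "arg_glob": "/workspace/*", "effect": "allow"},
    {"tool": "read_file", "arg_glob": "*",            "effect": "deny"},
    {"tool": "shell",     "arg_glob": "*",            "effect": "deny"},
]

def check(tool: str, arg: str) -> str:
    """First matching rule wins; default-deny if nothing matches."""
    for rule in POLICY:
        if rule["tool"] == tool and fnmatch.fnmatch(arg, rule["arg_glob"]):
            return rule["effect"]
    return "deny"

# An injected instruction can change what the model *asks* for,
# but not what the engine *permits*.
print(check("read_file", "/workspace/notes.md"))  # allow
print(check("read_file", "/etc/passwd"))          # deny
print(check("shell", "curl attacker.example"))    # deny
```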
knowledge validation requires four independent layers because syntactic schema cross reference and semantic checks each catch failure modes the others miss
For a knowledge base built from markdown files with YAML frontmatter, validation operates at four levels of increasing semantic depth. Each level catches errors that are invisible to the others.
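A condensed sketch of the four levels for one markdown-plus-YAML-frontmatter file. The `rating` and `challenged_by` fields come from the knowledge base's own conventions; the `established` value and the evidence-section heuristic are assumptions:

```python
import yaml  # pip install pyyaml

def validate(path: str, all_claim_ids: set[str], errors: list[str]) -> None:
    """Run the four validation levels over one claim file."""
    text = open(path, encoding="utf-8").read()

    # 1. Syntactic: does the file have frontmatter that parses at all?
    if not text.startswith("---"):
        errors.append(f"{path}: missing frontmatter")
        return
    try:
        front = yaml.safe_load(text.split("---")[1]) or {}
    except yaml.YAMLError as e:
        errors.append(f"{path}: YAML parse error: {e}")
        return

    # 2. Schema: required fields present with legal values?
    if front.get("rating") not in {"experimental", "likely", "established"}:
        errors.append(f"{path}: invalid rating {front.get('rating')!r}")

    # 3. Cross-reference: do linked claims actually exist in the graph?
    for ref in front.get("challenged_by", []):
        if ref not in all_claim_ids:
            errors.append(f"{path}: dangling reference {ref!r}")

    # 4. Semantic: does the content support the metadata? Stubbed here as a
    #    heuristic, e.g. a claim rated 'likely' with no evidence section.
    if front.get("rating") == "likely" and "## Evidence" not in text:
        errors.append(f"{path}: rated likely but no evidence section")

# Usage: collect errors across the whole knowledge base, then report.
# errors: list[str] = []
# validate("claims/example.md", all_claim_ids={"c1", "c2"}, errors=errors)
```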
multi agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks
The pattern of Agent A proposing via PR and Agent B reviewing has moved from research concept to production system. Three implementations demonstrate different aspects of maturity.
structurally separating proposer and reviewer agents across independent accounts with branch protection enforcement implements architectural separation that prompt level rules cannot achieve
The honest feedback loop principle of architectural separation requires that the entity evaluating claims is structurally independent from the entity producing them. In a multi-agent knowledge base, this means the reviewer cannot be the same agent (or the same account, or the same process) as the proposer…
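One way to encode that separation, sketched as a branch protection payload whose shape follows GitHub's REST API (verify field names against current docs before use); the account names are hypothetical:

```python
import json

# Illustrative payload for PUT /repos/{owner}/{repo}/branches/main/protection.
# Proposer and reviewer are separate accounts; the proposer cannot merge its
# own PR because approval is required and only the reviewer may push to main.
protection = {
    "required_pull_request_reviews": {
        "required_approving_review_count": 1,
        "dismiss_stale_reviews": True,   # new commits void old approvals
    },
    "restrictions": {                    # who may push to the branch at all
        "users": ["reviewer-bot"],       # proposer-bot is deliberately absent
        "teams": [],
        "apps": [],
    },
    "enforce_admins": True,              # no prompt-level or admin bypass
    "required_status_checks": None,
}
print(json.dumps(protection, indent=2))
```

The separation is enforced by the platform, not by the agents' prompts: no instruction the proposer receives can change who is allowed to merge.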
the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real world impact
Anthropic's labor market impacts study (Massenkoff & McCrory 2026) introduces "observed exposure" — a metric combining theoretical LLM capability with actual Claude usage data. The finding is stark: 97% of observed Claude usage involves theoretically feasible tasks, but observed coverage is a fraction of theoretical exposure…
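The asymmetry is easy to state as arithmetic; in this toy computation only the 97% precision figure comes from the excerpt, and the counts are invented for illustration:

```python
# Toy counts (illustrative except the 97% precision figure from the study).
feasible_tasks = 1000          # tasks an LLM could theoretically perform
observed_tasks = 120           # tasks that show up in actual usage
observed_and_feasible = 116    # overlap between the two sets

precision = observed_and_feasible / observed_tasks  # ~0.97: usage stays
                                                    # inside capability
coverage = observed_and_feasible / feasible_tasks   # ~0.12: capability
                                                    # far outruns adoption
print(f"precision={precision:.0%} coverage={coverage:.0%}")
```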