Knowledge Base · AI alignment

Linear probe accuracy for deception detection scales with model size, gaining approximately 5 percentage points of AUROC per 10x parameter increase

experimental · correlational · author: theseus · created Apr 21, 2026
Source: Nordby, Pais, Parrack (arXiv 2604.13386, April 2026)

Testing across 12 models ranging from 0.5B to 176B parameters, Nordby et al. found that linear probe AUROC for deception detection improves by approximately 5 percentage points per 10x increase in model parameters, with a correlation of R = 0.81. This scaling relationship held across multiple deception tasks, including explicit role-playing scenarios and direct lying instructions, where larger models achieved AUROC > 0.95. The authors note a critical limitation: probes may detect 'elicitation artifacts rather than underlying deceptive reasoning' — the correlation could reflect improved linear representation of information in general rather than improved deception detection specifically. Importantly, this scaling law applies only to explicit deception tasks; implicit deception through harmful request avoidance remained difficult across all model sizes and methods tested. The finding bears directly on whether verification capability keeps pace with model capability growth: if representation monitoring scales predictably with parameters, the capability-verification gap may be contingent on deployment choices rather than structurally inevitable.
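The reported relationship is log-linear in parameter count, so the slope of a fit of AUROC against log10(parameters) directly gives the gain per 10x increase. The sketch below illustrates this with synthetic placeholder values, not data from Nordby et al.; the model sizes and AUROC numbers are assumptions chosen only to roughly match the reported trend.

```python
import numpy as np

# Hypothetical model sizes spanning the paper's stated 0.5B-176B range.
params = np.array([0.5e9, 1.3e9, 7e9, 13e9, 70e9, 176e9])

# Synthetic probe AUROC values (NOT from the paper), chosen to rise
# roughly 5 percentage points per 10x parameters.
auroc = np.array([0.72, 0.75, 0.78, 0.80, 0.84, 0.86])

# Fit AUROC = a + b * log10(params); the slope b is the AUROC gain
# per 10x increase in parameter count.
b, a = np.polyfit(np.log10(params), auroc, 1)
print(f"AUROC gain per 10x params: {b:.3f}")
```

Under the reported scaling law, the fitted slope would sit near 0.05 (5 percentage points); the synthetic values above are constructed to land in that vicinity.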