AI Alignment Forum
If you’re interested in working on this agenda with us at Anthropic, we’re hiring! Please apply to the research scientist or research engineer position on the Anthropic website and mention that you’re interested in working on model organisms of misalignment.
We don’t currently have ~any strong empirical evidence for the most concerning sources of existential risk, most notably scenarios in which dishonest AI systems actively trick or fool their training processes or human operators: