aialignmentforum.org - AI Alignment Forum

Description: A community blog devoted to technical AI alignment research


If you’re interested in working on this agenda with us at Anthropic, we’re hiring! Please apply to the research scientist or research engineer positions on the Anthropic website and mention that you’re interested in working on model organisms of misalignment.

We don’t currently have ~any strong empirical evidence for the most concerning sources of existential risk, most notably scenarios in which dishonest AI systems actively trick or fool their training processes or their human operators: