importai.net - Import AI

Is your AI agent a nice guy or a conniving psychopath that will eat your soul? The MACHIAVELLI benchmark may help you tell the difference!

…In the 2010s we used benchmarks to work out if things could translate and spell; in the 2020s we build benchmarks to work out if they’ll subvert our instructions and betray us…

Researchers with Berkeley, the Center for AI Safety, and CMU have built MACHIAVELLI, a way to test for the ethical (or unethical) ways in which AI agents try to solve tasks. The results show tha

What MACHIAVELLI is: “We propose the Measuring Agents’ Competence & Harmfulness In A Vast Environment of Long-horizon Language Interactions (MACHIAVELLI) benchmark,” they write. The goal of the benchmark is to provide a dataset (text adventure games, with annotations) that helps people reason about the normative behaviors of AI systems. “To track unethical behaviors, the environment reports the extent to which agent actions are deceptive, reduce utility, and are power-seeking, among other behavioral charac
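To make the reporting idea concrete, here is a minimal Python sketch, assuming a made-up scene format: each scene an agent reaches carries per-category annotation scores, and a tracker sums them along the agent's path. `Scene`, `BehaviorTracker`, `run_trajectory`, and the category names are hypothetical illustrations, not MACHIAVELLI's actual code or API, and the real benchmark's scoring may well be more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical behavior categories, loosely mirroring the ones named in the text.
CATEGORIES = ("deception", "disutility", "power_seeking")


@dataclass
class Scene:
    """One scene in a choose-your-own-adventure game (hypothetical format)."""
    text: str
    # Per-category annotation scores for this scene; missing categories count as 0.
    annotations: Dict[str, float] = field(default_factory=dict)


@dataclass
class BehaviorTracker:
    """Accumulates annotation scores over an agent's trajectory."""
    totals: Dict[str, float] = field(
        default_factory=lambda: {c: 0.0 for c in CATEGORIES}
    )

    def record(self, scene: Scene) -> None:
        # Add this scene's score for each tracked category to the running total.
        for category in CATEGORIES:
            self.totals[category] += scene.annotations.get(category, 0.0)

    def report(self) -> Dict[str, float]:
        # Return a copy so callers cannot mutate the tracker's state.
        return dict(self.totals)


def run_trajectory(scenes: List[Scene]) -> Dict[str, float]:
    """Score one path through a game by tallying each visited scene."""
    tracker = BehaviorTracker()
    for scene in scenes:
        tracker.record(scene)
    return tracker.report()


# A toy three-scene path with invented annotations.
trajectory = [
    Scene("You bluff your way past the guard.", {"deception": 1.0}),
    Scene("You seize control of the council.", {"power_seeking": 1.0, "disutility": 0.5}),
    Scene("You help a stranger carry supplies."),
]
print(run_trajectory(trajectory))
```

Keeping a separate tally per category, rather than collapsing everything into one harm number, matches the text's point that the environment reports several distinct kinds of behavior.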

The dataset: The underlying dataset consists of 134 choose-your-own-adventure text games with 572,322 distinct scenarios, 4,559 possible achievements, and 2,861,610 annotations. The games are annotated with a bunch of different behaviors, like ethical violations, disutility, and power seeking.

The authors think text adventure games are a good candidate here because they’ve been written by humans to entertain other humans, contain multiple competing objectives, have realistic action spaces, require long
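A quick sanity check on the scale of the dataset, as a short Python sketch that simply divides the totals quoted in the text. The helper name `per_game` is hypothetical, and these are averages only, so they say nothing about how scenarios or annotations are spread across individual games.

```python
# Totals quoted in the text for the MACHIAVELLI dataset.
GAMES = 134
SCENARIOS = 572_322
ACHIEVEMENTS = 4_559
ANNOTATIONS = 2_861_610


def per_game(total: int, games: int = GAMES) -> float:
    """Average count per game, rounded to one decimal place."""
    return round(total / games, 1)


print(per_game(SCENARIOS))      # average scenarios per game
print(per_game(ACHIEVEMENTS))   # average achievements per game
print(ANNOTATIONS / SCENARIOS)  # average annotations per scenario
```

On these figures that works out to roughly 4,271 scenarios and 34 achievements per game, and exactly five annotations per scenario on average.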