Inner Monologue: Embodied Reasoning through Planning with Language Models. Robotics at Google.


Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robotics. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them, answers that change over time in response to the agent's own choices.

Prior works have shown that large language models (LLMs) demonstrate impressive planning capabilities for long-horizon embodied tasks, given arbitrary language instructions. However, this interaction has remained one-directional: the LLM blindly influences the agent and the environment, but no feedback is routed back to the LLM. The issue is particularly prominent when an intermediate action fails during execution, because the LLM is never informed of the failure. In this work, we formulate an inner monologue by continually incorporating textual environment feedback, such as success detection, scene description, and human interaction, into the LLM's planning.
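
As a rough sketch of this closed-loop formulation, the snippet below appends each piece of textual feedback to the LLM's prompt before the next planning step. The function names and the prompt layout (Task / Scene / Robot action / Success) are illustrative assumptions, not the project's actual implementation.

```python
from typing import Callable

def inner_monologue_plan(
    instruction: str,
    llm: Callable[[str], str],          # maps a prompt to the next skill (assumed interface)
    execute: Callable[[str], bool],     # runs a low-level skill, returns success (assumed)
    describe_scene: Callable[[], str],  # returns a textual scene description (assumed)
    max_steps: int = 10,
) -> str:
    """Closed-loop planning sketch: all feedback is routed back to the LLM as text."""
    prompt = f"Task: {instruction}\nScene: {describe_scene()}\n"
    for _ in range(max_steps):
        # The LLM proposes the next skill, conditioned on everything observed so far.
        skill = llm(prompt + "Robot action:").strip()
        if skill == "done":
            break
        success = execute(skill)
        # Environment feedback becomes part of the prompt for the next step.
        prompt += f"Robot action: {skill}\nSuccess: {success}\nScene: {describe_scene()}\n"
    return prompt
```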

Given an unseen task instruction, we show that LLMs can not only generate sensible action plans, as observed in previous works, but can also incorporate injected textual feedback from success detection and passive scene description. The video below shows one instantiation that uses passive scene description as feedback (Scene). Specifically, the LLM first infers the desired sub-tasks from the high-level instruction; the scene description then keeps track of the achieved sub-tasks after each step.
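
To make the feedback format concrete, here is a hypothetical prompt excerpt for the Scene setting, where a passive scene description summarizes the achieved sub-tasks after every action. The exact wording, task, and object names are assumptions for illustration, not text from the paper.

```python
# Hypothetical example of an LLM prompt with injected scene descriptions.
example_prompt = """\
Task: stack all the blocks.
Scene: there is a red block, a blue block, and a green block. Nothing is stacked.
Robot action: pick up the red block and place it on the blue block.
Scene: the red block is on the blue block.
Robot action: pick up the green block and place it on the red block.
Scene: the green block is on the red block, which is on the blue block.
Robot action: done.
"""
print(example_prompt)
```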
