craftjarvis-jarvis1.github.io - JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models

Description: JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models

Example domain paragraphs

Achieving human-like planning and control with multimodal observations in an open world is a key milestone toward more functional generalist agents. Existing approaches can handle certain long-horizon tasks in an open world. However, they still struggle when the number of open-world tasks is potentially infinite, and they cannot progressively improve task completion as game time progresses. We introduce JARVIS-1, an open-world agent that can perceive multimodal input (visual observations and h

JARVIS-1 can self-improve under a lifelong learning paradigm thanks to its growing multimodal memory, a step toward more general intelligence and greater autonomy. Below, we show how JARVIS-1 performs on the same task at different learning stages. (One epoch means that every task in the task pool has been executed once by JARVIS-1 in the environment, regardless of success or failure.)
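The memory-driven self-improvement loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the JARVIS-1 implementation: the hypothetical `PlanMemory` stores only successful plans and retrieves them by exact task name (a simple stand-in for multimodal retrieval), and `propose_plan` and `execute` are hypothetical callables supplied by the caller. One call to `run_epoch` follows the epoch definition above: every task in the pool is executed once, whether it succeeds or fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Plan = List[str]


@dataclass
class PlanMemory:
    """Hypothetical memory: maps a task name to plans that succeeded before."""
    entries: Dict[str, List[Plan]] = field(default_factory=dict)

    def add(self, task: str, plan: Plan) -> None:
        self.entries.setdefault(task, []).append(plan)

    def retrieve(self, task: str) -> Optional[Plan]:
        # Assumption: exact-match lookup stands in for multimodal retrieval.
        plans = self.entries.get(task)
        return plans[-1] if plans else None


def run_epoch(
    task_pool: List[str],
    memory: PlanMemory,
    propose_plan: Callable[[str, Optional[Plan]], Plan],
    execute: Callable[[str, Plan], bool],
) -> Dict[str, bool]:
    """Execute every task in the pool exactly once, regardless of outcome."""
    results: Dict[str, bool] = {}
    for task in task_pool:
        # The planner may reuse a remembered plan for this task, if any.
        plan = propose_plan(task, memory.retrieve(task))
        success = execute(task, plan)
        if success:
            # Only successful experience is written back, so memory grows per epoch.
            memory.add(task, plan)
        results[task] = success
    return results
```

Calling `run_epoch` repeatedly with the same `PlanMemory` instance lets later epochs draw on plans that succeeded in earlier ones, which is the property the epoch-by-epoch comparison is meant to expose.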

1) mine 3 logs
2) craft 12 planks
3) craft 1 crafting_table
4) craft 4 stick
5) craft 1 wooden_pickaxe
6) mine 3 cobblestone
7) craft 1 stone_pickaxe
8) mine 2 iron_ore
9) smelt 2 iron_ingot
10) craft 1 shears
(Plan error: no furnace is available as the tool for smelting.)
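A plan like the one above can be screened for missing tools before execution. The sketch below is an illustration under assumptions rather than part of JARVIS-1: `REQUIRED_TOOLS` is a hypothetical, simplified table that records only which tools a step needs (tools are not consumed, and ingredient quantities are ignored), and `check_plan` simulates the inventory step by step, assuming each step yields its target item.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical, simplified tool table: (verb, item) -> tools that must already
# be in the inventory. Ingredients and quantities are deliberately ignored.
REQUIRED_TOOLS = {
    ("craft", "wooden_pickaxe"): {"crafting_table"},
    ("craft", "stone_pickaxe"): {"crafting_table"},
    ("mine", "cobblestone"): {"wooden_pickaxe"},
    ("mine", "iron_ore"): {"stone_pickaxe"},
    ("smelt", "iron_ingot"): {"furnace"},
}

PLAN = [
    "mine 3 logs", "craft 12 planks", "craft 1 crafting_table", "craft 4 stick",
    "craft 1 wooden_pickaxe", "mine 3 cobblestone", "craft 1 stone_pickaxe",
    "mine 2 iron_ore", "smelt 2 iron_ingot", "craft 1 shears",
]


def check_plan(plan: List[str]) -> List[Tuple[int, str, Set[str]]]:
    """Return (step_number, step, missing_tools) for each step lacking a tool."""
    inventory: Counter = Counter()
    problems = []
    for number, step in enumerate(plan, start=1):
        verb, count, item = step.split()
        needed = REQUIRED_TOOLS.get((verb, item), set())
        missing = {tool for tool in needed if inventory[tool] == 0}
        if missing:
            problems.append((number, step, missing))
        # Assume the step produces its target so later steps can be checked.
        inventory[item] += int(count)
    return problems


print(check_plan(PLAN))  # [(9, 'smelt 2 iron_ingot', {'furnace'})]
```

Under this simplified table, inserting steps that obtain a furnace before step 9 (for example, mining more cobblestone and crafting a furnace) makes the check return an empty list.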
