Structured World Models from Human Videos (human-world-model.github.io)


Our approach involves three steps: (1) pre-training a world model on human videos, (2) fine-tuning the world model on unsupervised robot data, and (3) using the fine-tuned model to plan to achieve goals.
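To make the data flow between these steps concrete, here is a minimal Python sketch. The `WorldModel` interface, the `update` and `imagine` methods, and the `distance_fn` callback are assumptions made for illustration, not the interface of the released code.

```python
# Minimal sketch of the three-step recipe, assuming a hypothetical WorldModel
# with `update` (one training step on a batch of trajectories) and `imagine`
# (latent rollout under a candidate action sequence). Names are illustrative.

class WorldModel:
    def update(self, batch):
        """One training step on a batch of (observation, action) sequences."""
        ...  # encoder + latent dynamics losses would go here

    def imagine(self, start_obs, actions):
        """Roll out the learned dynamics from start_obs under an action sequence."""
        ...  # returns predicted future latent states


def pretrain_on_human_videos(model, human_video_batches, steps):
    # Step 1: pre-train on passive human videos, with actions expressed in the
    # shared affordance-based action space described further below.
    for _ in range(steps):
        model.update(next(human_video_batches))


def finetune_on_robot_data(model, robot_batches, steps):
    # Step 2: fine-tune on unsupervised robot interaction data so the dynamics
    # reflect the robot's own embodiment and environment.
    for _ in range(steps):
        model.update(next(robot_batches))


def plan_to_goal(model, start_obs, goal_obs, candidate_action_seqs, distance_fn):
    # Step 3: score candidate action sequences under the fine-tuned model and
    # return the one whose imagined outcome is closest to the goal
    # (e.g. inside a sampling-based planner).
    scored = [(distance_fn(model.imagine(start_obs, a), goal_obs), a)
              for a in candidate_action_seqs]
    return min(scored, key=lambda item: item[0])[1]
```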

We benchmark VRB on 10+ tasks, 2 robot morphologies, and 4 learning paradigms.

We use a shared human-robot high-level action space by leveraging affordances, which specify an interaction point and a post-contact trajectory, following our prior work. The action space is flexible enough to also support actions outside this shared space.
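As a rough illustration of what such a shared action could look like, the sketch below represents it as a contact point plus a short post-contact trajectory. The `AffordanceAction` class, its field names, and the `grasp_pose_from_point` callback are hypothetical and not taken from the paper's code.

```python
# Sketch of a shared, embodiment-agnostic action: where to make contact and
# how to move after contact. All names here are illustrative assumptions.

from dataclasses import dataclass
import numpy as np

@dataclass
class AffordanceAction:
    contact_point: np.ndarray       # e.g. (x, y, z) location of where to interact
    post_contact_traj: np.ndarray   # (T, 3) sequence of displacements after contact

    def to_robot_commands(self, grasp_pose_from_point):
        """Convert the shared action into end-effector targets for a specific robot.

        `grasp_pose_from_point` is a hypothetical callback mapping the contact
        point to an initial gripper position for that robot.
        """
        start = grasp_pose_from_point(self.contact_point)
        # Waypoints: the start position shifted by the cumulative post-contact motion.
        return start + np.cumsum(self.post_contact_traj, axis=0)
```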
