AW-Opt (awopt.github.io)


Robotic skills can be learned via imitation learning (IL), using user-provided demonstrations, or via reinforcement learning (RL), using large amounts of autonomously collected experience. The two approaches have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both?
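To make the contrast concrete, here is a minimal sketch (PyTorch, with arbitrary network sizes and tensor dimensions that are not from the paper): the IL objective is a supervised loss on demonstrated actions, while the RL objective bootstraps a Q-function from rewards in collected experience.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 10, 7  # arbitrary illustrative dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def il_loss(demo_obs, demo_act):
    # Imitation learning: supervised regression onto the demonstrator's actions.
    return ((policy(demo_obs) - demo_act) ** 2).mean()

def rl_loss(obs, act, reward, next_obs, done, gamma=0.99):
    # Reinforcement learning: fit Q to a one-step TD target computed from
    # rewards in autonomously collected experience.
    with torch.no_grad():
        next_q = q_net(torch.cat([next_obs, policy(next_obs)], dim=-1)).squeeze(-1)
        target = reward + gamma * (1.0 - done) * next_q
    q = q_net(torch.cat([obs, act], dim=-1)).squeeze(-1)
    return ((q - target) ** 2).mean()
```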

We began our investigation with two existing methods: AWAC, which combines IL and RL, and QT-Opt, a scalable RL algorithm we have been using on our robots. Our testbed consists of 6 tasks: a navigation task with dense reward and 5 manipulation tasks with sparse rewards. The manipulation tasks run on two different robot platforms (KUKA and our proprietary robot) using different control modalities. The tasks cover varying levels of difficulty, from indiscriminate grasping (figures (a) and (c) below) to …
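For readers unfamiliar with AWAC, its core is an advantage-weighted actor update. The sketch below is a generic version of that update, not the paper's code; the policy and Q-network interfaces are assumptions. Dataset actions with higher estimated advantage receive exponentially larger imitation weight, which is how the method combines imitation with a learned critic.

```python
import torch

def awac_actor_loss(policy, q_net, obs, act, beta=1.0):
    # Advantage-weighted actor update: imitate dataset actions, but weight the
    # log-likelihood by exp(advantage / beta), so actions the critic rates above
    # the policy's own samples are imitated more strongly.
    dist = policy(obs)                        # assumed: returns a torch.distributions object
    with torch.no_grad():
        baseline = q_net(obs, dist.sample())  # value estimate under the current policy
        advantage = q_net(obs, act) - baseline
        weight = torch.exp(advantage / beta).clamp(max=100.0)  # clamp for numerical stability
    log_prob = dist.log_prob(act).sum(dim=-1)
    return -(weight.squeeze(-1) * log_prob).mean()
```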

Both algorithms are provided with demonstrations for offline pretraining, either from humans or from previously successful RL rollouts. Afterwards, they switch to on-policy data collection and training. We found that QT-Opt fails to learn from successful rollouts alone, and even fails to make progress during on-policy training for tasks with a 7-DoF action space. AWAC, on the other hand, does attain non-zero success rates from the demonstrations, but its performance remains poor and collapses during online training.
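As a rough illustration of this protocol, the loop below sketches the two phases. The helpers (ReplayBuffer, collect_episode, agent.update) and the step counts are hypothetical placeholders, not the systems or hyperparameters used in our experiments.

```python
def train(agent, env, demos, offline_steps=50_000, online_steps=500_000):
    # Phase 1: offline pretraining on demonstrations (human demos or
    # previously successful RL rollouts).
    # Phase 2: on-policy collection with continued training on the growing buffer.
    buffer = ReplayBuffer()
    buffer.add_episodes(demos)

    for _ in range(offline_steps):
        agent.update(buffer.sample())            # learn from demonstrations only

    for _ in range(online_steps):
        episode = collect_episode(env, agent)    # on-policy data collection
        buffer.add_episodes([episode])
        agent.update(buffer.sample())            # keep training on all collected data
    return agent
```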
