Q-Transformer (qtransformer.github.io)

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups; we therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning.
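To make the tokenization concrete, here is a minimal sketch (illustrative only, not the project's code) of discretizing each continuous action dimension into a fixed number of bins so that every dimension becomes one discrete token; the bin count, action bounds, and function name are assumptions for the example.

```python
import numpy as np

def discretize_action(action, low, high, num_bins=256):
    """Map each continuous action dimension to an integer bin (token).

    action: (d_A,) continuous action vector
    low, high: (d_A,) per-dimension action bounds
    num_bins: number of discrete bins per dimension (assumed value)
    """
    # Normalize each dimension to [0, 1], then quantize into num_bins tokens.
    normalized = (np.asarray(action) - low) / (high - low)
    tokens = np.clip((normalized * num_bins).astype(int), 0, num_bins - 1)
    return tokens  # (d_A,) integer tokens, one per action dimension

# Example: a 3-dimensional action in [-1, 1] becomes 3 discrete tokens.
low, high = np.full(3, -1.0), np.full(3, 1.0)
print(discretize_action([0.0, -1.0, 0.5], low, high))  # [128   0 192]
```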

We first describe how to enable using Transformers for Q-learning by discretizing the action space and treating it autoregressively. The classical way to learn a Q-function with TD-learning is based on the Bellman update rule:

$$Q(s_t, a_t) \leftarrow r(s_t, a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$$
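As a concrete illustration of this update, a one-step TD target for a discretized action space can be computed as below. This is a minimal sketch; the discount value, array shapes, and function name are assumptions rather than the paper's implementation.

```python
import numpy as np

def td_target(reward, next_q_values, gamma=0.98, done=False):
    """One-step Bellman target: r + gamma * max_a' Q(s', a').

    reward: scalar reward r(s_t, a_t)
    next_q_values: (num_actions,) array of Q(s_{t+1}, a') over discrete actions
    gamma: discount factor (assumed value)
    done: True if s_{t+1} is terminal, which zeroes the bootstrap term
    """
    bootstrap = 0.0 if done else gamma * float(np.max(next_q_values))
    return reward + bootstrap

# Example: Q(s_t, a_t) would be regressed toward this value.
print(td_target(reward=1.0, next_q_values=np.array([0.2, 0.7, 0.5])))  # 1.686
```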

We change the Bellman update to be performed for each action dimension by transforming the original MDP of the problem into an MDP where each action dimension is treated as a separate step for Q-learning. In particular, given the action dimensionality $d_A$, the new Bellman update rule is:

$$Q(s_t, a_t^{1:i-1}, a_t^i) \leftarrow \begin{cases} \max_{a_t^{i+1}} Q\left(s_t, a_t^{1:i}, a_t^{i+1}\right) & \text{if } i \in \{1, \dots, d_A - 1\} \\ r(s_t, a_t) + \gamma \max_{a_{t+1}^1} Q\left(s_{t+1}, a_{t+1}^1\right) & \text{if } i = d_A \end{cases}$$

That is, each intermediate action dimension bootstraps from the maximum Q-value over the next dimension within the same time step, while the final dimension receives the reward and the discounted maximum over the first dimension of the next time step.
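The per-dimension targets could then be assembled as in the sketch below, assuming arrays of Q-values over action bins are already available from the model; all names, shapes, and the discount value are illustrative assumptions, not the authors' code.

```python
import numpy as np

def per_dimension_targets(q_within_step, q_next_state_dim1, reward, gamma=0.98):
    """Autoregressive Bellman targets, one per action dimension.

    q_within_step: list of length d_A - 1, where entry i holds the Q-values
        over the bins of action dimension i + 2 at the current time step
    q_next_state_dim1: Q-values over the bins of the first action dimension
        at the next time step
    reward: scalar reward r(s_t, a_t)
    gamma: discount factor (assumed value)
    """
    targets = []
    # Intermediate dimensions: maximize over the next dimension, same step.
    for q_next_dim in q_within_step:
        targets.append(float(np.max(q_next_dim)))
    # Final dimension: reward plus discounted max over dim 1 of next state.
    targets.append(reward + gamma * float(np.max(q_next_state_dim1)))
    return targets  # length d_A, one regression target per dimension

# Example with d_A = 3 and two bins per dimension:
print(per_dimension_targets(
    q_within_step=[np.array([0.1, 0.4]), np.array([0.3, 0.2])],
    q_next_state_dim1=np.array([0.5, 0.6]),
    reward=1.0,
))  # [0.4, 0.3, 1.588]
```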
