Q-Transformer

In this work, we introduce Q-Transformer, a scalable reinforcement learning method that can train multi-task policies on large offline datasets containing both human demonstrations and autonomously collected data. Q-Transformer uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. By discretizing each action dimension and representing the Q-value of each action dimension with separate tokens, Q-Transformer can effectively apply high-capacity sequence modeling techniques to Q-learning. Our experiments demonstrate that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a diverse set of real-world robotic manipulation tasks. Additionally, when used as an affordance model in combination with a language planner, Q-Transformer further improves performance.
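To make the per-dimension tokenization concrete, here is a minimal sketch (not the authors' implementation) of the idea: each continuous action dimension is mapped to one of a fixed number of bins, and an action is decoded one dimension at a time by maximizing that dimension's Q-values given the previously chosen tokens. The bin count, action range, and the `q_per_dim` stand-in for the Transformer's per-token Q-head are all illustrative assumptions.

```python
import numpy as np

NUM_BINS = 256                        # assumed discretization resolution per action dimension
ACTION_LOW, ACTION_HIGH = -1.0, 1.0   # assumed normalized action range

def discretize(action: np.ndarray) -> np.ndarray:
    """Map each continuous action dimension to an integer token in [0, NUM_BINS)."""
    scaled = (action - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)
    return np.clip((scaled * NUM_BINS).astype(int), 0, NUM_BINS - 1)

def undiscretize(tokens: np.ndarray) -> np.ndarray:
    """Map integer tokens back to the centers of their bins."""
    return ACTION_LOW + (tokens + 0.5) / NUM_BINS * (ACTION_HIGH - ACTION_LOW)

def greedy_action(q_per_dim, obs, num_dims: int) -> np.ndarray:
    """Decode an action autoregressively: each dimension's token maximizes
    its Q-values conditioned on the tokens already chosen.
    `q_per_dim(obs, prefix)` is a hypothetical stand-in for the model's
    per-token Q-head; it returns NUM_BINS Q-values for the next dimension."""
    prefix: list[int] = []
    for _ in range(num_dims):
        q_values = q_per_dim(obs, np.array(prefix))  # shape: (NUM_BINS,)
        prefix.append(int(np.argmax(q_values)))
    return undiscretize(np.array(prefix))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dummy_q = lambda obs, prefix: rng.standard_normal(NUM_BINS)  # random stand-in Q-head
    print(greedy_action(dummy_q, obs=None, num_dims=7))          # e.g. a 7-DoF arm action
```

Treating each discretized dimension as its own token keeps the per-token action space small (NUM_BINS choices rather than NUM_BINS raised to the number of dimensions), which is what lets the maximization in the TD backup stay tractable.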

https://qtransformer.github.io/
