Q-Transformer is a scalable offline reinforcement learning method for training multi-task policies from both human demonstrations and autonomously collected data. It represents Q-functions with a Transformer, enabling effective high-capacity sequence modeling techniques to be applied to Q-learning. By discretizing each action dimension and treating the Q-value of each action dimension as a separate token, Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on real-world robotic manipulation tasks. The approach recasts the original MDP so that each action dimension is maximized as its own step, which keeps the per-step maximization tractable while still solving the original problem. It also incorporates a conservative regularizer that keeps Q-values of out-of-distribution actions low, and uses Monte-Carlo returns to accelerate learning. Q-Transformer can additionally provide high-quality affordance values for downstream plan-and-execute frameworks.
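
Below is a minimal PyTorch sketch of how the per-dimension Bellman targets, Monte-Carlo lower bound, and conservative regularizer described above might fit together. The names (`q_fn`, `num_bins`, `cons_weight`, etc.), the per-transition loop, and the exact form of the regularizer are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def per_dim_targets(q_fn, state, next_state, action_tokens,
                    reward, mc_return, gamma=0.98):
    """Per-dimension TD targets for one transition.

    q_fn(state, prefix) is assumed to return Q-values over the discrete bins
    of the action dimension that follows `prefix` (a hypothetical interface).
    action_tokens: LongTensor (action_dims,) -- the dataset action, one token
    per discretized action dimension. mc_return: Monte-Carlo return-to-go.
    """
    action_dims = action_tokens.shape[0]
    targets = torch.empty(action_dims)

    with torch.no_grad():
        for i in range(action_dims):
            if i < action_dims - 1:
                # Intermediate dimensions: maximize over the next action
                # dimension at the same timestep (no reward, no discount).
                targets[i] = q_fn(state, action_tokens[:i + 1]).max()
            else:
                # Last dimension: standard Bellman backup to the next state,
                # maximizing over that state's first action dimension.
                empty_prefix = action_tokens[:0]
                targets[i] = reward + gamma * q_fn(next_state, empty_prefix).max()

    # The Monte-Carlo return acts as a lower bound on every target.
    mc = torch.as_tensor(mc_return, dtype=targets.dtype)
    return torch.maximum(targets, mc)


def q_transformer_loss(q_fn, state, next_state, action_tokens,
                       reward, mc_return, num_bins, cons_weight=1.0):
    """TD loss on the dataset action plus a conservative term that pushes
    Q-values of unseen action bins toward zero (the minimal return when
    rewards lie in [0, 1])."""
    targets = per_dim_targets(q_fn, state, next_state,
                              action_tokens, reward, mc_return)
    action_dims = action_tokens.shape[0]
    td_loss = torch.zeros(())
    cons_loss = torch.zeros(())
    for i in range(action_dims):
        q_bins = q_fn(state, action_tokens[:i])        # (num_bins,)
        taken = action_tokens[i]
        td_loss = td_loss + F.mse_loss(q_bins[taken], targets[i])
        unseen = torch.arange(num_bins) != taken
        cons_loss = cons_loss + (q_bins[unseen] ** 2).mean()
    return td_loss + cons_weight * cons_loss
```

In practice these targets would be computed over whole trajectories in parallel with a target network; the per-transition loop here is only to make the per-dimension structure explicit.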
https://q-transformer.github.io/