Multi-agent learning algorithms have achieved superhuman planning in a variety of games but have seen little application to real-world multi-agent planners, largely because they require billions of steps of experience. This work addresses that bottleneck with GPUDrive, a GPU-accelerated multi-agent simulator that generates over a million experience steps per second. Complex agent behaviors can be written in C++ and executed as high-performance CUDA code. Using GPUDrive, reinforcement learning agents trained on the Waymo Open Motion Dataset learn to reach their goals effectively within minutes for individual scenes and within a few hours across many scenes. The trained agents are released as part of the code base.
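The throughput claim rests on batching: instead of stepping one environment at a time, the simulator advances thousands of worlds in a single vectorized call. The sketch below is a toy NumPy illustration of that pattern, not the actual GPUDrive API; the class name, dynamics, and reward are all hypothetical.

```python
import numpy as np

class ToyVectorEnv:
    """Toy batched 2D goal-reaching environment (hypothetical; not the
    GPUDrive API). Every world is stepped in one vectorized call, the
    same pattern GPU simulators use to reach millions of steps/second."""

    def __init__(self, num_worlds, seed=0):
        rng = np.random.default_rng(seed)
        self.num_worlds = num_worlds
        self.pos = rng.uniform(-1.0, 1.0, size=(num_worlds, 2))
        self.goal = rng.uniform(-1.0, 1.0, size=(num_worlds, 2))

    def step(self, actions):
        # actions: (num_worlds, 2) velocity commands, clipped to [-1, 1]
        self.pos += 0.1 * np.clip(actions, -1.0, 1.0)
        dist = np.linalg.norm(self.pos - self.goal, axis=1)
        reward = -dist          # dense shaping toward the goal
        done = dist < 0.05      # goal reached
        return self.pos.copy(), reward, done

env = ToyVectorEnv(num_worlds=1024)
obs, reward, done = env.step(np.zeros((1024, 2)))
print(obs.shape, reward.shape)  # (1024, 2) (1024,)
```

A real GPU simulator keeps these arrays resident on the device and runs the step kernel in CUDA, so the per-step cost is amortized over all worlds at once.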
https://arxiv.org/abs/2408.01584