Value-Based Deep RL Scales Predictably

In this paper, the authors argue that scaling data and compute is central to the success of machine learning, but that scaling is only useful when large-scale performance can be predicted from small-scale runs. Contrary to community lore about their pathological behavior, they show that value-based off-policy RL methods scale predictably. The data and compute needed to reach a given performance level lie on a Pareto frontier controlled by the updates-to-data (UTD) ratio; estimating this frontier lets them predict the data requirement when given more compute, and the compute requirement when given more data. They also determine how to split a total resource budget between data and compute, and they fit predictable relationships between hyperparameters to manage the overfitting and plasticity loss that are unique to RL. The authors validate the approach with the SAC, BRO, and PQL algorithms on DeepMind Control, OpenAI Gym, and IsaacGym, extrapolating successfully to higher levels of data and compute.
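To make the idea of extrapolating along the UTD axis concrete, here is a minimal sketch rather than the authors' procedure. It assumes, purely for illustration, that the data needed to reach a fixed performance level follows a simple power law in the UTD ratio, fits that law on small-scale runs, and extrapolates to a larger UTD. The functional form, the names `fit_power_law` and `predict_data`, and the numbers are all hypothetical; the paper's own fitted form may differ.

```python
import numpy as np


def fit_power_law(utd, data_needed):
    """Fit data_needed ~ a * utd**(-b) by linear regression in log-log space.

    The power-law form is an illustrative assumption, not taken from the paper.
    """
    slope, intercept = np.polyfit(np.log(utd), np.log(data_needed), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_data(utd, a, b):
    """Predicted number of environment steps needed at a given UTD ratio."""
    return a * np.asarray(utd, dtype=float) ** (-b)


# Synthetic small-scale "measurements" (made up for this sketch): environment
# steps needed to reach a target return at each UTD ratio.
utd_small = np.array([1.0, 2.0, 4.0, 8.0])
data_small = 2.0e5 * utd_small ** (-0.5)

a, b = fit_power_law(utd_small, data_small)

# Extrapolate to a UTD ratio that was not part of the fit.
utd_large = 16.0
predicted_data = float(predict_data(utd_large, a, b))

# Crude compute proxy: total gradient updates = UTD * environment steps.
# Model size and per-update cost are ignored here.
predicted_updates = utd_large * predicted_data
```

Sweeping `utd_large` over a range and plotting `predicted_data` against `predicted_updates` would trace out a data-versus-compute trade-off curve under this assumed form, which is the kind of frontier the summary describes.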

https://arxiv.org/abs/2502.04327
