Automatic Music Playlist Generation via Simulation-Based Reinforcement Learning

In this article, the authors describe how they apply reinforcement learning (RL) to automatic music playlist generation. They develop an RL framework called Action-Head DQN (AH-DQN) that sequences playlists to maximize user satisfaction metrics. They stress that their use case differs from standard slate recommendation because it accounts for user responses to multiple items in the playlist. To train the agent, they take a model-based RL approach in which a user simulator estimates how a user would respond to each recommended track. In offline and online evaluations, the agent outperformed baseline methods, and the offline results correlated well with the metrics observed online. The authors highlight the practicality and flexibility of RL for a range of music recommendation problems.
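
The simulation-based training idea can be illustrated with a toy sketch. This is not the authors' AH-DQN: it swaps in plain tabular Q-learning for the deep Q-network, and everything in it is an assumption made for illustration, including the names `COMPLETE_PROB`, `simulate_response`, `train`, and `generate_playlist`, the five-track catalogue, and the fixed completion probabilities that stand in for a learned user simulator. The point it shows is only the loop structure: the agent picks tracks one at a time, the simulator returns a response for every track placed in the playlist, and those simulated responses serve as the reward.

```python
import random

N_TRACKS = 5        # hypothetical catalogue size
PLAYLIST_LEN = 3    # number of tracks per generated playlist

# Hypothetical user simulator parameters: the chance that the simulated
# user finishes each track. A real simulator would be a learned model.
COMPLETE_PROB = [0.1, 0.9, 0.3, 0.8, 0.5]


def simulate_response(track, rng):
    """Return 1.0 if the simulated user completes the track, else 0.0."""
    return 1.0 if rng.random() < COMPLETE_PROB[track] else 0.0


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning against the simulator (a stand-in for a DQN).

    The state is simplified to the playlist position, so q[pos][track]
    estimates the value of placing `track` at position `pos`.
    """
    rng = random.Random(seed)
    q = [[0.0] * N_TRACKS for _ in range(PLAYLIST_LEN)]
    for _ in range(episodes):
        chosen = set()
        for pos in range(PLAYLIST_LEN):
            candidates = [t for t in range(N_TRACKS) if t not in chosen]
            if rng.random() < epsilon:
                track = rng.choice(candidates)                    # explore
            else:
                track = max(candidates, key=lambda t: q[pos][t])  # exploit
            # Every placed track gets its own simulated response.
            reward = simulate_response(track, rng)
            chosen.add(track)
            if pos + 1 < PLAYLIST_LEN:
                remaining = [t for t in range(N_TRACKS) if t not in chosen]
                target = reward + gamma * max(q[pos + 1][t] for t in remaining)
            else:
                target = reward
            q[pos][track] += alpha * (target - q[pos][track])
    return q


def generate_playlist(q):
    """Greedily pick the highest-valued unused track at each position."""
    playlist = []
    for pos in range(PLAYLIST_LEN):
        candidates = [t for t in range(N_TRACKS) if t not in playlist]
        playlist.append(max(candidates, key=lambda t: q[pos][t]))
    return playlist


if __name__ == "__main__":
    print(generate_playlist(train()))
```

Because the agent only ever interacts with the simulator, the quality of the resulting playlists depends on how faithfully the simulator reflects real users, which is why the correlation between offline and online results matters.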
