Diffusion for World Modeling

DIAMOND is a diffusion-based world model: it generates the next frame of a game by denoising, conditioned on the previous frames and the agent’s actions. Using fewer denoising steps makes generation faster, and the EDM formulation remains more stable than DDPM when only a few steps are used. More denoising steps still improve prediction consistency, especially in games such as Boxing, where several outcomes are possible for the next frame. The white player’s movements, by contrast, are predicted accurately regardless of the number of denoising steps, because that player is controlled by the actions the model is conditioned on. Compared with the token-based IRIS model, DIAMOND captures important visual details more faithfully. DIAMOND also reports a new best score on the Atari 100k benchmark among agents trained entirely within a world model. More information can be found in the research paper and on the project page below.

https://diamond-wm.github.io/
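
To make the role of the denoising steps concrete, here is a minimal sketch of action-conditioned next-frame sampling. It is not DIAMOND's code: it assumes an EDM-style noise schedule with a first-order Euler sampler, frames are flat lists of pixel values, and toy_denoiser is a hypothetical closed-form stand-in for the trained network (it simply shifts the last frame by the action). The names sigma_schedule, toy_denoiser, sample_next_frame and SIGMA_DATA are all illustrative assumptions.

```python
import random

SIGMA_DATA = 0.05  # assumed spread of clean frames around the toy prediction


def sigma_schedule(num_steps, sigma_max=5.0, sigma_min=0.002, rho=7.0):
    """Decreasing noise levels in the style of EDM, with a final level of 0."""
    hi, lo = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    sigmas = []
    for i in range(num_steps):
        ramp = i / (num_steps - 1) if num_steps > 1 else 0.0
        sigmas.append((hi + ramp * (lo - hi)) ** rho)
    return sigmas + [0.0]


def toy_denoiser(noisy, sigma, past_frames, action):
    """Hypothetical stand-in for a trained network (assumption, not DIAMOND).

    The "clean" guess is the last frame shifted right by `action` pixels.
    It is blended with the noisy input: the higher the noise level, the
    more the conditioning (past frames and action) dominates.
    """
    last = past_frames[-1]
    n = len(last)
    target = [last[(i - action) % n] for i in range(n)]
    w = sigma ** 2 / (sigma ** 2 + SIGMA_DATA ** 2)
    return [w * t + (1 - w) * x for t, x in zip(target, noisy)]


def sample_next_frame(past_frames, action, num_steps, rng):
    """Generate one frame with `num_steps` denoiser calls (Euler steps)."""
    sigmas = sigma_schedule(num_steps)
    # Start from pure Gaussian noise at the highest noise level.
    x = [rng.gauss(0.0, 1.0) * sigmas[0] for _ in past_frames[-1]]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = toy_denoiser(x, sigma, past_frames, action)
        # Euler step along dx/dsigma = (x - denoised) / sigma.
        x = [xi + (sigma_next - sigma) * (xi - di) / sigma
             for xi, di in zip(x, denoised)]
    return x


# Usage: four past frames of 8 pixels, one bright pixel, action "move right".
rng = random.Random(0)
past_frames = [[0.0] * 8 for _ in range(4)]
past_frames[-1][2] = 1.0
frame = sample_next_frame(past_frames, action=1, num_steps=3, rng=rng)
print([round(v, 2) for v in frame])
```

The cost of generating a frame grows linearly with num_steps, since each step is one call to the denoiser. With a real network, that is the speed versus consistency trade-off described above; the toy denoiser here has a single outcome per action, so it cannot reproduce the ambiguity seen in Boxing.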
