This article summarizes an approach to modeling scene dynamics with an image-space prior. The method is trained on motion trajectories extracted from real video sequences depicting natural, oscillatory motion, such as trees and clothing swaying in the wind. The learned model, which the authors call a neural stochastic motion texture, predicts long-term per-pixel motion in the Fourier domain; these predicted spectra can then be converted into dense motion trajectories that support applications such as animating a still image into a dynamic video or simulating plausible responses when a user interacts with objects in a picture. A frequency-coordinated diffusion sampling process generates the spectra and is a distinctive element of the approach.
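To make the Fourier-domain representation concrete, the sketch below shows one plausible way to turn a per-pixel spectrum into dense time-domain trajectories using an inverse real FFT. The array shapes, the helper name `spectra_to_trajectories`, and the choice of 16 frequency terms are illustrative assumptions, not the authors' released code; the core idea is simply that a few low-frequency complex coefficients per pixel compactly encode a long oscillatory displacement signal.

```python
import numpy as np

def spectra_to_trajectories(spectra: np.ndarray, num_frames: int) -> np.ndarray:
    """Convert per-pixel Fourier-domain motion coefficients into dense
    time-domain displacement trajectories.

    Hypothetical shapes (assumptions for this sketch):
      spectra: (H, W, K, 2) complex array -- K low-frequency Fourier
               coefficients of each pixel's (x, y) displacement.
      returns: (num_frames, H, W, 2) real array of per-frame displacements.
    """
    H, W, K, C = spectra.shape
    # Zero-pad the K predicted low-frequency terms up to the full
    # one-sided spectrum length that irfft expects for num_frames samples.
    full = np.zeros((H, W, num_frames // 2 + 1, C), dtype=np.complex64)
    full[:, :, :K, :] = spectra
    # An inverse real FFT along the frequency axis recovers each pixel's
    # oscillatory displacement over time.
    traj = np.fft.irfft(full, n=num_frames, axis=2)  # (H, W, T, 2)
    return np.transpose(traj, (2, 0, 1, 3))          # (T, H, W, 2)

# Example: a random stand-in "spectral volume" for a 4x4 image,
# 16 frequency terms, expanded into a 60-frame motion field.
rng = np.random.default_rng(0)
spec = (rng.standard_normal((4, 4, 16, 2))
        + 1j * rng.standard_normal((4, 4, 16, 2))).astype(np.complex64)
displacements = spectra_to_trajectories(spec, num_frames=60)
print(displacements.shape)  # (60, 4, 4, 2)
```

In a full pipeline, the resulting displacement fields would warp the input image frame by frame; here the spectrum is random noise purely to exercise the conversion step.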
https://generative-dynamics.github.io/