Reliable forecasts of how other road agents will behave are critical for safe planning in autonomous vehicles. The authors present MotionLM, a model that represents continuous trajectories as sequences of discrete motion tokens and casts multi-agent motion prediction as a language modeling task. Unlike many earlier approaches, MotionLM requires neither anchors nor explicit latent variable optimization, and it bypasses post-hoc interaction heuristics. Instead, it produces joint distributions over interactive agent futures in a single autoregressive decoding process. The approach achieves state-of-the-art performance on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
https://arxiv.org/abs/2309.16534
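
To make the idea of "discrete motion tokens" concrete, here is a minimal sketch of one way a continuous 2D trajectory could be turned into a token sequence and back. It is an illustration only: the summary above does not specify the tokenization, so the uniform quantization of per-step displacements, the function names `tokenize` and `detokenize`, and the constants `NUM_BINS` and `MAX_DELTA` are all assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative constants (assumptions, not taken from the paper summary).
NUM_BINS = 13     # quantization bins per axis
MAX_DELTA = 3.0   # largest per-step displacement represented, in meters


def tokenize(trajectory: np.ndarray) -> np.ndarray:
    """Map a (T, 2) array of x/y waypoints to T-1 integer motion tokens."""
    # Per-step displacements between consecutive waypoints.
    deltas = np.diff(trajectory, axis=0)
    # Displacements outside the representable range are clipped.
    clipped = np.clip(deltas, -MAX_DELTA, MAX_DELTA)
    # Scale [-MAX_DELTA, MAX_DELTA] onto bin indices 0 .. NUM_BINS - 1.
    scaled = (clipped + MAX_DELTA) / (2 * MAX_DELTA) * (NUM_BINS - 1)
    bins = np.rint(scaled).astype(int)
    # Combine the x and y bin indices into one token id per step.
    return bins[:, 0] * NUM_BINS + bins[:, 1]


def detokenize(tokens: np.ndarray, start: np.ndarray) -> np.ndarray:
    """Rebuild an approximate (T, 2) trajectory from tokens and a start point."""
    # Split each token id back into its x and y bin indices.
    bins = np.stack([tokens // NUM_BINS, tokens % NUM_BINS], axis=1)
    # Map bin indices back to displacements (bin centers).
    deltas = bins / (NUM_BINS - 1) * (2 * MAX_DELTA) - MAX_DELTA
    # Accumulate displacements from the start point.
    return np.vstack([start, start + np.cumsum(deltas, axis=0)])


if __name__ == "__main__":
    traj = np.array([[0.0, 0.0], [1.0, 0.5], [2.5, 0.5], [2.5, -1.0]])
    tokens = tokenize(traj)
    print(tokens)
    print(detokenize(tokens, traj[0]))
```

With a vocabulary like this, each agent's future becomes a sequence of integers, which is what lets a standard autoregressive sequence model predict the next token given the previous ones. In this sketch the reconstruction is exact only when displacements fall on bin centers; otherwise it carries a quantization error of up to half a bin width per step.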