Can a transformer represent a Kalman filter?

The authors study Kalman filtering with Transformers, the autoregressive deep learning architectures that have recently demonstrated impressive performance across vision, language, and robotics. They show that Transformers can approximate the Kalman filter with high accuracy by constructing a causally masked Transformer, the Transformer Filter, that implements it. The construction rests on a two-step reduction: a softmax self-attention block can represent a Gaussian kernel smoothing estimator, and that estimator in turn closely approximates the Kalman filter. The authors also explore the Transformer Filter in measurement-feedback control, showing that it can closely approximate the performance of standard optimal control policies.
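To make the flavor of this reduction concrete, here is a minimal numerical sketch. It is not the paper's construction: the paper uses a Gaussian kernel over learned embeddings, whereas this toy uses a simple recency kernel on a scalar system, and every variable name and parameter below is an assumption of the sketch. The idea it illustrates is that a steady-state Kalman filter's estimate unrolls into a geometrically weighted sum of past observations, and a single softmax attention step with recency-biased scores reproduces those weights after rescaling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar linear-Gaussian system: x_{t+1} = a*x_t + w_t,  y_t = c*x_t + v_t
a, c, q, r = 0.9, 1.0, 0.1, 0.5   # dynamics, observation, noise variances
T = 200
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(scale=np.sqrt(q))
y = c * x + rng.normal(scale=np.sqrt(r), size=T)

# Step 1: steady-state Kalman filter.
# Iterate the scalar Riccati recursion to get the steady-state gain k.
p = 1.0
for _ in range(500):
    p_pred = a * p * a + q
    k = p_pred * c / (c * p_pred * c + r)
    p = (1 - k * c) * p_pred

xhat_kf = np.zeros(T)
prev = 0.0
for t in range(T):
    pred = a * prev                       # time update
    prev = pred + k * (y[t] - c * pred)   # measurement update
    xhat_kf[t] = prev

# Unrolled, the steady-state estimate is a geometric convolution:
#   xhat_t = sum_i rho**i * k * y_{t-i},  with rho = a*(1 - k*c).
rho = a * (1 - k * c)

# Step 2: softmax attention as a kernel smoother over past observations.
# Scores that decay linearly in the lag give softmax weights proportional
# to rho**lag once the temperature tau is matched to the filter's decay.
tau = -1.0 / np.log(rho)                  # assumes 0 < rho < 1
xhat_attn = np.zeros(T)
for t in range(T):
    lags = np.arange(t + 1)               # lag 0 = most recent observation
    scores = -lags / tau                  # recency-biased attention scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # softmax attention weights
    # Softmax normalizes to 1; the KF kernel sums to ~k/(1-rho), so rescale.
    xhat_attn[t] = (k / (1 - rho)) * np.dot(w, y[t::-1])

print("mean |KF - attention| after burn-in:",
      np.abs(xhat_kf[20:] - xhat_attn[20:]).mean())
```

With the temperature matched to the filter's decay rate, the two estimates agree up to the truncation error of the geometric series; the paper establishes this kind of closeness rigorously for its full construction.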

https://arxiv.org/abs/2312.06937