State-space models (SSMs) such as S4 and Mamba have been hoped to outperform transformers at state tracking because their architecture resembles recurrent neural networks. A 2024 study by Merrill, Petty & Sabharwal ("The Illusion of State in State-Space Models") challenges that hope: the authors show that SSMs, like transformers, fall within the complexity class TC^0, so despite their recurrent formulation they cannot express inherently sequential state-tracking problems such as composing permutations. Their experiments confirm that SSMs struggle to learn such tasks, which limits any expected advantage on problems like tracking chess moves from a move list, evaluating code, or following entities through a long narrative. In short, the recurrent appearance of SSMs does not translate into greater real-world state-tracking power than non-recurrent models.
https://arxiv.org/abs/2404.08819
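
As a rough illustration (this is not the paper's code, just a minimal sketch of the standard formulation), the hard state-tracking benchmark is the word problem for the symmetric group S_5: given a sequence of permutations of five elements, output the running composition after each step. Maintaining this product is exactly the kind of sequential state that TC^0 models cannot track for arbitrary sequence lengths.

```python
import itertools
import random

# Sketch of the S_5 "word problem": given a sequence of permutations of
# {0,...,4}, compute their running composition. Tracking this product is
# the canonical hard state-tracking task, which the paper argues neither
# transformers nor SSMs can express for arbitrary sequence lengths.

PERMS = list(itertools.permutations(range(5)))  # all 120 elements of S_5

def compose(p, q):
    """Apply permutation p after q: (p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(5))

def make_example(length, rng=random):
    """Generate one example: a sequence of random permutations (input)
    and the running composition after each step (target labels)."""
    state = tuple(range(5))  # identity permutation
    seq, targets = [], []
    for _ in range(length):
        g = rng.choice(PERMS)
        state = compose(g, state)  # update the tracked group state
        seq.append(g)
        targets.append(state)
    return seq, targets

if __name__ == "__main__":
    seq, targets = make_example(8)
    for g, s in zip(seq, targets):
        print(f"apply {g} -> state {s}")
```

A model that truly tracks state would predict each target from the prefix of inputs; the paper's experimental finding is that SSMs, like transformers, need depth growing with sequence length to do so.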