The Structured State Space for Sequence Modeling (S4) architecture is a recent approach to long-range sequence modeling. It has shown strong results on the Long Range Arena benchmark, reasoning accurately over sequences of 16,000+ elements. The S4 paper approaches the problem quite differently from Transformers, and the author acknowledges that gaining intuition for the model can be difficult, so the post pairs code implementations with explanations to help readers follow the details. The provided code supports efficient training of the S4 model, which operates as a CNN during training and converts to an efficient RNN at test time. The project uses JAX with the Flax neural network library, chosen for its functional nature, and relies on JAX functions such as vmap, scan, and jit to optimize the S4 layers. The content also explains, with examples, state space models, their discretization, and their convolutional representation. Finally, the author introduces an SSM neural network layer that learns the parameters B and C, as well as a step size and a scalar D parameter. The same layer can be run either as an RNN or as a CNN.
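The RNN/CNN duality described above can be illustrated with a minimal JAX sketch. It makes several assumptions that are not stated in this summary: a bilinear discretization, a single scalar input channel, and a small random stable matrix A chosen only for illustration. The function names (discretize, run_rnn, conv_kernel, run_cnn) are hypothetical and are not claimed to match the post's code.

```python
import jax
import jax.numpy as jnp


def discretize(A, B, C, step):
    """Bilinear discretization of a continuous SSM (assumed method)."""
    I = jnp.eye(A.shape[0])
    BL = jnp.linalg.inv(I - (step / 2.0) * A)
    Ab = BL @ (I + (step / 2.0) * A)
    Bb = (BL * step) @ B
    return Ab, Bb, C


def run_rnn(Ab, Bb, Cb, u):
    """Recurrent view: x_k = Ab x_{k-1} + Bb u_k, y_k = Cb x_k."""
    def step(x, u_k):
        x = Ab @ x + Bb[:, 0] * u_k
        return x, Cb[0] @ x

    x0 = jnp.zeros(Ab.shape[0])
    _, ys = jax.lax.scan(step, x0, u)
    return ys


def conv_kernel(Ab, Bb, Cb, L):
    """Kernel K = (Cb Bb, Cb Ab Bb, ..., Cb Ab^(L-1) Bb)."""
    def step(x, _):
        # Emit Cb x for the current power of Ab, then advance x by Ab.
        return Ab @ x, Cb[0] @ x

    _, K = jax.lax.scan(step, Bb[:, 0], None, length=L)
    return K


def run_cnn(K, u):
    """Convolutional view: causal convolution of the input with K."""
    return jnp.convolve(u, K, mode="full")[: u.shape[0]]


# Illustrative sizes and parameters (assumptions, not values from the post).
N, L, step_size = 4, 16, 0.1
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
M = jax.random.normal(k1, (N, N))
A = M - M.T - jnp.eye(N)  # skew-symmetric minus identity: a stable system
B = jax.random.normal(k2, (N, 1))
C = jax.random.normal(k3, (1, N))
u = jax.random.normal(k4, (L,))

Ab, Bb, Cb = discretize(A, B, C, step_size)
y_rnn = run_rnn(Ab, Bb, Cb, u)
y_cnn = run_cnn(conv_kernel(Ab, Bb, Cb, L), u)
```

Under these assumptions the two outputs agree up to floating-point error, which is the property that lets one set of parameters be trained in convolutional form and evaluated step by step in recurrent form.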
https://srush.github.io/annotated-s4/