Over the past decade, researchers have been focused on optimizing recurrent models and attention mechanisms. Recurrent models compress data into a fixed-size memory, while attention allows for a broader context view, capturing direct dependencies. A new neural long-term memory module has been introduced to memorize historical context, aiding attention in focusing on current context while utilizing past information. This module enables fast parallelizable training and efficient inference. This new architecture, called Titans, incorporates both short-term memory (attention) and long-term memory (neural memory), outperforming Transformers and linear recurrent models in various tasks. Titans can scale effectively to large context window sizes, excelling in needle-in-haystack tasks.
https://arxiv.org/abs/2501.00663