The Large Memory Model (LM2) is a decoder-only Transformer architecture augmented with an auxiliary memory module, designed to improve multi-step reasoning and the synthesis of information across long contexts. The memory module interacts with the input tokens while a skip path preserves the original information flow, allowing LM2 to outperform baseline models on tasks such as numerical reasoning and question answering. Notably, it achieves significant improvements over existing models on benchmarks such as BABILong and MMLU. The study highlights the importance of explicit memory in enhancing Transformer architectures across a range of tasks.
https://arxiv.org/abs/2502.06049
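The memory pathway described above can be sketched roughly as a cross-attention read from a learned memory bank, gated and added back onto the token stream so the original information flow is preserved. This is a minimal illustrative sketch, not the paper's exact implementation: the projection matrices, the single sigmoid output gate, and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(tokens, memory, Wq, Wk, Wv, gate_w):
    """Gated cross-attention read from a memory bank (illustrative sketch).

    tokens: (T, d) token embeddings; memory: (N, d) memory slots.
    The read result is gated and ADDED to the tokens, so the original
    token information flow is maintained (skip connection).
    """
    Q = tokens @ Wq                                   # queries from tokens, (T, d)
    K = memory @ Wk                                   # keys from memory,   (N, d)
    V = memory @ Wv                                   # values from memory, (N, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))    # (T, N) attention over slots
    read = attn @ V                                   # memory readout, (T, d)
    gate = 1.0 / (1.0 + np.exp(-(tokens @ gate_w)))   # per-token sigmoid gate, (T, 1)
    return tokens + gate * read                       # residual add keeps original flow

# Tiny usage example with random weights (d=8, 4 tokens, 3 memory slots).
rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(4, d))
memory = rng.normal(size=(3, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
gate_w = rng.normal(size=(d, 1))
out = memory_read(tokens, memory, Wq, Wk, Wv, gate_w)
```

Because the memory contribution enters through a gated residual add, an empty (all-zero) memory leaves the token stream untouched, which is one way to see how the design "maintains the original information flow".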