HMT: Hierarchical Memory Transformer for Long Context Language Processing

Transformer-based large language models are widely used in language processing tasks, but they are typically limited by a fixed context window. To address this limitation, the Hierarchical Memory Transformer (HMT) is proposed, a framework that imitates human memorization behavior to improve long-context processing. By leveraging memory-augmented segment-level recurrence, HMT organizes a memory hierarchy that recalls relevant information from earlier parts of the input. Evaluations on language modeling and question-answering tasks show that HMT handles long contexts effectively with only a minimal increase in parameters. The code is open-sourced on GitHub. This approach could enhance the long-context capability of future language models.
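
To make the idea of memory-augmented segment-level recurrence concrete, here is a minimal PyTorch sketch: the input is split into fixed-size segments, each segment is encoded together with cached summary embeddings of earlier segments, and a new summary is appended to the memory cache after each segment. This is an illustrative toy model under many simplifying assumptions; the class name, dimensions, and the mean-pooled memory summary are hypothetical choices, not the authors' implementation of HMT's hierarchical memory.

```python
# Toy memory-augmented segment-level recurrence (illustrative only, not HMT's code).
import torch
import torch.nn as nn


class SegmentRecurrentLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_heads=4, seg_len=64):
        super().__init__()
        self.seg_len = seg_len
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_memory = nn.Linear(d_model, d_model)  # summarizes a segment
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, total_len); processed in fixed-size segments.
        memories = []   # cache of per-segment summary embeddings
        logits = []
        for seg in tokens.split(self.seg_len, dim=1):
            x = self.embed(seg)                          # (batch, seg_len, d_model)
            if memories:
                # Recall: prepend prior segment summaries so the current
                # segment can attend to them (a crude stand-in for HMT's
                # hierarchical memory recall).
                mem = torch.stack(memories, dim=1)       # (batch, n_mem, d_model)
                h = self.encoder(torch.cat([mem, x], dim=1))[:, mem.size(1):]
            else:
                h = self.encoder(x)
            # Memorize: compress this segment into one embedding (mean pool).
            memories.append(self.to_memory(h.mean(dim=1)))
            logits.append(self.lm_head(h))
        return torch.cat(logits, dim=1)                  # (batch, total_len, vocab_size)


model = SegmentRecurrentLM()
out = model(torch.randint(0, 1000, (2, 256)))  # four 64-token segments
print(out.shape)                               # torch.Size([2, 256, 1000])
```

The point of the sketch is that memory grows with the number of segments rather than the number of tokens, which is what lets a segment-recurrent model see far beyond the backbone's native context window.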

https://arxiv.org/abs/2405.06067
