The paper introduces LongRoPE, a method that extends the context window of pre-trained large language models to 2048k tokens while maintaining performance at the original short context window. It combines three strategies: identifying and exploiting non-uniformities in positional interpolation, extending the context progressively rather than in a single step, and readjusting LongRoPE at 8k length to recover short-context performance. The paper reports extensive experiments showing the method's effectiveness. Models extended this way retain the original architecture with only minor modifications to the positional embedding, so most existing optimizations can be reused.
https://arxiv.org/abs/2402.13753
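
To make "non-uniformities in positional interpolation" more concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes the two non-uniformities are (a) a separate rescale factor for each RoPE dimension and (b) a number of initial token positions left uninterpolated. The function name `rope_angles`, the parameters `scale` and `keep_first`, and the example factor values are hypothetical; the paper obtains its factors through a search that is not reproduced here.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=None, keep_first=0):
    """Rotary angles of shape (len(positions), dim // 2).

    scale:      per-dimension interpolation factors (assumed; 1.0 = no change).
    keep_first: positions below this index are left uninterpolated (assumed).
    """
    positions = np.asarray(positions, dtype=np.float64)
    half = dim // 2
    # Standard RoPE frequencies: theta_i = base^(-2i / dim).
    theta = base ** (-2.0 * np.arange(half) / dim)
    if scale is None:
        scale = np.ones(half)
    # A scalar scale reproduces uniform (linear) positional interpolation.
    scale = np.broadcast_to(np.asarray(scale, dtype=np.float64), (half,))
    # Factor is 1 for the first keep_first positions, 1 / scale_i afterwards.
    factor = np.where(positions[:, None] < keep_first, 1.0, 1.0 / scale[None, :])
    return positions[:, None] * factor * theta[None, :]

# Uniform interpolation: every dimension is compressed by the same factor.
uniform = rope_angles(np.arange(16), dim=8, scale=8.0)

# Non-uniform interpolation: illustrative per-dimension factors only,
# with the first 4 positions kept at their original angles.
non_uniform = rope_angles(np.arange(16), dim=8,
                          scale=[1.0, 2.0, 4.0, 8.0], keep_first=4)
```

With a scalar `scale`, the sketch reduces to plain linear positional interpolation, which makes it easy to compare the uniform and non-uniform variants on the same positions.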