The technical report introduces the Qwen2.5-1M series and its accompanying inference framework support, releasing two new open-source models capable of handling 1M-token contexts. Notably, the Qwen Chat assistant offers diverse functionality, including long-context processing. Performance evaluations show the Qwen2.5-1M models excel at long-context tasks and outperform their 128K counterparts. Key techniques include long-context training with progressive context length expansion and Dual Chunk Attention, which lets the models generalize to sequences far longer than those seen during training. Sparse attention mechanisms significantly accelerate inference. Detailed instructions for local deployment are provided to ensure optimal performance. Future work aims to further enhance long-context models for broader applications.
https://qwenlm.github.io/blog/qwen2.5-1m/
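As a minimal sketch of what local deployment can look like, the snippet below assumes a Qwen2.5-1M model is already being served through an OpenAI-compatible endpoint (e.g., via vLLM); the endpoint URL, port, input file, and prompt are placeholders, and the linked post contains the exact deployment instructions.

```python
from openai import OpenAI

# Assumes a locally served Qwen2.5-1M model exposing an OpenAI-compatible API
# (for example, launched with vLLM). Endpoint and API key are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical long input document to exercise the 1M-token context window.
with open("long_document.txt") as f:
    long_context = f.read()

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct-1M",
    messages=[
        {
            "role": "user",
            "content": long_context + "\n\nSummarize the key points above.",
        }
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Serving the model behind an OpenAI-compatible API keeps client code unchanged whether the backend is a local deployment or a hosted service; only `base_url` and the model name need to be adjusted.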