Together AI is releasing LLaMA-2-7B-32K, a 32K-context model for tasks such as multi-document understanding, summarization, and question answering. The model is built with Position Interpolation combined with Together AI's data recipe and system optimizations. The open-source LLM ecosystem has been progressing rapidly, with models like RedPajama and MPT closing the gap with closed-source models, and Together AI sees extending the context length of open models to 32K-128K as the next key opportunity. The post walks through examples of fine-tuning LLaMA-2-7B-32K for applications such as book summarization and long-context question answering, and describes system optimizations, including the adoption of FlashAttention-2, that improve training and inference efficiency.
https://together.ai/blog/llama-2-7b-32k
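
As a rough illustration of the core idea behind Position Interpolation, the sketch below scales down rotary (RoPE) position indices so that a 32K-token sequence maps into the position range a 4K-context model was originally trained on. The NumPy implementation, head dimension, RoPE base, and linear scaling factor of 8 are illustrative assumptions for this sketch, not Together AI's exact code.

```python
# Minimal sketch of Position Interpolation applied to rotary embeddings (RoPE).
# Assumptions: head_dim=128, base=10000, and a linear scale of 8 (4K -> 32K)
# are illustrative choices, not the exact values used in LLaMA-2-7B-32K.
import numpy as np

def rope_angles(positions, head_dim, base=10000.0, scale=1.0):
    """Compute RoPE rotation angles; scale > 1 interpolates position indices
    so that longer sequences fall inside the original trained position range."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    # Position Interpolation: shrink the positions instead of extrapolating
    # beyond the range seen during pretraining.
    scaled_positions = np.asarray(positions, dtype=np.float64) / scale
    return np.outer(scaled_positions, inv_freq)  # shape: (seq_len, head_dim // 2)

# Original 4K-context angles vs. interpolated 32K-context angles.
angles_4k = rope_angles(range(4096), head_dim=128, scale=1.0)
angles_32k = rope_angles(range(32768), head_dim=128, scale=8.0)
print(angles_4k.shape, angles_32k.shape)  # (4096, 64) (32768, 64)
```

With a scale of 8, position 32767 is rotated as if it were position ~4096, which is why the model can then be fine-tuned on long sequences rather than retrained from scratch.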