XGen-7B, a new 7B foundational model trained on sequences of up to 8K tokens, for 1.5T tokens

TLDR: Salesforce has trained a series of 7B-parameter large language models (LLMs), called XGen-7B, with sequence lengths of up to 8K tokens. These models achieve comparable or better results than other open-source LLMs on standard benchmarks, and the 8K-sequence models outperform the 2K- and 4K-sequence models on long-sequence modeling. XGen-7B performs well on both text and code tasks, and the training cost is $150,000 per 1 trillion tokens. The models have also been fine-tuned on public-domain instructional data. Training proceeds stage-wise with increasing sequence length and addresses challenges such as loss spikes. XGen-7B also shows promising results on a range of NLP tasks, including dialogue summarization and long-form QA.
