The authors introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for both high performance and inference efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks and surpasses Llama 1 34B in reasoning, mathematics, and code generation. The model uses grouped-query attention (GQA), which shares key-value heads across groups of query heads to speed up inference, and sliding window attention (SWA), in which each token attends only to a fixed window of preceding tokens; because stacked layers compound the effective receptive field, the model can handle sequences of arbitrary length at reduced inference cost. The authors also release Mistral 7B – Instruct, a variant fine-tuned to follow instructions, which outperforms Llama 2 13B – Chat on both human and automated benchmarks. Both models are released under the Apache 2.0 license.
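To make the SWA idea concrete, here is a minimal sketch of the attention mask it implies. The helper name `sliding_window_mask` and the toy dimensions are illustrative, not from the paper (Mistral 7B uses a window of 4096 tokens), and a real implementation would pair this with a rolling key-value cache rather than a dense mask.

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to positions j with
    i - window < j <= i (causal, limited to the last `window` tokens)."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions (rows)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions (columns)
    return (k <= q) & (k > q - window)

# Toy example: 6 tokens, window of 3.
# Row i shows which key positions token i can attend to.
print(sliding_window_mask(6, 3).int())
# tensor([[1, 0, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0, 0],
#         [1, 1, 1, 0, 0, 0],
#         [0, 1, 1, 1, 0, 0],
#         [0, 0, 1, 1, 1, 0],
#         [0, 0, 0, 1, 1, 1]])
```

With this mask, per-token attention cost is bounded by the window size W rather than the full sequence length, and information still propagates beyond the window because each successive layer can look a further W tokens back.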
https://arxiv.org/abs/2310.06825