Mistral AI is committed to providing the developer community with top-notch open models as part of its mission to advance AI. The company believes that progress in AI requires exploring new technological approaches and offering original models that can drive innovation. The team announced the release of Mixtral 8x7B, a high-quality sparse mixture-of-experts model with open weights.

Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference, making it the strongest open-weight model with a permissive license and an excellent cost/performance trade-off. Its capabilities include efficient handling of large contexts, strong performance in code generation, and multilingual support covering French, German, Spanish, Italian, and English. The model can also be fine-tuned for precise instruction following, achieving a score comparable to GPT-3.5.

Architecturally, Mixtral is a sparse mixture-of-experts network: for each token, a router selects a small subset of distinct groups of parameters (the experts) to process it, so only a fraction of the total parameters are active per token, resulting in greater parameter efficiency. The model is pre-trained on data extracted from the open web and has been shown to be more truthful and less biased than Llama 2. In addition to the base model, Mistral AI released Mixtral 8x7B Instruct, a version optimized for instruction following.
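To make the routing idea concrete, here is a minimal, illustrative sketch of a top-2 sparse mixture-of-experts feed-forward layer in PyTorch. It is not Mistral's implementation: the class name, hidden dimensions, and the simple per-expert loop are assumptions chosen for readability; only the idea of eight experts with two active per token mirrors the design described in the announcement.

```python
# Minimal sketch of top-2 sparse mixture-of-experts routing.
# Illustrative only: SparseMoELayer, d_model, and d_ff are placeholder choices,
# not Mixtral's actual configuration or code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Feed-forward block that routes each token to `top_k` of `num_experts` experts."""

    def __init__(self, d_model: int = 64, d_ff: int = 256, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: produces one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Experts: independent feed-forward networks (distinct parameter groups).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> one row per token.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Pick the top-k experts per token and normalize their gate weights.
        logits = self.router(tokens)                       # (tokens, num_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)  # (tokens, top_k)
        weights = F.softmax(weights, dim=-1)

        # Only the chosen experts run for each token; the rest are skipped,
        # so far fewer parameters are active than the model holds in total.
        out = torch.zeros_like(tokens)
        for expert_idx, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = chosen[:, slot] == expert_idx
                if mask.any():
                    out[mask] += weights[mask, slot : slot + 1] * expert(tokens[mask])

        return out.reshape(batch, seq_len, d_model)


if __name__ == "__main__":
    layer = SparseMoELayer()
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

The sketch shows why the approach is parameter-efficient: the layer holds eight experts' worth of weights, but each token only pays the compute cost of two of them.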
https://mistral.ai/news/mixtral-of-experts/