Mistral AI is breaking new ground in artificial intelligence by unveiling Mixtral 8x7B, a high-quality sparse mixture-of-experts (SMoE) model with open weights. This release marks a significant step forward: Mixtral outperforms Llama 2 70B on most benchmarks while offering 6x faster inference. What sets Mixtral apart is not only its performance but also its permissive Apache 2.0 license, making it the strongest open-weight model with the best cost/performance trade-offs.
Key Features of Mixtral 8x7B:
Context Handling: Mixtral adeptly manages a context of 32k tokens, offering a robust foundation for diverse applications.
Multilingual Support: With capabilities spanning English, French, Italian, German, and Spanish, Mixtral ensures broad accessibility and usability.
Code Generation Prowess: Exhibiting robust performance in code generation tasks, Mixtral is poised to enhance various programming-related applications.
Fine-Tuning Excellence: Mixtral can be fine-tuned into an instruction-following model, achieving an impressive score of 8.3 on MT-Bench, showcasing its adaptability and versatility.
Advancing with Sparse Architectures:
Mixtral is a decoder-only model built on a sparse mixture-of-experts network. At every layer, its feedforward block picks from eight distinct groups of parameters (the "experts"): a router network selects two of these experts to process each token and combines their outputs. This technique increases model capacity while keeping cost and latency in check. Although Mixtral has 46.7B total parameters, it only uses 12.9B parameters per token, so it processes input and generates output at roughly the speed and cost of a 12.9B model.
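To make the routing idea concrete, here is a minimal PyTorch sketch of a sparse MoE feedforward block with top-2 routing over eight experts. The layer sizes and the simplified expert MLP are assumptions for illustration, not Mixtral's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Sketch of a sparse mixture-of-experts feedforward block (top-2 routing).

    Dimensions and expert structure are illustrative assumptions,
    not Mixtral's actual weights or kernels.
    """

    def __init__(self, dim=4096, hidden=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, n_experts, bias=False)  # router network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden, bias=False),
                          nn.SiLU(),
                          nn.Linear(hidden, dim, bias=False))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, dim)
        logits = self.gate(x)                    # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e            # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the two selected experts run for each token, which is why the active parameter count per token is a fraction of the total parameter count.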
Instructed Models for Precision:
The release of Mixtral 8x7B Instruct, optimized through supervised fine-tuning and direct preference optimization (DPO), underscores Mistral AI's commitment to precise instruction following. With a score of 8.30 on MT-Bench, it is the best open-source model on this benchmark, with performance comparable to GPT-3.5.
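A minimal sketch of querying the instruct model with Hugging Face transformers is shown below. The hub repository id is an assumption here; check the official model card for the exact name and hardware requirements.

```python
# Sketch: chatting with the instruct model via transformers.
# The model id below is assumed, not confirmed by this article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```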
Performance:
1. Mixtral matches or outperforms Llama 2 70B, as well as GPT-3.5, on most benchmarks.
2. Hallucination and biases: Compared to Llama 2, Mixtral is more truthful (73.9% vs 50.2% on the TruthfulQA benchmark) and presents less bias on the BBQ benchmark.
Open-Source Deployment Stack:
Mistral AI enables the community to run Mixtral with a fully open-source stack: it has contributed changes to the vLLM project that integrate Megablocks CUDA kernels for efficient inference, and users can deploy Mixtral on cloud instances using SkyPilot.
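For a sense of what deployment looks like, here is a minimal sketch using vLLM's offline Python API. The model id and sampling settings are illustrative assumptions, and multi-GPU sharding will depend on the hardware available.

```python
# Sketch: generating text from Mixtral with vLLM's offline API.
# Model id and parallelism settings are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1",
          tensor_parallel_size=2)                 # shard across GPUs if needed
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Write a haiku about sparse experts."], params)
print(outputs[0].outputs[0].text)
```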
In summary, Mixtral 8x7B signifies a leap forward in the AI landscape, offering not just cutting-edge performance but a commitment to openness, innovation, and community-driven development. As Mistral AI continues its mission, Mixtral sets a new benchmark for the collaborative evolution of artificial intelligence.