The Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.
Mistral 7B in short
Mistral 7B is a 7.3B parameter model that:
- Outperforms Llama 2 13B on all benchmarks
- Outperforms Llama 1 34B on many benchmarks
- Approaches CodeLlama 7B performance on code, while remaining good at English tasks
- Uses Grouped-query attention (GQA) for faster inference
- Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost
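To make the attention variants above concrete, here is a minimal NumPy sketch of a causal sliding-window attention mask: each token attends only to the previous `window` tokens rather than the full prefix, which is what bounds the per-token cost. The tiny `seq_len` and `window` values are illustrative only; Mistral 7B uses a 4096-token window.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where query position i attends only to key positions
    max(0, i - window + 1) .. i, instead of all preceding tokens."""
    i = np.arange(seq_len)[:, None]  # query positions (rows)
    j = np.arange(seq_len)[None, :]  # key positions (columns)
    return (j <= i) & (j > i - window)

# With a window of 3, token 4 attends to tokens 2, 3 and 4 only.
mask = sliding_window_mask(seq_len=6, window=3)
```

Stacking such layers still propagates information beyond the window, since each layer lets a token see one window further back.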
We’re releasing Mistral 7B under the Apache 2.0 license; it can be used without restrictions.
- Download it and use it anywhere (including locally) with our reference implementation
- Deploy it on any cloud (AWS/GCP/Azure), using vLLM inference server and skypilot
- Use it on HuggingFace
Mistral 7B is easy to fine-tune on any task. As a demonstration, we’re providing a model fine-tuned for chat, which outperforms Llama 2 13B chat.
Performance in detail
We compared Mistral 7B to the Llama 2 family, and re-ran all model evaluations ourselves for a fair comparison.
Performance of Mistral 7B and different Llama models on a wide range of benchmarks. For all metrics, all models were re-evaluated with our evaluation pipeline for accurate comparison. Mistral 7B significantly outperforms Llama 2 13B on all metrics, and is on par with Llama 34B (since Llama 2 34B was not released, we report results on Llama 34B). It is also vastly superior in code and reasoning benchmarks.
The benchmarks are categorized by their themes:
- Commonsense Reasoning: 0-shot average of Hellaswag, Winogrande, PIQA, SIQA, OpenbookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.
- World Knowledge: 5-shot average of NaturalQuestions and TriviaQA.
- Reading Comprehension: 0-shot average of BoolQ and QuAC.
- Math: Average of 8-shot GSM8K with maj@8 and 4-shot MATH with maj@4.
- Code: Average of HumanEval and MBPP.
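The maj@k scoring used for the math benchmarks above takes a majority vote over k sampled completions: a problem counts as solved if the most frequent answer matches the reference. A minimal sketch (the sampling itself is elided; the answer strings are hypothetical):

```python
from collections import Counter

def maj_at_k(samples: list[str]) -> str:
    """Return the most frequent answer among k sampled completions;
    the prediction is correct if this majority answer equals the gold one."""
    return Counter(samples).most_common(1)[0][0]

# maj@8: eight sampled answers to one GSM8K problem; "42" wins the vote.
prediction = maj_at_k(["42", "41", "42", "42", "7", "42", "41", "42"])
```

Voting over several samples smooths out individual reasoning slips, which is why maj@8 typically scores higher than a single greedy sample.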