A competitive European model
In September 2023, Mistral AI — a Paris-based startup founded by former researchers from Meta and Google DeepMind — releases Mistral 7B under the Apache 2.0 licence, with no usage restrictions. It is the first European-origin language model to reach competitive performance with significantly larger models, and the first to do so with a fully open licence.
Mistral 7B outperforms Llama 2 13B on most standard benchmarks — reasoning, language understanding, code generation — with roughly half the parameters. The efficiency difference matters not only for computational cost but for accessibility: a 7-billion-parameter model can run on far cheaper hardware than a 13-billion-parameter one.
Architectural choices
The architecture of Mistral 7B combines three technical choices that distinguish it from most models of its generation. Sliding window attention (SWA) limits the attention computation to a fixed token window (4,096 tokens) per layer, rather than computing attention over the entire sequence. Information beyond the window is propagated indirectly through subsequent layers, allowing the model to handle long contexts with computational cost that grows linearly with sequence length rather than quadratically.
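The effect of the sliding window can be sketched with a toy attention mask — a minimal illustration, not Mistral's actual implementation. Each position attends causally, but only within the last `window` positions; the function name and the use of NumPy here are illustrative choices:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend to position j only when
    j <= i (causal) and j > i - window (within the sliding window)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# With a window of 3 over 6 positions, token 5 attends only to
# tokens 3, 4, 5; earlier tokens reach it indirectly, layer by layer.
mask = sliding_window_mask(6, 3)
print(mask.astype(int))
```

Each row of the mask has at most `window` true entries, so the per-layer cost is O(seq_len × window) instead of O(seq_len²) — linear rather than quadratic in sequence length.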
Grouped-query attention (GQA) reduces the number of keys and values in the attention mechanism by grouping multiple query heads under the same key-value pairs. The result is a significant reduction in memory required for the KV cache during inference, without measurable quality degradation.
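The KV-cache saving from grouping is easy to quantify. The sketch below uses Mistral 7B's published configuration (32 layers, 32 query heads, 8 key-value heads, head dimension 128) and assumes 16-bit cache entries; the helper function itself is illustrative:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Memory for the KV cache: keys + values (factor 2), per layer,
    per KV head, per position, at the given element width."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

SEQ = 4096
mha = kv_cache_bytes(32, 32, 128, SEQ)  # hypothetical: one KV head per query head
gqa = kv_cache_bytes(32, 8, 128, SEQ)   # GQA: 32 query heads share 8 KV heads
print(mha // 2**20, "MiB vs", gqa // 2**20, "MiB")
```

With 8 KV heads instead of 32, the cache shrinks by a factor of four — from 2 GiB to 512 MiB at a 4,096-token context — which directly increases the batch size an inference server can hold in memory.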
The byte-fallback BPE tokenizer handles any byte sequence, including characters absent from the training vocabulary. When the tokenizer encounters a string it cannot match, it decomposes it into individual bytes rather than emitting a special unknown token. This ensures every input can be represented, improving multilingual robustness and the ability to process non-standard text formats.
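A toy version of the fallback step — not the real tokenizer, which merges subwords via BPE first — shows the idea: anything outside the vocabulary becomes one pseudo-token per UTF-8 byte. The function name, vocabulary, and `<0x..>` token format are illustrative (the byte format follows the SentencePiece convention):

```python
def byte_fallback(piece: str, vocab: set[str]) -> list[str]:
    """Emit the piece directly if it is in the vocabulary; otherwise
    fall back to one pseudo-token per UTF-8 byte, e.g. <0xE2>."""
    if piece in vocab:
        return [piece]
    return [f"<0x{b:02X}>" for b in piece.encode("utf-8")]

vocab = {"hello", "world"}
print(byte_fallback("hello", vocab))  # ['hello']
print(byte_fallback("→", vocab))      # ['<0xE2>', '<0x86>', '<0x92>']
```

Because every possible byte has its own token, no input can fall outside the model's vocabulary — at worst, a rare character simply costs a few extra tokens.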
Licence and implications
The choice of the Apache 2.0 licence — free of the usage restrictions attached to the Llama 2 licence — makes Mistral 7B usable in any commercial context without limitation. For the European artificial intelligence ecosystem, Mistral provides evidence that competitiveness in foundation models is not exclusive to American or Chinese laboratories.
Link: mistral.ai
