A model for commercial use
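On 5 May 2023 MosaicML releases MPT-7B (MosaicML Pretrained Transformer), a 7-billion-parameter model under the Apache 2.0 licence. The licence choice is deliberately aimed at enterprises: unlike the original LLaMA, which was limited to non-commercial research use, Llama 2 (not yet released at the time, and later published under a custom licence with usage conditions) or the first version of Falcon, the base MPT-7B model can be used commercially without royalties or field-of-use restrictions.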
A larger successor, MPT-30B, follows in June 2023, sized so that it can be served on a single 80GB GPU in 16-bit precision.
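The single-GPU figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: it treats the parameter count as exactly the nominal 30 billion, counts weights only (no activations or key/value cache), and uses 1 GB = 10^9 bytes. The function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


# Nominal parameter counts; the real models differ slightly from these round numbers.
mpt_30b_fp16 = weight_memory_gb(30e9, 16)  # 16-bit weights
mpt_7b_fp16 = weight_memory_gb(7e9, 16)

print(f"MPT-30B, 16-bit weights: ~{mpt_30b_fp16:.0f} GB")
print(f"MPT-7B, 16-bit weights: ~{mpt_7b_fp16:.0f} GB")
```

Under these assumptions the 30B weights take roughly 60 GB, which leaves some headroom on an 80GB card for activations and the attention cache.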
Architectural innovations
MPT introduces several notable technical choices into the Open Source space:
- ALiBi (Attention with Linear Biases) as positional encoding — enables extrapolation to context lengths beyond training without re-training
- FlashAttention — IO-aware attention implementation, reducing training and inference time
- No bias in linear layers and layer norms — improves training stability
- EleutherAI GPT-NeoX 20B tokeniser
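To make the ALiBi item concrete, here is a minimal pure-Python sketch of the bias that ALiBi adds to attention scores, following the slope schedule described in the ALiBi paper for power-of-two head counts. It is an illustration, not MosaicML's implementation: the function names are hypothetical, and folding the causal mask into the bias as negative infinity is a simplification made here.

```python
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... for a power-of-two head count n."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> List[List[List[float]]]:
    """bias[h][i][j] is added to the score of query position i and key position j.

    Past keys (j <= i) get a penalty proportional to their distance from the
    query; future keys are masked out with -inf (causal attention).
    """
    return [
        [
            [slope * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


# The bias depends only on the distance i - j, so nothing is tied to a
# maximum position: seq_len can exceed the length used in training.
bias = alibi_bias(num_heads=2, seq_len=3)
print(bias[0][2])
```

Because no learned position table is involved, the same function produces a valid bias for any sequence length, which is what makes extrapolation to longer contexts possible.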
MPT-7B is trained on 1 trillion tokens at a published cost of around 200,000 USD, showing that the Open Source frontier is reachable even with a moderate training budget.
Specialised variants
MosaicML releases several fine-tuned MPT variants:
- MPT-7B-Instruct — instruction following
- MPT-7B-Chat — assistant-style conversation
- MPT-7B-StoryWriter-65K+ — context window extended to 65,000 tokens (trained on books), a practical demonstration of ALiBi’s ability to handle long sequences
StoryWriter is Apache 2.0; Instruct is CC-BY-SA-3.0 and Chat is CC-BY-NC-SA-4.0 (non-commercial), in both cases because of the licences of the fine-tuning datasets.
Significance
MPT marks a significant milestone for the commercial Open Source ecosystem: a base model competitive with the best open models of its size, released under a clean Apache 2.0 licence, with architectural choices (ALiBi, FlashAttention) that will influence subsequent training efforts.
Link: www.mosaicml.com