A model for commercial use
On 5 May 2023 MosaicML releases MPT-7B (MosaicML Pretrained Transformer), a 7-billion-parameter model under the Apache 2.0 licence. The licence choice is deliberately aimed at enterprises: unlike Llama 2 (not yet released at the time) or the first version of Falcon, MPT can be used commercially without restrictions or royalties.
MPT-30B follows in June 2023: a 30-billion-parameter model optimised to run on a single 80GB GPU in 16-bit precision.
Architectural innovations
MPT introduces several notable technical choices into the open source space:
- ALiBi (Attention with Linear Biases) as positional encoding — lets the model extrapolate to context lengths longer than those seen during training, without re-training
- FlashAttention — IO-aware attention implementation, reducing training and inference time
- No bias in linear layers and layer norms — improves training stability
- EleutherAI GPT-NeoX 20B tokeniser
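ALiBi is the choice that enables MPT's long-context variants: instead of learned position embeddings, each attention head subtracts a head-specific linear penalty proportional to the query–key distance, so nothing in the model is tied to a fixed maximum length. A minimal NumPy sketch of the bias computation (function name and the power-of-two head count are illustrative; the per-head slopes follow the geometric sequence used in the ALiBi paper):

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Build the ALiBi bias tensor added to attention scores before softmax.

    Head h (1-indexed) gets slope m_h = 2^(-8h / num_heads); the bias for
    query position i attending to key position j (j <= i) is -m_h * (i - j).
    """
    # Geometric slope sequence: 2^-1, 2^-2, ... for 8 heads.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # distance[i, j] = i - j (how far back key j lies from query i).
    positions = np.arange(seq_len)
    distance = positions[:, None] - positions[None, :]
    # Future positions (j > i) are handled by the causal mask elsewhere;
    # clamp them to zero here so the bias only penalises the past.
    bias = -slopes[:, None, None] * np.maximum(distance, 0)
    return bias  # shape: (num_heads, seq_len, seq_len)
```

Because the penalty grows linearly and identically at any sequence length, the same slopes apply unchanged when inference runs on sequences longer than those seen in training — which is what StoryWriter exploits below.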
MPT-7B training was performed on 1 trillion tokens with a published cost of around 200,000 USD, demonstrating that the open source frontier was reachable even with moderate training budgets.
Specialised variants
MosaicML releases several fine-tuned MPT variants:
- MPT-7B-Instruct — instruction following
- MPT-7B-Chat — assistant-style conversation
- MPT-7B-StoryWriter-65K+ — context window extended to 65,000 tokens (fine-tuned on long-form fiction), a practical demonstration of ALiBi's ability to extrapolate to long sequences
The base model and StoryWriter are Apache 2.0; due to fine-tuning dataset constraints, Instruct is CC-BY-SA-3.0 and Chat is CC-BY-NC-SA-4.0 (non-commercial).
Databricks acquisition and transition to DBRX
In June 2023 Databricks announces the acquisition of MosaicML for 1.3 billion dollars. The MosaicML team is integrated into Databricks as Mosaic AI Research. The MPT line does not receive further major updates and is subsequently replaced by DBRX (March 2024), which adopts a Mixture-of-Experts architecture.
MPT remains a historical reference for the 2023 period in which the commercial open source ecosystem consolidated, and its weights are still available on Hugging Face for research and comparison purposes.
Link: www.mosaicml.com
