MPT: MosaicML's Open Source commercial models

MosaicML releases MPT-7B on 5 May 2023, then MPT-30B in June 2023. Apache 2.0, ALiBi for long contexts, FlashAttention. Instruct, Chat, StoryWriter-65K+ variants.


A model for commercial use

On 5 May 2023 MosaicML releases MPT-7B (MosaicML Pretrained Transformer), a 7-billion-parameter model under the Apache 2.0 licence. The licence choice is deliberately aimed at enterprises. Meta's original LLaMA was released under a non-commercial research licence (Llama 2 would not arrive until July 2023), and the first version of Falcon carried royalty terms. The base MPT-7B model, by contrast, can be used commercially without restrictions or royalties.

A larger successor, MPT-30B, follows in June 2023. It is sized to run on a single 80 GB GPU in 16-bit precision.
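The single-GPU target follows from simple arithmetic on the weights. The sketch below assumes a nominal 30e9 parameters and counts only the weights, not activations or the attention KV cache. It uses decimal gigabytes, so it gives a rough comparison only. The helper names are hypothetical.

```python
# Rough weight-memory estimate for serving a dense model on one GPU.
# Assumptions: a nominal 30e9 parameters; only the weights are counted
# (activations and the KV cache are ignored), in decimal gigabytes.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in decimal GB."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(n_params: float, dtype: str, gpu_gb: float) -> bool:
    """True if the weights alone fit within the given GPU memory."""
    return weight_memory_gb(n_params, dtype) <= gpu_gb


if __name__ == "__main__":
    n = 30e9
    for dtype in ("fp32", "bf16", "int8"):
        gb = weight_memory_gb(n, dtype)
        print(f"{dtype}: {gb:.0f} GB, fits on 80 GB: {fits_on_gpu(n, dtype, 80)}")
```

Under these assumptions, 16-bit weights take about 60 GB. That leaves some headroom on an 80 GB card for activations and the cache. The same weights in 32-bit precision, about 120 GB, would not fit.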

Architectural innovations

MPT combines several notable technical choices in a single Open Source model:

  • ALiBi (Attention with Linear Biases) in place of positional embeddings: a penalty proportional to the query-key distance is added to the attention scores, letting the model extrapolate to context lengths longer than those seen in training without re-training
  • FlashAttention: an exact, IO-aware attention implementation that reduces training and inference time and memory use
  • No bias terms in linear layers and layer norms: improves training stability
  • The EleutherAI GPT-NeoX-20B tokeniser
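The ALiBi bias can be sketched in a few lines of Python. This sketch follows the formulation in the ALiBi paper: geometric per-head slopes, a penalty proportional to query-key distance, and causal masking. It assumes a power-of-two head count. It illustrates the technique only and is not MPT's own implementation. The function names are hypothetical.

```python
import math


def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes from the ALiBi paper for a power-of-two head count:
    head k (1-based) gets 2 ** (-8 * k / n_heads)."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2 ** (-8 * k / n_heads) for k in range(1, n_heads + 1)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Causal ALiBi bias for one head, added to the raw attention scores:
    bias[i][j] = -slope * (i - j) for keys j at or before query i,
    and -inf for future keys (j > i), which acts as the causal mask."""
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(8)  # 1/2, 1/4, ..., 1/256
    for row in alibi_bias(4, slopes[0]):
        print(row)
```

The bias depends only on the distance i - j, so the same function yields a valid bias for any sequence length. No learned position table caps the context, and this property is what makes extrapolation beyond the training length possible.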

MPT-7B was trained from scratch on 1 trillion tokens of text and code at a reported cost of around 200,000 USD. The run showed that competitive open models were within reach of moderate training budgets.
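A back-of-the-envelope calculation puts the reported budget in perspective. The 200,000 USD and 1 trillion token figures come from the text above. The nominal 7e9 parameter count is an assumption, as is the common 6 * N * D rule of thumb for dense-transformer training FLOPs. The helper names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the reported MPT-7B training run.
# From the text: ~200,000 USD and 1e12 training tokens.
# Assumptions: a nominal 7e9 parameters and the rule of thumb that training
# a dense transformer costs about 6 * N * D FLOPs (forward + backward).


def cost_per_million_tokens(total_usd: float, tokens: float) -> float:
    """Average training spend per million tokens processed."""
    return total_usd / (tokens / 1e6)


def approx_training_flops(n_params: float, tokens: float) -> float:
    """Approximate training compute via the 6 * N * D rule of thumb."""
    return 6 * n_params * tokens


if __name__ == "__main__":
    usd, tokens, params = 200_000, 1e12, 7e9
    print(f"~{cost_per_million_tokens(usd, tokens):.2f} USD per million tokens")
    print(f"~{approx_training_flops(params, tokens):.1e} training FLOPs")
```

With these assumptions, the run works out to about 0.20 USD per million training tokens and roughly 4.2e22 FLOPs of compute.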

Specialised variants

Alongside the base model, MosaicML releases three fine-tuned MPT-7B variants:

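  • MPT-7B-Instruct: short-form instruction following
  • MPT-7B-Chat: assistant-style dialogue
  • MPT-7B-StoryWriter-65K+: fine-tuned on fiction books with a 65k-token context window. Thanks to ALiBi it can extrapolate further at inference time; MosaicML demonstrates generations of around 84k tokens. It is a practical demonstration of ALiBi's ability to handle long sequences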

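The variants do not all share the base model's licence, because some fine-tuning datasets carry their own terms. At launch, StoryWriter is Apache 2.0 and Instruct is CC-BY-SA-3.0. Chat is CC-BY-NC-SA-4.0, which allows non-commercial use only.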

Significance

MPT marks a significant milestone for the commercial Open Source ecosystem. It offers a base model that MosaicML reports as matching LLaMA-7B in quality, released under a clean Apache 2.0 licence. Its architectural choices (ALiBi, FlashAttention) will influence subsequent training efforts.

Link: www.mosaicml.com
