DBRX: Databricks' open source Mixture-of-Experts

Databricks / Mosaic Research released DBRX on 27 March 2024: a fine-grained MoE with 132B total parameters, 36B active per token, 16 experts of which 4 are active per token, and a 32K context window.


The successor to MPT

On 27 March 2024, Databricks, through its Mosaic Research team (formerly MosaicML, acquired in June 2023), released DBRX, a large Mixture-of-Experts model. DBRX replaces the MPT line and is the team's first public model since the acquisition.

The model is distributed on Hugging Face in two variants: DBRX Base (pre-trained) and DBRX Instruct (fine-tuned for instruction following).

Fine-grained MoE

DBRX adopts a fine-grained Mixture-of-Experts architecture, a variant that increases the total number of experts while shrinking each individual one. The key figures:

  • 132 billion total parameters
  • 36 billion parameters active per token
  • 16 experts total, of which 4 active per token

The comparison with Mixtral 8x7B is instructive: Mixtral has 8 experts with 2 active (a 1:4 ratio), while DBRX has 16 experts with 4 active (the same 1:4 ratio but at double the granularity). Databricks argues that finer granularity increases the router's combinatorial capacity and improves expert specialisation.
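The "combinatorial capacity" argument can be made concrete: the number of distinct expert subsets the router can select per token grows sharply with granularity. A minimal sketch (the `topk_route` gate below is an illustrative simplification, not DBRX's actual router implementation):

```python
from math import comb, exp

def topk_route(logits, k):
    """Toy top-k gate: pick the k highest-scoring experts and
    softmax-normalise their weights over that subset."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    z = [exp(logits[i]) for i in top]
    total = sum(z)
    return {i: w / total for i, w in zip(top, z)}

# Distinct expert subsets available to the router per token:
print(comb(8, 2))   # Mixtral-style (2 of 8): 28 possible pairs
print(comb(16, 4))  # DBRX-style (4 of 16): 1820 possible quadruples

# Example: route one token's gate logits to 4 of 16 experts.
weights = topk_route([0.3, 2.1, -0.5, 1.8, 0.0, 3.2, 0.7, -1.1,
                      0.9, 1.2, -0.3, 2.5, 0.1, 0.4, -0.8, 1.0], k=4)
```

With the same 1:4 active ratio, doubling the expert count multiplies the router's choice space by a factor of 65 (1820 vs 28).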

Other characteristics:

  • Context window of 32,768 tokens
  • GPT-4 tiktoken tokeniser (cl100k_base)
  • Training on 12 trillion tokens of filtered data
  • Rotary position embeddings (RoPE)

Databricks Open Model Licence

DBRX is not released under Apache 2.0 or MIT. The licence is the Databricks Open Model Licence, designed along the lines of Meta’s Llama licence: it allows use, redistribution, fine-tuning and derivative creation, with the restriction that organisations with more than 700 million monthly active users must request a separate commercial agreement with Databricks.

The licence also includes an Acceptable Use Policy prohibiting illegal, violent or deceptive uses. For standard enterprise use — the vast majority of practical cases — the licence is effectively equivalent to a permissive one.

Positioning

DBRX was released at a time when the open source MoE ecosystem was consolidating (Mixtral 8x7B in December 2023, Grok-1 in March 2024). The benchmarks published by Databricks at release show DBRX competitive with GPT-3.5 and Llama 2 70B on several tasks, with particular emphasis on programming and mathematical reasoning.

The model is integrated into the Databricks platform as a base for domain-specific fine-tuning and enterprise applications.

Link: www.databricks.com/dbrx
