Jamba: AI21's Mamba-Transformer hybrid

AI21 Labs releases Jamba: the first production-grade model combining Mamba SSM blocks, attention and MoE. 52B total parameters, 12B active, 256K context under Apache 2.0 licence.


A hybrid architecture

On 28 March 2024, AI21 Labs releases Jamba, presented as the first production-grade language model to combine Mamba (state space model) blocks with traditional attention layers and Mixture-of-Experts (MoE) components. The stated goal is to unite the efficiency of SSMs in handling long sequences with the contextual reasoning capability of Transformers.

Unlike models based exclusively on attention, whose complexity grows quadratically with sequence length, Jamba employs alternating blocks in which the Mamba component contributes linear complexity. The result is a model able to handle a context window of 256,000 tokens while keeping memory consumption significantly lower than an equivalent pure Transformer.
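The memory saving is easy to see with a back-of-the-envelope calculation: a pure-attention stack must cache keys and values for every layer and every token, while a hybrid only does so for its attention layers, since Mamba layers carry a fixed-size recurrent state instead. The dimensions below are illustrative assumptions, not Jamba's actual configuration:

```python
# Illustrative KV-cache comparison at a 256K-token context.
# All dimensions (layer count, KV heads, head size) are assumptions
# for the sake of the arithmetic, not Jamba's real hyperparameters.

def kv_cache_gib(attn_layers, seq_len, n_kv_heads=8, head_dim=128,
                 bytes_per_elem=2):
    """Memory needed to cache K and V across all attention layers, in GiB."""
    elems = 2 * attn_layers * seq_len * n_kv_heads * head_dim  # 2 = K and V
    return elems * bytes_per_elem / 2**30

total_layers = 32
seq_len = 256_000

pure = kv_cache_gib(attn_layers=total_layers, seq_len=seq_len)
hybrid = kv_cache_gib(attn_layers=total_layers // 8, seq_len=seq_len)

print(f"pure attention      : {pure:.2f} GiB")   # 31.25 GiB
print(f"hybrid, 1-in-8 attn : {hybrid:.2f} GiB") # 3.91 GiB
```

The cache for the Mamba layers is constant in sequence length, so the gap widens further as the context grows.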

Structure and parameters

Jamba has 52 billion total parameters, of which 12 billion are active per token thanks to MoE routing. The architecture is organised in blocks that alternate SSM and attention layers in a fixed ratio (one attention layer for every seven Mamba layers), with selectively active MoE modules. This scheme lets the Mamba components handle long-range memory, while attention is reserved for more precise contextual reasoning.
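The interleaving described above can be sketched as a layer schedule. The block size, the 1:7 attention-to-Mamba ratio, and the MoE module on every other layer follow the paper's description; the exact position of the attention layer inside each block is an assumption here, and the layer types are labels only, not a real implementation:

```python
# Sketch of a Jamba-style layer schedule: blocks of 8 layers, one
# attention layer per block (position within the block is assumed),
# and an MoE MLP replacing the dense MLP on every other layer.

def jamba_block_schedule(n_blocks=4, layers_per_block=8,
                         attn_index=0, moe_every=2):
    schedule = []
    for b in range(n_blocks):
        for i in range(layers_per_block):
            mixer = "attention" if i == attn_index else "mamba"
            global_idx = b * layers_per_block + i
            mlp = "moe" if global_idx % moe_every == 1 else "dense"
            schedule.append((mixer, mlp))
    return schedule

sched = jamba_block_schedule()
for mixer, mlp in sched[:8]:   # first block
    print(f"{mixer:<9} + {mlp} MLP")
```

With these defaults the 32-layer stack contains 4 attention layers, 28 Mamba layers, and 16 MoE layers; at inference only the experts selected by the router are evaluated, which is what keeps the active parameter count at 12B out of 52B.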

The initial version is published under the Apache 2.0 licence, making the model reusable in commercial contexts without restrictions. With the Jamba 1.5 release in August 2024, AI21 introduces its own Jamba Open Model License, which maintains a high degree of openness but adds specific conditions on use and redistribution.

Technical relevance

Jamba is significant because it demonstrates that hybrid SSM-Transformer architectures can achieve quality competitive with purely Transformer models, opening an alternative research direction to the dominant paradigm. The combination of extended context window and reduced computational cost makes it particularly suitable for use cases requiring long document processing — legal analysis, scientific review, RAG with extended chunks — without resorting to context compression techniques.

Link: ai21.com/jamba
