Grok-1: the open source MoE model from xAI

xAI has released Grok-1 under the Apache 2.0 licence: a Mixture-of-Experts model with 314 billion parameters, 2 of 8 experts active per token, and an 8,192-token context window. Base weights only, no fine-tuning.


The Grok-1 release

On 17 March 2024, xAI, the artificial intelligence company founded by Elon Musk, released the weights of Grok-1 under the Apache 2.0 licence. The model, announced in the preceding months as the engine behind the assistant integrated into the X platform, became one of the largest language models ever published under a fully open licence.

Grok-1 is distributed exclusively as a pre-trained base model, without instruction fine-tuning or RLHF alignment. This makes it a starting point for researchers and developers who want to build specialised variants, rather than a ready-to-use conversational assistant.

Mixture-of-Experts architecture

Grok-1 uses a Mixture-of-Experts (MoE) architecture with 314 billion total parameters distributed across 8 experts, 2 of which are active for each token processed. The number of parameters active per forward pass is approximately 86 billion, which determines the actual compute cost of inference but not the memory footprint: all 314 billion parameters must still be held in memory.
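The routing idea behind this trade-off can be sketched in a few lines. This is a toy top-2 MoE layer in NumPy, not xAI's actual implementation: the function and variable names (`top2_moe_layer`, `gate_w`, `expert_ws`) and the tiny dimensions are illustrative assumptions, and real MoE layers route whole batches with load balancing.

```python
import numpy as np

def top2_moe_layer(x, gate_w, expert_ws):
    """Toy top-2 MoE routing for a single token (illustrative, not Grok-1's code).

    x:         (d,) token activation
    gate_w:    (d, n_experts) router weights
    expert_ws: list of n_experts (d, d) expert matrices
    """
    logits = x @ gate_w                       # one router score per expert
    top2 = np.argsort(logits)[-2:]            # indices of the 2 highest-scoring experts
    w = np.exp(logits[top2] - logits[top2].max())
    w = w / w.sum()                           # softmax over the selected pair only
    # Only the 2 chosen experts run their matmul: compute scales with
    # 2/8 of the experts, while memory still holds all 8.
    return sum(wi * (expert_ws[i] @ x) for wi, i in zip(w, top2))

# Toy usage: 8 experts, 16-dimensional activations
rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = top2_moe_layer(x, gate_w, expert_ws)
```

The 2-of-8 ratio is where the roughly 86-billion active-parameter figure comes from: each token pays for only a quarter of the expert compute.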

The context window is 8,192 tokens, in line with reference models from 2023 but smaller than later generations. The reference implementation is written in JAX and published in xAI's GitHub repository. The released checkpoint totals around 318 GB, a requirement that rules out inference on consumer hardware without aggressive quantisation.

Licence and context

The choice of Apache 2.0 places Grok-1 among the most permissive models for commercial reuse: no restrictive usage clauses, and no user threshold beyond which a separate licence is required. This openness is, however, limited to the first version: Grok-2 and Grok-3, developed by xAI in the following years, have remained proprietary and accessible only through APIs or within X products.

The Grok-1 release is primarily of research value: its size makes it difficult to run outside multi-node GPU clusters, but the availability of the weights enables in-depth study of MoE routing, scaling and the behaviour of large-scale models outside closed industrial laboratories.

Link: x.ai/blog/grok-os
