Llama: Meta opens the era of open language models

Meta releases LLaMA, a family of language models from 7 to 65 billion parameters: a decoder-only Transformer architecture with RMSNorm, SwiGLU and rotary embeddings, competitive with GPT-3 at a fraction of the parameter count.

Open Source · AI · Llama · LLM · Meta · AI · NLP

A release that redefines model access

In February 2023, Meta publishes LLaMA (Large Language Model Meta AI), a family of language models ranging from 7 to 65 billion parameters. What stands out is not just the performance — competitive with GPT-3 while using a fraction of the parameters — but the decision to make the weights accessible to the research community. In a landscape dominated by proprietary models available only through APIs, LLaMA introduces a significant discontinuity.

The models are trained exclusively on publicly available data: Common Crawl, Wikipedia, GitHub, ArXiv, Books3, Stack Exchange. The choice is deliberate: demonstrating that state-of-the-art performance can be achieved without relying on proprietary datasets, making the training process reproducible by the scientific community.

Architecture and technical choices

LLaMA adopts a decoder-only Transformer architecture, the same architectural family as GPT, with several notable technical modifications. RMSNorm (Root Mean Square Layer Normalization) replaces the standard LayerNorm, reducing the computational cost of normalisation without quality loss. The SwiGLU activation replaces the traditional ReLU in feed-forward layers, improving learning efficiency.
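The two substitutions are easy to see in code. Below is a minimal NumPy sketch (not Meta's implementation; weight shapes and names are illustrative) of RMSNorm, which drops LayerNorm's mean subtraction and normalises by the root mean square alone, and of a SwiGLU feed-forward block, which gates a swish-activated projection with a second linear projection:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalise by the root mean square of the activations only:
    # no mean subtraction, no bias, so it is cheaper than LayerNorm.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward: swish(x W_gate) gates (x W_up) elementwise,
    # then W_down projects back to the model dimension.
    gate = x @ W_gate
    swish = gate / (1.0 + np.exp(-gate))  # swish / SiLU activation
    return (swish * (x @ W_up)) @ W_down
```

With `weight` set to ones, `rms_norm` leaves each row with a mean square of roughly 1, which is the whole point of the rescaling; the learnable `weight` then restores per-dimension scale.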

Rotary positional embeddings (RoPE) encode positional information directly into the attention mechanism, allowing the model to generalise better on variable-length sequences compared to absolute positional embeddings. These architectural choices will become the de facto standard for subsequent language models.
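A sketch helps make the "directly into the attention mechanism" part concrete. The snippet below (a simplified pairing of dimensions, first half against second half, rather than LLaMA's exact layout) rotates each pair of query/key dimensions by an angle proportional to the token position; the attention dot product between two rotated vectors then depends only on their relative offset:

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, dim) query or key vectors, dim even.
    # Each dimension pair (x1[i], x2[i]) is rotated by angle pos * freq_i,
    # with frequencies geometrically spaced as in the RoPE formulation.
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-2.0 * np.arange(half) / dim)   # (half,)
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because each step is a pure rotation, vector norms are preserved, and for identical content vectors the dot product between positions m and n is a function of m − n alone, which is why the model generalises across absolute positions.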

Impact on the ecosystem

The 65-billion-parameter model achieves performance comparable to GPT-3 (175B parameters) on several benchmarks, while the 13B model outperforms GPT-3 on many tasks despite being small enough to run on a single GPU. This efficiency opens unprecedented scenarios: language model research is no longer limited to organisations with enormous computational resources.

A new balance

The release of LLaMA shifts the debate on open language models from theory to practice. Universities, independent research laboratories and companies that cannot afford the training costs of a proprietary model gain access to a competitive starting point. The artificial intelligence industry, until then concentrated around a few API providers, begins to diversify.

Link: ai.meta.com/llama
