Gemma: Google releases compact open models

Google DeepMind releases Gemma: 2B- and 7B-parameter models built on Gemini technology, with a decoder-only architecture, multi-query attention and a permissive licence, aimed at deployment on resource-limited hardware.

Tags: Open Source · AI · Gemma · LLM · Google · Edge

Google enters the open model field

In February 2024, Google DeepMind releases Gemma, a family of open language models with 2 and 7 billion parameters. The name recalls the Latin word for “precious gem” and signals the connection with Gemini, Google’s proprietary model family. Gemma shares part of its training infrastructure and architectural choices with Gemini, but is designed specifically for deployment on resource-limited hardware.

Google’s decision to release open models comes at a time when the ecosystem of accessible language models is already rich — Llama, Mistral, Phi — but none of the competitors have access to Google’s training data and infrastructure. Gemma brings to the open field a model trained with the resources of one of the world’s largest research laboratories.

Architecture and technical specifications

Gemma adopts a decoder-only Transformer architecture with several technical choices aimed at efficiency. The 2B variant uses multi-query attention (MQA), where all query heads share a single set of keys and values. This drastically reduces the KV cache size during inference, making the model runnable on devices with limited memory — laptops, smartphones, edge devices.
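The memory saving from MQA can be made concrete with a back-of-the-envelope calculation. A rough sketch, using the configuration reported for the 2B model (18 layers, head size 256, 8 query heads sharing 1 KV head; treat these figures as approximate) to compare the fp16 KV cache with and without shared keys and values:

```python
# Rough KV-cache size estimate for Gemma 2B at fp16.
# The configuration (18 layers, head size 256, 8 query heads, 1 shared
# KV head) follows the published Gemma specs; treat it as approximate.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_param=2):
    """Bytes needed to cache keys and values for one sequence."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_param  # 2 = K and V

SEQ = 8192  # Gemma's context length

mqa = kv_cache_bytes(layers=18, kv_heads=1, head_dim=256, seq_len=SEQ)
mha = kv_cache_bytes(layers=18, kv_heads=8, head_dim=256, seq_len=SEQ)  # if every head kept its own K/V

print(f"MQA cache: {mqa / 2**20:.0f} MiB")  # ~144 MiB
print(f"MHA cache: {mha / 2**20:.0f} MiB")  # ~1152 MiB, an 8x difference
```

At an 8192-token context the shared KV head keeps the cache around 144 MiB instead of over a gigabyte, which is precisely what makes laptop and smartphone inference practical.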

The 7B variant uses standard multi-head attention, balancing capacity and computational cost for scenarios where resources are less constrained. Both variants use RMSNorm for normalisation, GeGLU as the activation function and rotary positional embeddings (RoPE) for positional encoding.

Variants and deployment

Each size is available in two variants: the base model (pre-trained) and the instruction-tuned model, optimised for following instructions in a conversational format. The instruction-tuned models are trained with RLHF (Reinforcement Learning from Human Feedback) techniques and are designed for direct use without further fine-tuning.
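The instruction-tuned checkpoints expect prompts in Gemma's turn-based chat format, delimited by `<start_of_turn>` and `<end_of_turn>` control tokens. A minimal formatter sketch (in practice the tokenizer's built-in chat template handles this for you):

```python
def format_gemma_chat(messages):
    """Render a list of {role, content} messages into Gemma's
    instruction-tuned prompt format. Roles are 'user' and 'model'."""
    prompt = ""
    for msg in messages:
        prompt += f"<start_of_turn>{msg['role']}\n{msg['content']}<end_of_turn>\n"
    prompt += "<start_of_turn>model\n"  # cue the model to generate its reply
    return prompt

print(format_gemma_chat([{"role": "user", "content": "Explain multi-query attention."}]))
```

The trailing `<start_of_turn>model` line is what prompts the model to produce the assistant turn; generation is then stopped at the next `<end_of_turn>`.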

Licence and accessibility

Gemma is released with a permissive licence that allows commercial use, redistribution and creation of derivative works. The models are distributed through Hugging Face, Kaggle and the Vertex AI Model Garden on Google Cloud. Availability in GGUF format enables execution through frameworks such as llama.cpp and Ollama on consumer hardware.

Link: ai.google.dev/gemma
