Google enters the open model field
In February 2024, Google DeepMind released Gemma, a family of open language models with 2 billion and 7 billion parameters. The name recalls the Latin word for “precious gem” and signals the connection with Gemini, Google’s proprietary model family. Gemma shares part of its training infrastructure and architectural choices with Gemini, but is designed specifically for deployment on resource-limited hardware.
Google’s decision to release open models came at a time when the ecosystem of accessible language models was already rich (Llama, Mistral, Phi), but none of those competitors had access to Google’s training data and infrastructure. Gemma brought to the open field a model trained with the resources of one of the world’s largest research laboratories.
Architecture and technical specifications
Gemma adopts a decoder-only Transformer architecture with several technical choices aimed at efficiency. The 2B variant uses multi-query attention (MQA), where all query heads share a single set of keys and values. This drastically reduces the KV cache size during inference, making the model runnable on devices with limited memory — laptops, smartphones, edge devices.
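The memory saving from multi-query attention is easy to see with back-of-envelope arithmetic. The sketch below compares KV-cache sizes for multi-head versus multi-query attention; the layer count, head count and head dimension are illustrative assumptions, not Gemma’s exact published hyperparameters.

```python
# Back-of-envelope KV-cache comparison: multi-head vs multi-query attention.
# All dimensions below are hypothetical, chosen only to illustrate the ratio.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the key/value cache: 2 tensors (K and V) per layer,
    each of shape [n_kv_heads, seq_len, head_dim], at 16-bit precision."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 2B-class configuration: 18 layers, 8 query heads, head_dim 256.
mha = kv_cache_bytes(n_layers=18, n_kv_heads=8, head_dim=256, seq_len=8192)
mqa = kv_cache_bytes(n_layers=18, n_kv_heads=1, head_dim=256, seq_len=8192)

print(f"MHA cache: {mha / 2**20:.0f} MiB")  # one K/V head per query head
print(f"MQA cache: {mqa / 2**20:.0f} MiB")  # single shared K/V head
print(f"reduction: {mha // mqa}x")
```

With a single shared key/value head, the cache shrinks by a factor equal to the number of query heads — the difference between fitting in a laptop’s memory and not.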
The 7B variant uses standard multi-head attention, balancing capacity and computational cost for scenarios where resources are less constrained. Both variants use RMSNorm for normalisation, GeGLU as the activation function and rotary positional embeddings (RoPE) for positional encoding.
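Two of the building blocks named above can be sketched in a few lines of plain Python. These are toy reference implementations for clarity, with made-up inputs and gains, not code from the released checkpoints.

```python
# Minimal pure-Python sketches of two Gemma building blocks:
# RMSNorm and the GeGLU activation. Inputs and gains are toy values.
import math

def rms_norm(x, gain, eps=1e-6):
    """Scale x by the reciprocal of its root-mean-square.
    Unlike LayerNorm, there is no mean-centring and no bias term."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def gelu(v):
    """tanh approximation of GELU, as commonly used in practice."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def geglu(gate, value):
    """GeGLU: a GELU-activated gate multiplied element-wise with a linear branch.
    In a Transformer MLP, gate and value come from two separate projections."""
    return [gelu(g) * v for g, v in zip(gate, value)]

print(rms_norm([1.0, 2.0, 2.0], gain=[1.0, 1.0, 1.0]))
print(geglu([0.0, 1.0], [3.0, 3.0]))
```

RMSNorm drops LayerNorm’s mean subtraction and bias, saving computation with little quality cost; GeGLU gates the MLP’s linear branch instead of applying the activation directly.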
Variants and deployment
Each size is available in two variants: the base model (pre-trained) and the instruction-tuned model, optimised for following instructions in a conversational format. The instruction-tuned models are trained with RLHF (Reinforcement Learning from Human Feedback) techniques and are designed for direct use without further fine-tuning.
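The conversational format the instruction-tuned checkpoints expect can be made concrete. The sketch below assumes Gemma’s published turn markers (`<start_of_turn>` / `<end_of_turn>` with the roles `user` and `model`); in practice the tokenizer’s chat template (e.g. `apply_chat_template` in Hugging Face transformers) builds this string for you, and this hypothetical helper only makes the format visible.

```python
# Sketch of the turn-based prompt format used by Gemma's
# instruction-tuned checkpoints. build_gemma_prompt is a
# hypothetical helper for illustration; real code should use
# the tokenizer's chat template instead.

def build_gemma_prompt(messages):
    """messages: list of {"role": "user" | "model", "content": str}."""
    parts = []
    for m in messages:
        parts.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # open a model turn to cue the reply
    return "".join(parts)

prompt = build_gemma_prompt([{"role": "user", "content": "Explain RoPE briefly."}])
print(prompt)
```

Generation stops when the model emits the closing turn marker, which is why the prompt ends with an opened, unclosed model turn.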
Licence and accessibility
Gemma is released under Google’s own Gemma terms of use, which permit commercial use, redistribution and the creation of derivative works, subject to a prohibited-use policy. The models are distributed through Hugging Face, Kaggle and the Vertex AI Model Garden on Google Cloud. Availability in GGUF format enables execution through frameworks such as llama.cpp and Ollama on consumer hardware.
Link: ai.google.dev/gemma
