The story isn’t the model, it’s the licence
Around the turn of late March and early April, Google launched Gemma 4, its most recent family of open models. The relevant news, though, is not about the model’s capabilities but about the licence change: Gemma 4 ships under Apache 2.0, dropping the custom “Gemma Terms of Use” that governed the previous versions. It is a move from a proprietary licence with ambiguous usage clauses to a standard permissive licence, allowing free commercial use and redistribution with no conditional strings attached. Google framed the choice as a direct response to developer feedback.
For anyone following the topic, it is worth recalling the starting point: the first Gemma shipped under a permissive but proprietary licence, along the lines of the model adopted by Llama. With Gemma 4, Google aligns its open family with the same licensing footing as Qwen and Mistral, already on Apache 2.0 — ground we described in the context of Qwen3.6.
The family: edge, MoE and dense
The initial launch brought several sizes. On the edge side there are the E2B and E4B variants, optimised for resource-constrained devices. Higher up, a 26B Mixture-of-Experts model with roughly 4B active parameters and a 31B dense model. The whole family comes from the same research behind Gemini 3, with native training on more than 140 languages.
Multimodality is native and pervasive: every model processes images and video, while native audio input is present on E2B, E4B and now the mid-size variant. The context window reaches up to 128K on the edge models and up to 256K on the larger ones.
Gemma 4 12B: the model that runs on a laptop
The most recent variant, Gemma 4 12B, was added in these days: a unified encoder-free multimodal model, positioned between E4B and the 26B MoE. It is the first mid-size model in the family with native audio input, with a 256K context window, distributed on Hugging Face and Kaggle under Apache 2.0.
The operational figure is the hardware it requires: the 12B runs on consumer hardware with around 16 GB of VRAM or unified memory — what many laptops and desktops already ship with. You do not need a data centre to run it locally, which closes the loop on Gemma’s historical design, aimed from the outset at deployment on limited hardware.
What this means in practice
For a company, the interest of the release lies not in benchmark numbers but in reduced downstream friction. A proprietary licence, however permissive, forces enterprise legal teams to read specific usage clauses, weigh edge cases and produce an opinion on each commercial or on-premise deployment scenario. Apache 2.0, by contrast, is a well-known licence already digested by any legal office: by industry analysis, for whoever has to approve a model’s adoption, the review goes “from weeks to hours”.
That is exactly the angle that matters in a governance logic. A capable model does run on a laptop, but the real unlock is that you can get there without a dedicated legal review for every project. When the barrier to adoption shifts from the contractual plane to the purely technical one, on-premise AI becomes an infrastructure decision rather than a compliance one — and that simplification, on the adoption side, weighs more than a few benchmark points.