Ollama: open source language models on your own hardware

Ollama simplifies running open source LLMs locally: a Modelfile for configuration, a REST API, GGUF quantization for consumer hardware, and support for Llama 2, CodeLlama and other open models.

Tags: Open Source, AI, Ollama, LLM, Privacy, On-Premise, Quantization

The local accessibility problem

Open source language models are available, but running them on local hardware remains complex: downloading weights in the correct format, configuring CUDA or Metal dependencies, managing quantization, exposing an API for applications. Each step requires specific expertise and different configurations depending on the model, operating system and available hardware. Ollama was created to eliminate this complexity, offering a user experience comparable to that of a package manager.

One command to run a model

Installing Ollama requires a single binary. Running a model is reduced to one command: ollama run llama2 downloads the model weights (pre-quantized in GGUF format) and starts an interactive session. Supported models include Llama 2, CodeLlama and dozens of other open source community models. The model registry works like a container image registry: each model is identified by a name and a tag specifying the variant (size, quantization).
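The name:tag scheme can be sketched with a small, hypothetical parser (not part of Ollama itself); the default tag latest mirrors container registry conventions:

```python
def parse_model_ref(ref: str, default_tag: str = "latest") -> tuple[str, str]:
    """Split an Ollama-style model reference like 'llama2:13b-q4_0'
    into (name, tag), defaulting the tag when none is given."""
    name, _, tag = ref.partition(":")
    return name, tag or default_tag

print(parse_model_ref("llama2"))           # ('llama2', 'latest')
print(parse_model_ref("llama2:13b-q4_0"))  # ('llama2', '13b-q4_0')
```

The tag encodes the variant, so the same name can point to several sizes and quantization levels, just as an image name can point to several builds.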

Modelfile and customisation

The Modelfile is Ollama’s configuration mechanism, inspired by the Dockerfile. It specifies the base model, generation parameters (temperature, top_p, context window), the system prompt and optional LoRA adapters for customisation — all in a declarative, versionable format.
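A minimal sketch of such a Modelfile (the base model, parameter values and system prompt here are illustrative, not recommendations):

```
# Illustrative Modelfile: base model and values chosen for the example
FROM llama2:7b
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are a concise assistant that answers in English."""
```

Building a named model from it is done with ollama create my-assistant -f Modelfile, after which it can be run like any registry model.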

Quantization and REST API

Ollama uses the GGUF (GPT-Generated Unified Format) from llama.cpp for model quantization. The 4-bit and 5-bit variants reduce memory requirements from tens of gigabytes to sizes manageable on consumer hardware: a 7-billion-parameter model quantized to 4 bits requires roughly 4 GB of RAM, runnable on a laptop with an integrated GPU.
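The memory figure can be approximated with simple arithmetic; the 20% overhead for the KV cache and runtime buffers below is an assumption for illustration, not an Ollama constant:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Rough memory estimate for a model's weights: parameters times
    bits per weight, converted to gigabytes, with an assumed 20%
    overhead for KV cache and runtime buffers."""
    return n_params * bits_per_weight / 8 / 1e9 * overhead

print(round(quantized_size_gb(7e9, 4), 1))   # 4.2  -> ~4 GB at 4-bit
print(round(quantized_size_gb(7e9, 16), 1))  # 16.8 -> unquantized fp16
```

The same 7B model thus drops from roughly 17 GB in fp16 to around 4 GB at 4-bit, which is what makes laptop-class hardware viable.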

The REST API exposed by Ollama allows applications to interact with local models via standard HTTP endpoints. Generation happens entirely locally: data never leaves the machine, a fundamental requirement for enterprise contexts with confidentiality constraints or regulatory compliance.
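A minimal client sketch against the /api/generate endpoint, assuming Ollama is listening on its default port 11434; only the payload builder is exercised without a running server:

```python
import json
from urllib import request

def generate_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()

def generate(prompt: str, model: str = "llama2",
             host: str = "http://localhost:11434") -> str:
    """Send a non-streaming generation request to a local Ollama server
    and return the generated text from the 'response' field."""
    req = request.Request(f"{host}/api/generate",
                          data=generate_payload(model, prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the host is localhost, the prompt and the completion never cross the network boundary of the machine.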

Link: ollama.com
