Phi: open source Small Language Models from Microsoft Research

Microsoft Research's Phi series, from Phi-1 (2023) to Phi-3 (April 2024), released under the MIT licence: SLMs trained on 'textbook quality' data, with a 128K-token context on Phi-3-mini.


A Small Language Model family

The Phi series is developed by Microsoft Research as a compact alternative to Large Language Models. The central hypothesis, formalised in the paper “Textbooks Are All You Need”, is that the quality of training data — in particular synthetic or filtered educational text — matters more than raw parameter scale.

Phi-1, released in June 2023 with 1.3 billion parameters, specialises in Python code generation and reaches 50.6% pass@1 on HumanEval despite its small size. Phi-2, released in December 2023, raises the parameter count to 2.7 billion and generalises the paradigm beyond code.

Phi-3 and MIT licence

On 23 April 2024 Microsoft releases the Phi-3 family under the MIT licence, a permissive licence that allows commercial use without restriction. The family includes three sizes:

  • Phi-3-mini — 3.8 billion parameters, available in a 4K-token context variant and a 128K-token variant (phi-3-mini-128k-instruct)
  • Phi-3-small — 7 billion parameters, 128K context
  • Phi-3-medium — 14 billion parameters, 128K context

The Phi-3 training set combines heavily filtered web data and synthetic data generated to maximise information density. Results on standard benchmarks (MMLU, HellaSwag, GSM8K) show Phi-3-mini competitive with Llama-3 8B class models.

Edge and on-device optimisation

Phi-3-mini is designed for execution on resource-constrained devices: 4-bit quantisation enables running on modern smartphones and workstations without dedicated GPUs. Microsoft publishes ONNX variants optimised for DirectML and CPU, making the model suitable for edge scenarios and local inference with privacy requirements.
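The memory argument behind on-device inference is simple arithmetic: weight storage scales with bits per parameter. A minimal back-of-envelope sketch, using the 3.8B parameter count from above (the helper function is illustrative, not any official API, and it ignores KV-cache and activation memory, which grow with context length):

```python
# Approximate weight-storage footprint of Phi-3-mini at different
# quantisation levels. Parameter count is taken from the article.

PHI3_MINI_PARAMS = 3.8e9  # 3.8 billion parameters

def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Gigabytes needed to store the weights alone
    (excludes KV cache and activations)."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = weight_memory_gb(PHI3_MINI_PARAMS, 16)
int4_gb = weight_memory_gb(PHI3_MINI_PARAMS, 4)
print(f"fp16: {fp16_gb:.1f} GB, int4: {int4_gb:.1f} GB")
# fp16: 7.6 GB, int4: 1.9 GB
```

At roughly 1.9 GB of weights in 4-bit form, the model fits comfortably in the RAM of a modern smartphone, which is what makes the edge scenarios described above plausible.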

Phi-3.5 and evolution

In August 2024 Microsoft releases Phi-3.5 with three variants: Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct (16 experts, 42B total, 6.6B active) and Phi-3.5-vision-instruct multimodal. All models remain under the MIT licence and are distributed on Hugging Face.
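The appeal of the MoE variant lies in its sparse activation: per-token compute tracks the active parameters, not the total. A quick sketch of that ratio, using the figures quoted above:

```python
# Parameter arithmetic for Phi-3.5-MoE-instruct: 42B total
# parameters, ~6.6B active per token (figures from the article).

TOTAL_PARAMS_B = 42.0
ACTIVE_PARAMS_B = 6.6

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active per token: {active_fraction:.1%} of total parameters")
# Active per token: 15.7% of total parameters
```

In other words, each token touches fewer parameters than a dense 7B model while the router can draw on a 42B pool, which is the usual MoE trade of memory for compute.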

The Phi line demonstrates that parameter reduction does not necessarily imply loss of quality: with a curated data pipeline it is possible to achieve production-grade performance with an order of magnitude fewer resources.

Link: huggingface.co/microsoft
