A Small Language Model family
The Phi series is developed by Microsoft Research as a compact alternative to Large Language Models. The central hypothesis, formalised in the paper “Textbooks Are All You Need”, is that the quality of training data — in particular synthetic or filtered educational text — matters more than raw parameter scale.
Phi-1 is released in June 2023 with 1.3 billion parameters, specialised in Python code generation; it reaches 50.6% pass@1 on HumanEval despite its small size. Phi-2, in December 2023, raises the parameter count to 2.7 billion, generalising the paradigm beyond code.
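The pass@1 figure comes from the pass@k metric introduced with HumanEval in "Evaluating Large Language Models Trained on Code". As a minimal illustration (not Phi's actual evaluation harness), the unbiased estimator can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper.

    n: number of samples generated per problem
    c: number of those samples that pass the unit tests
    k: sample budget being scored
    Formula: 1 - C(n-c, k) / C(n, k), the probability that at least
    one of k samples drawn without replacement is correct.
    """
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples of which 5 pass, pass@1 is simply the pass rate:
print(pass_at_k(10, 5, 1))  # 0.5
```

A model's reported score is this quantity averaged over all problems in the benchmark.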
Phi-3 and MIT licence
On 23 April 2024 Microsoft releases the Phi-3 family under the MIT licence, which is permissive and allows unrestricted commercial use. The family includes three sizes:
- Phi-3-mini — 3.8 billion parameters, context up to 128K tokens (phi-3-mini-128k-instruct variant) in addition to the 4K variant
- Phi-3-small — 7 billion parameters, 128K context
- Phi-3-medium — 14 billion parameters, 128K context
The Phi-3 training set combines heavily filtered web data with synthetic data generated to maximise information density. Results on standard benchmarks (MMLU, HellaSwag, GSM8K) show Phi-3-mini to be competitive with models in the Llama-3 8B class.
Edge and on-device optimisation
Phi-3-mini is designed for execution on resource-constrained devices: 4-bit quantisation enables running on modern smartphones and workstations without dedicated GPUs. Microsoft publishes ONNX variants optimised for DirectML and CPU, making the model suitable for edge scenarios and local inference with privacy requirements.
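Why 4-bit quantisation makes smartphone inference plausible can be seen with back-of-envelope arithmetic. The sketch below counts weight storage only, ignoring KV cache, activations, and quantisation metadata, so real footprints are somewhat larger:

```python
def approx_weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone (no KV cache,
    activations, or quantisation metadata overhead)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

# Phi-3-mini (3.8B parameters) at different weight precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{approx_weight_memory_gb(3.8, bits):.1f} GB")
# 16-bit: ~7.6 GB, 8-bit: ~3.8 GB, 4-bit: ~1.9 GB
```

At roughly 1.9 GB of weights, the 4-bit model fits comfortably in the RAM of a modern smartphone, which is what enables the on-device scenarios described above.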
Phi-3.5 and evolution
In August 2024 Microsoft releases Phi-3.5 with three variants: Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct (16 experts, 42B total, 6.6B active) and Phi-3.5-vision-instruct multimodal. All models remain under the MIT licence and are distributed on Hugging Face.
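The gap between 42B total and 6.6B active parameters comes from sparse top-k expert routing: each token is processed by only a few of the 16 experts. The following is a toy sketch of top-k gating in pure Python; the actual routing in Phi-3.5-MoE is more involved, and the logits here are made-up illustrative values:

```python
import math

def top_k_route(logits: list[float], k: int = 2) -> dict[int, float]:
    """Toy top-k gating: select the k highest-scoring experts and
    softmax-normalise their weights; all other experts get weight 0
    (and their parameters are never touched for this token)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return {i: v / z for i, v in exps.items()}

# Hypothetical router scores for 4 experts; only 2 are activated:
gates = top_k_route([0.2, 1.5, -0.3, 0.9], k=2)
print(gates)  # weights for experts 1 and 3, summing to 1
```

Because only the selected experts run, compute and memory bandwidth per token scale with the active parameter count rather than the total.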
The Phi line demonstrates that parameter reduction does not necessarily imply loss of quality: with a curated data pipeline it is possible to achieve production-grade performance with an order of magnitude fewer resources.
Link: huggingface.co/microsoft
