An experiment on extended training
TinyLlama is a community research project started in September 2023 by Zhang Peiyuan and collaborators at the StatNLP Research Group of the Singapore University of Technology and Design (SUTD). TinyLlama version 1.0 was released on 4 January 2024 under the Apache 2.0 licence.
Its stated goal is to answer an open research question about LLMs: to what extent does a small model, trained well beyond the “compute-optimal” point indicated by the Chinchilla scaling laws, keep improving on downstream benchmarks?
Architecture and training
TinyLlama has 1.1 billion parameters and adopts an architecture compatible with Llama 2: the same tokenizer, the same basic structural hyperparameters, and the same normalisation and attention choices. This compatibility is intentional: it allows the full Llama tooling ecosystem (inference, quantisation, fine-tuning, deployment) to be reused without adaptation.
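To make the "1.1 billion parameters" figure concrete, the sketch below reconstructs the count from the published TinyLlama hyperparameters (hidden size 2048, 22 layers, 32 query heads with 4 KV heads for grouped-query attention, SwiGLU intermediate size 5632, 32k-token Llama 2 vocabulary). Treat the exact values as assumptions drawn from the TinyLlama report rather than from this article.

```python
# Back-of-the-envelope parameter count for a TinyLlama-style model.
# Hyperparameter values are the published TinyLlama choices (assumption,
# not stated in this article); the structure is standard Llama 2.

VOCAB = 32_000       # Llama 2 tokenizer vocabulary size
D_MODEL = 2048       # hidden size
N_LAYERS = 22        # transformer blocks
N_HEADS = 32         # query heads
N_KV_HEADS = 4       # KV heads (grouped-query attention)
D_FFN = 5632         # SwiGLU intermediate size

head_dim = D_MODEL // N_HEADS  # 64

# Attention: Q and O projections are full-size; K and V are shrunk by GQA.
attn = 2 * (D_MODEL * D_MODEL) + 2 * (D_MODEL * N_KV_HEADS * head_dim)

# SwiGLU MLP: gate and up projections, then the down projection.
mlp = 2 * (D_MODEL * D_FFN) + (D_FFN * D_MODEL)

# Two RMSNorm weight vectors per block.
norms = 2 * D_MODEL

per_layer = attn + mlp + norms

# Untied input embedding and output head, plus the final RMSNorm.
total = N_LAYERS * per_layer + 2 * (VOCAB * D_MODEL) + D_MODEL

print(f"{total:,} parameters")  # roughly 1.1 billion
```

Note how grouped-query attention keeps the K/V projections small: with 4 KV heads they cost only an eighth of what full multi-head K/V matrices would.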
Training is performed on roughly 3 trillion tokens, combining two public datasets: SlimPajama, a deduplicated and cleaned version of RedPajama, and StarCoderData, the code corpus also used for StarCoder. For a model of this size, 3T tokens is a very extensive regime: the common Chinchilla heuristic of roughly 20 tokens per parameter would call for only about 22 billion tokens, so TinyLlama trains on over a hundred times that budget.
The empirical result
The benchmarks published by the authors show that TinyLlama keeps improving even in the final stretch of training, suggesting that small models do not “saturate” as quickly as the original scaling laws imply. The result matters in practice: compact models, when trained long enough, can reach useful performance in edge, mobile or memory-constrained contexts.
Licence and use
The Apache 2.0 licence and compatibility with Llama 2 make TinyLlama easy to integrate into existing workflows. The model is also available in fine-tuned chat variants and quantised versions, and is often used as a baseline in small language model comparisons.
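As an illustration of the drop-in compatibility, the chat fine-tunes published on the Hugging Face Hub (e.g. `TinyLlama-1.1B-Chat-v1.0`) use the Zephyr-style chat template; in practice `tokenizer.apply_chat_template()` from `transformers` renders it for you. The helper below is a hypothetical minimal sketch of that format, assuming the Zephyr convention of `<|role|>` headers terminated by `</s>`.

```python
# Minimal sketch of the Zephyr-style chat format used by TinyLlama's chat
# fine-tunes (assumption: <|role|> headers, each turn closed with </s>).
# Real code should prefer tokenizer.apply_chat_template() from transformers.

def build_prompt(messages):
    """Render a list of {"role": ..., "content": ...} dicts into one prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}</s>\n")
    parts.append("<|assistant|>\n")  # generation cue: model continues from here
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is TinyLlama?"},
])
print(prompt)
```

Because the tokenizer and architecture match Llama 2, the same prompt-building and serving code used for Llama models works unchanged with TinyLlama checkpoints.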
