Model distillation
Also known as: knowledge distillation, distilled model, model compression
Full-sized frontier models are expensive to run. Distillation is a technique for training a smaller, cheaper model (the student) to reproduce the behavior of a larger one (the teacher), so that the student is nearly as good on specific tasks. Rather than learning only from raw data the way the teacher did, the student is trained on the teacher's outputs, either its output probability distributions or text it generates, so it learns to mimic what the teacher would say instead of learning from scratch.
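To make "trained on the teacher's output distributions" concrete, here is a minimal sketch of one common formulation: the soft-target loss, which measures the KL divergence between temperature-softened teacher and student distributions. The function names, the example scores, and the temperature of 2.0 are illustrative assumptions, not taken from any particular lab's training recipe.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's current predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical scores over three candidate tokens.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different token

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close < loss_far)
```

The loss is zero when the student's distribution matches the teacher's and grows as they diverge, so minimizing it pushes the student toward the teacher's full distribution over outputs rather than just its top choice.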
Distillation enables several things that matter to builders: cheaper inference (smaller models cost less per token), faster response times, and the ability to deploy capable models on hardware that cannot run the full-sized version. Many of the smaller 'mini' and 'flash' models from major AI labs are reported to be produced, at least in part, through distillation from their larger counterparts.
One nuance: distilled models tend to be excellent on the range of tasks the teacher handled well, but they inherit the teacher's blind spots and may be more brittle on edge cases. They also tend to do best when the distillation data closely matches the domain they will be used in, which is why domain-specific distillation is often more effective than general-purpose distillation for niche applications.
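As a rough illustration of domain-specific distillation, the sketch below builds a training set by pairing prompts drawn from the target domain with the teacher's responses. Everything here is hypothetical: `stub_teacher` stands in for a call to a real teacher model, and the prompts are invented examples.

```python
def build_distillation_set(prompts, teacher):
    """Pair each domain prompt with the teacher's response as a student training example."""
    return [{"prompt": p, "target": teacher(p)} for p in prompts]

def stub_teacher(prompt):
    # Hypothetical stand-in for querying a real teacher model.
    return "TEACHER ANSWER TO: " + prompt

# Invented prompts from the domain the student will serve.
domain_prompts = [
    "Summarize this insurance claim.",
    "Classify this claim as auto, home, or health.",
]

dataset = build_distillation_set(domain_prompts, stub_teacher)
print(len(dataset))
```

The student only sees what the prompts elicit from the teacher, so prompts that resemble production traffic matter more than sheer volume of general-purpose examples.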