Concept·AI Models & Capabilities·Added 1 day ago

Pretraining

Also known as: pre-training, foundation model pretraining, LLM pretraining, next-token prediction training

The first and most compute-intensive phase of building a large language model. The model is trained on a massive corpus of text to predict the next token, learning grammar, facts, reasoning patterns, and world knowledge before any task-specific tuning begins.

Pretraining is where the base capabilities of a large language model come from. The model is fed an enormous corpus of text, often hundreds of billions to trillions of tokens drawn from the web, books, code, and other sources, and trained with a simple objective: given the previous tokens in a sequence, predict what comes next. Doing this at scale, across enough diverse data with enough parameters, causes the model to implicitly learn language structure, factual knowledge, and surprisingly broad reasoning abilities.
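The objective described above can be sketched in a few lines. This is a minimal illustration, not any lab's training code: it assumes a model has already produced one vector of raw scores (logits) per position, and the function names `softmax` and `next_token_loss` are hypothetical helpers introduced here for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    logits_per_position: one list of vocabulary scores for each position
        except the last (the last token has nothing after it to predict).
    token_ids: the full token sequence as integer ids.
    """
    targets = token_ids[1:]  # the target at position t is the token at t+1
    if len(logits_per_position) != len(targets):
        raise ValueError("need exactly one logit vector per predicted token")
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))  # penalize low probability on the true token
    return sum(losses) / len(losses)

# Toy usage: a 4-token vocabulary and a model that knows nothing (all scores equal).
uniform = [0.0, 0.0, 0.0, 0.0]
loss = next_token_loss([uniform, uniform], [0, 2, 1])
```

With uniform scores over a 4-token vocabulary the loss equals log(4), the cost of pure guessing. Pretraining adjusts the model's parameters to push this number down across the whole corpus.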

This phase is expensive in both compute and time. Training runs for frontier models like GPT-5 or Claude Opus occupy thousands of GPUs or TPUs for weeks or months. The resulting artifact is a base model, sometimes called a pretrained model or foundation model. It knows a lot about language and the world, but it does not yet know how to follow instructions or behave safely as an assistant. Those qualities come from the stages that follow, such as supervised fine-tuning and RLHF (reinforcement learning from human feedback), collectively called post-training.
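To get a feel for the scale, a widely used rule of thumb for dense transformer models estimates training compute as roughly 6 floating-point operations per parameter per training token. The sketch below applies it. Every number in the usage example (model size, token count, accelerator count, peak throughput, utilization) is a hypothetical assumption chosen for illustration, not a figure for any real model or chip.

```python
SECONDS_PER_DAY = 86_400

def training_flops(n_params, n_tokens):
    """Rule-of-thumb compute estimate: about 6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

def training_days(n_params, n_tokens, n_accelerators,
                  peak_flops_per_accelerator, utilization):
    """Rough wall-clock estimate, given the fraction of peak throughput actually achieved."""
    effective_flops_per_second = (
        n_accelerators * peak_flops_per_accelerator * utilization
    )
    seconds = training_flops(n_params, n_tokens) / effective_flops_per_second
    return seconds / SECONDS_PER_DAY

# Hypothetical run: 70B parameters, 1.4T tokens, 2,048 accelerators,
# 3e14 FLOP/s peak each, 40% of peak sustained.
flops = training_flops(70e9, 1.4e12)
days = training_days(70e9, 1.4e12, 2048, 3e14, 0.40)
```

Under these assumed numbers the estimate is about 5.9e23 FLOPs and a little under four weeks of continuous training. Real runs also lose time to hardware failures, restarts, and evaluation, which this estimate ignores.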

Scaling laws, a set of empirical observations from AI research, describe how pretraining loss falls in a predictable way, roughly as a power law, as you increase training compute, dataset size, and parameter count. This relationship has driven much of the investment in frontier AI: labs compete to do larger pretraining runs because, so far, larger runs have tended to produce more capable base models. For builders, pretraining is largely invisible day-to-day, but it sets most of what a model knows and can do. Fine-tuning mainly shapes and elicits those abilities, and it rarely adds large amounts of new knowledge.
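One commonly used functional form for these observations models loss as an irreducible floor plus two terms that shrink as parameters and training tokens grow. The sketch below shows the shape of that relationship only. The function name `predicted_loss` is hypothetical, and the default constants are illustrative placeholders, not fitted values from any published study.

```python
def predicted_loss(n_params, n_tokens,
                   floor=1.7, a=400.0, b=400.0, alpha=0.34, beta=0.28):
    """Illustrative scaling-law shape: loss = floor + a / N**alpha + b / D**beta.

    floor: loss that remains even with unlimited parameters and data.
    a, alpha: how quickly loss falls as parameter count N grows.
    b, beta: how quickly loss falls as training tokens D grow.
    All default values are placeholders chosen for illustration.
    """
    return floor + a / n_params**alpha + b / n_tokens**beta

# Scaling both model size and data by 10x lowers the predicted loss,
# but each further 10x buys a smaller improvement than the one before.
small = predicted_loss(1e9, 1e11)
medium = predicted_loss(1e10, 1e12)
large = predicted_loss(1e11, 1e13)
```

In practice the constants are fitted to the results of many smaller training runs, and the fitted curve is then used to forecast the loss of a larger run before paying for it.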

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.
Related terms
Base model, Fine-tuning, Post-training, RLHF, SFT, Foundation model, Scaling laws