Concept·AI Models & Capabilities·Added 1 month ago

RL

Also known as: reinforcement learning, reward-based training, RL training

Reinforcement Learning. A training approach where a model learns by receiving feedback on its outputs: good results get rewarded, bad ones get penalized. The model adjusts to chase more reward over time. Behind RLHF, reasoning models, and a lot of recent capability gains.

RL is one of the core ideas in how modern AI models are trained beyond the initial pretraining phase. Instead of just predicting the next token in a text (as in pretraining), an RL-trained model tries actions, gets a score on how good those actions were, and updates its behavior to get higher scores over time. You can think of it as training by trial and correction rather than by example.

In the context of language models, RL shows up most prominently in RLHF (Reinforcement Learning from Human Feedback), where human raters score model outputs and those scores shape the model's future behavior. It also underpins reasoning models: systems like o3 or extended-thinking models that 'think longer' on hard problems have been trained with RL to practice problem-solving strategies that actually work.

Why it matters as a builder concept: when you hear that a model is 'RL-tuned for coding' or that a lab 'scaled up RL training,' it means they've invested in teaching the model to pursue better outcomes, not just better-sounding outputs. RL is also the mechanism behind reward hacking, one of the messier failure modes to understand.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.

Related terms