Emergent behavior
Also known as: emergent capabilities, emergent abilities, emergence, unexpected capabilities
As language models get bigger and are trained on more data, they sometimes develop abilities that smaller models lack. The transition can be abrupt: performance on a task stays near chance across many model sizes, then jumps sharply. Researchers call these emergent capabilities. Commonly cited examples include multi-step arithmetic, reasoning about false beliefs, and following complex instructions.
Whether emergence is truly 'sudden' or just an artifact of how we measure it is debated. One view is that the capability was always building gradually inside the model but wasn't detectable until it crossed a threshold where outputs became consistently correct. Another view is that qualitatively new capabilities really do appear at scale in ways that can't be predicted by extrapolating from smaller models.
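The measurement-artifact view can be illustrated with a toy calculation. This is a minimal sketch, not a description of any real model: the per-token accuracies and the answer length are made-up numbers, and it assumes each token in an answer is right or wrong independently. Under those assumptions, an all-or-nothing score on a 10-token answer is the per-token accuracy raised to the 10th power, so a smoothly improving model can look like it jumps from "can't do it" to "can do it".

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Chance that every token in an answer is correct.

    Assumption (for illustration only): token errors are independent.
    """
    return per_token_accuracy ** answer_length


# Hypothetical per-token accuracies for five increasingly large models.
per_token = [0.50, 0.70, 0.85, 0.95, 0.99]

# Hypothetical task: a 10-token answer scored all-or-nothing.
ANSWER_LENGTH = 10

exact_match = [exact_match_rate(p, ANSWER_LENGTH) for p in per_token]

for p, em in zip(per_token, exact_match):
    print(f"per-token accuracy {p:.2f} -> exact-match score {em:.3f}")
```

In this toy setup the per-token column rises steadily while the exact-match column sits close to zero for the first two models and then climbs steeply. The same underlying improvement looks gradual or sudden depending on which metric is reported. This does not settle the debate; it only shows how the first view could produce the curves people observe.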
For builders, emergent behavior cuts both ways. The good version: capabilities you didn't expect might appear in a new model, making your product more powerful without extra engineering. The bad version: a new model can also show behaviors nobody planned for, including unexpected failures or safety-relevant outputs that didn't surface in testing. This is part of why evals and red-teaming (adversarial testing) are treated as ongoing practices rather than one-time checkboxes.