Mode collapse
Also known as: output collapse, diversity collapse, homogenization
The term comes from the GAN (Generative Adversarial Network) literature, where it describes a failure in which the generator learns to produce one or a few convincing outputs instead of covering the full range of the real data. In LLM contexts it is used more loosely for several related problems: a fine-tuned model that only produces outputs in a single rigid format, a model that gives the same canned answer no matter how you rephrase the question, or a model trained on AI-generated data that becomes increasingly generic and loses the diversity of the original training distribution.
RLHF (reinforcement learning from human feedback) can cause a form of mode collapse in which models become overly safe or sycophantic, converging on a narrow band of agreeable, hedged, politically neutral responses because those score well on human preference ratings. The model's effective range of expression narrows even when a task would benefit from a more direct or varied answer.
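One common way to formalize this narrowing, assuming the standard KL-regularized RLHF objective (maximize expected reward minus β times the KL divergence from the pre-RLHF reference model), is that its optimal policy is the reference distribution reweighted by exp(r/β). The toy sketch below uses made-up rewards for five hypothetical candidate responses; `tilted_policy` and `entropy` are illustrative helpers, not part of any library. Real pipelines (PPO and similar) only approximate this optimum, and a learned reward model stands in for the human ratings.

```python
import math


def tilted_policy(ref_probs, rewards, beta):
    """Closed-form optimum of max_pi E[r] - beta * KL(pi || pi_ref):
    pi(y) is proportional to pi_ref(y) * exp(r(y) / beta)."""
    logits = [math.log(p) + r / beta for p, r in zip(ref_probs, rewards)]
    # Subtract the largest logit before exponentiating to avoid overflow.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats; lower means a more concentrated policy."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical setup: the reference model spreads probability evenly over
# five candidate answers, while the reward slightly favors the hedged,
# agreeable answer at index 0.
ref_probs = [0.2, 0.2, 0.2, 0.2, 0.2]
rewards = [1.0, 0.8, 0.7, 0.6, 0.5]

for beta in [10.0, 1.0, 0.1, 0.01]:
    pi = tilted_policy(ref_probs, rewards, beta)
    print(f"beta={beta:<5} entropy={entropy(pi):.3f} p(top)={pi[0]:.3f}")
```

Using the closed form rather than a training loop isolates the mechanism: a small reward gap, weakly counterbalanced by the KL term, is enough to push nearly all probability onto one answer.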
For builders, mode collapse is a risk to watch for when fine-tuning on a narrow dataset. If you train a model only on your company's customer service transcripts, it may lose the ability to respond naturally to anything outside those patterns. Mixing general-purpose examples into the fine-tuning data helps prevent it, and monitoring output diversity during and after training helps catch it early.
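A minimal sketch of both suggestions, assuming training examples and model outputs are plain Python lists of strings. `mix_datasets`, `distinct_n`, and `diversity_dropped` are hypothetical helpers, and the 30% general share and 20% drop threshold are illustrative values, not recommendations. Distinct-n (unique n-grams divided by total n-grams across a batch of outputs) is one common lexical-diversity proxy; it will not catch answers that are worded differently but say the same thing.

```python
import random


def mix_datasets(domain, general, general_fraction=0.3, seed=0):
    """Shuffle domain examples together with enough general examples that
    roughly `general_fraction` of the result is general-purpose data."""
    if not 0 <= general_fraction < 1:
        raise ValueError("general_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_general = round(len(domain) * general_fraction / (1 - general_fraction))
    n_general = min(n_general, len(general))  # cannot sample more than exist
    mixed = list(domain) + rng.sample(list(general), n_general)
    rng.shuffle(mixed)
    return mixed


def distinct_n(texts, n=2):
    """Unique n-grams / total n-grams across all texts.
    Near 0: outputs reuse the same phrasing. Near 1: outputs are varied."""
    unique, total = set(), 0
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def diversity_dropped(baseline, current, max_relative_drop=0.2):
    """Flag when diversity fell by more than max_relative_drop vs. baseline."""
    if baseline <= 0:
        return False
    return (baseline - current) / baseline > max_relative_drop


# Hypothetical outputs sampled from a base model and a fine-tuned model.
varied = [
    "Your refund was issued today.",
    "The package ships Tuesday.",
    "Try resetting the router first.",
]
collapsed = [
    "I understand your concern.",
    "I understand your concern.",
    "I understand your frustration.",
]
base_score, tuned_score = distinct_n(varied), distinct_n(collapsed)
print(base_score, tuned_score, diversity_dropped(base_score, tuned_score))
```

In practice you would sample outputs for a fixed set of held-out prompts at each checkpoint and compare against the base model's score, rather than comparing two hand-written lists.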