Concept·AI Models & Capabilities·Added 1 month ago

Mode collapse

Also known as: output collapse, diversity collapse, homogenization

When a model learns to produce only a narrow range of outputs, ignoring the full diversity it should be capable of. Fine-tuned models can collapse to a single style or answer pattern. Models trained on synthetic data can spiral into increasingly homogeneous outputs.

Mode collapse comes from the GAN (Generative Adversarial Network) literature, where it described a failure where the generator learns to produce one or a few convincing outputs instead of the full range of real data. In LLM contexts it's used more loosely to describe several related problems: a fine-tuned model that only produces outputs in a single rigid format, a model that gives the same answer no matter how you rephrase the question, or a model trained on AI-generated data that becomes increasingly generic and loses the diversity of the original training distribution.

RLHF training can cause a form of mode collapse where models become overly safe or sycophantic, converging to a narrow band of agreeable, hedged, politically neutral responses because those score well on human preference ratings. The model's effective range of expression narrows even when the problem would benefit from a more direct or diverse response.

For builders, mode collapse is worth watching for when fine-tuning on a narrow dataset. If you train a model only on your company's customer service transcripts, it may lose the ability to respond naturally to anything outside those patterns. Mixing fine-tuning data with general data, and monitoring output diversity, helps prevent it.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.

Related terms