Concept·AI Models & Capabilities·Added 1 day ago

The Bitter Lesson

Also known as: bitter lesson, Sutton's bitter lesson, compute over craft

A 2019 essay by AI researcher Rich Sutton arguing that general methods that scale with computation ultimately outperform approaches that bake in human domain knowledge. The 'bitter' part: AI researchers keep learning this lesson and keep forgetting it.

Sutton looked at 70 years of AI history and noticed a pattern: whenever researchers tried to hard-code expert knowledge into AI systems (hand-crafting chess strategies, encoding grammar rules, engineering visual features), they were eventually beaten by general methods, chiefly search and learning, that keep improving as computation grows. The clever hand-designed approach wins in the short term and then loses to scaled-up general methods, repeatedly.

The lesson is 'bitter' because it's humbling. Researchers invest enormous effort in building intelligent structure into AI systems, get short-term wins, and then watch general methods catch up and surpass them once compute increases. The implication is that the smartest thing AI researchers can do is invest in scalable, general methods rather than encoding specific human knowledge.

This essay is referenced constantly in AI discourse, often as a lens for arguing about research directions or product strategy. When someone says 'just scale it,' they're often implicitly invoking the bitter lesson. It's also used to explain why LLMs became so powerful: not because anyone engineered language understanding into them, but because training on vast text with enough compute produced it as an emergent property.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.
Related terms
Scaling laws
Foundation model
Pretraining
Emergent behavior