The Bitter Lesson
Also known as: bitter lesson, Sutton's bitter lesson, compute over craft
In his 2019 essay of the same name, Rich Sutton looked at 70 years of AI history and noticed a pattern: whenever researchers tried to hard-code expert knowledge into AI systems (hand-crafting chess strategies, encoding grammar rules, engineering visual features), they were eventually beaten by simpler, more general methods that scaled with computation, chiefly search and learning. More data, more compute, more general algorithms. The clever hand-designed approach loses to scaled-up general methods, repeatedly.
The lesson is 'bitter' because it's humbling. Researchers invest enormous effort in building intelligent structure into AI systems, get short-term wins, and then watch general methods catch up and surpass them once compute increases. The implication is that the smartest thing AI researchers can do is invest in scalable, general methods rather than encoding specific human knowledge.
Sutton's essay is referenced constantly in AI discourse, usually as a lens for arguing about research directions or product strategy. When someone says 'just scale it,' they are implicitly invoking the bitter lesson. It's also used to explain why LLMs became so powerful: not because anyone engineered language understanding into them, but because training on vast amounts of text with enough compute produced it as an emergent property.