Extended Thinking
Also known as: deep thinking, thinking mode, hybrid reasoning, thinking budget
Extended thinking is the consumer-facing version of inference-time compute scaling. Instead of answering immediately, Claude first generates an internal scratchpad of reasoning. You see the reasoning if you want to, and the final answer reflects that deeper deliberation. Anthropic lets developers set a token budget for thinking, so you can dial up accuracy at the cost of speed and tokens.
The practical trade-off: for routine tasks like drafting an email or doing a simple lookup, extended thinking is overkill and will cost more and be slower. For complex tasks like a legal analysis, a multi-step architecture decision, or a tricky debugging session, spending more tokens on thinking tends to produce noticeably better results.
Extended thinking is closely related to the reasoning model category (o-series from OpenAI, Claude's hybrid reasoning mode, Gemini Deep Think), but it's a feature flag you can toggle at the API level, not a separate model family. This gives developers granular control over the quality-cost-latency trade-off on a per-request basis.