Gemini Flash

Also known as: Flash, Gemini Flash tier

Google's speed-and-efficiency tier within the Gemini model family. Flash models prioritize low latency and low cost while staying close to frontier performance. As of May 2026, Gemini 3.5 Flash is the newest and most capable Flash model, and Gemini 3 Flash is the default model in the Gemini app.

Gemini Flash is not a single model but a recurring tier name across Gemini generations, always denoting the variant optimized for speed and cost over maximum capability. The tier started with Gemini 1.5 Flash in mid-2024 and has appeared in every subsequent generation. The name signals faster inference, lower per-token pricing, and typically a smaller active parameter footprint than the Pro tier, though the capability gap has narrowed significantly across generations.

Gemini 3.5 Flash, released at Google I/O on May 19, 2026, is the newest Flash model as of May 2026. It achieves frontier-class performance on agentic and coding tasks, beating Gemini 3.1 Pro on 11 of 15 published benchmarks at the same cost as earlier Flash models ($1.50 input / $9 output per million tokens). This is a meaningful shift: the Flash tier no longer means 'weaker'; it means 'faster and cheaper', with competitive quality on most production tasks. Gemini 3 Flash, the previous generation, remains the default model in the Gemini app.
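To make the pricing concrete, here is a minimal cost-estimate sketch in Python. It uses the two prices quoted above and assumes they are the input and output rates respectively. The function name and the example token counts are illustrative only, not part of any Google SDK.

```python
# Prices quoted in this entry; reading them as input/output rates is an assumption.
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 9.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request in US dollars."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Illustrative request: a 200,000-token prompt with a 2,000-token reply.
cost = estimate_cost_usd(200_000, 2_000)
print(f"${cost:.3f}")  # $0.318
```

Under these rates output tokens cost six times as much as input tokens, so a workload with long replies can cost more than its prompt sizes suggest.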

For builders, the Flash tier is often the practical starting point. If you're building on Gemini, begin with the latest Flash model for speed and cost efficiency, validate its quality against your own task, and move to Pro only if the task specifically requires stronger pure reasoning (such as Humanity's Last Exam-style problems or very complex agentic planning). Flash also supports a 1-million-token context window, with long prompts billed at the same per-token rate as short ones.
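The "start with Flash, escalate only if needed" workflow can be sketched as a small selection routine. This is a minimal illustration, not a prescribed method: `pick_tier`, the tier labels, and the scores are hypothetical, and `score_fn` stands in for whatever task-specific evaluation you run.

```python
from typing import Callable, Sequence


def pick_tier(
    tiers: Sequence[str],
    score_fn: Callable[[str], float],
    min_score: float,
) -> str:
    """Return the first (cheapest) tier whose evaluation score meets the bar.

    Tiers are tried in the order given; if none qualifies, the last
    (most capable) tier is returned as a fallback.
    """
    for tier in tiers:
        if score_fn(tier) >= min_score:
            return tier
    return tiers[-1]


# Hypothetical scores from a task-specific evaluation set.
example_scores = {"flash": 0.93, "pro": 0.96}
chosen = pick_tier(["flash", "pro"], example_scores.__getitem__, min_score=0.90)
print(chosen)  # flash
```

Ordering the tiers from cheapest to most capable means the more expensive model is only chosen when the cheaper one measurably falls short on your own data.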

