Gemini 3.5 Flash: The Fast Model That Beats Last Year's Flagship

Google shipped Gemini 3.5 Flash on May 19, 2026 at Google I/O, and the headline claim is a real one: a Flash-tier model now beats last generation's flagship. Google DeepMind's chief technologist Koray Kavukcuoglu put it plainly, 3.5 Flash "outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks."

That is genuinely notable. The industry's working assumption has been that the smartest models are the slowest and most expensive, and the "Flash" or "mini" tier is where you go when you need cheap and fast and can accept being dumber. Gemini 3.5 Flash scrambles that.

But the honest version of this story has two asterisks: it does not beat 3.1 Pro at everything, and the "Flash" brand no longer means "cheap." Here is the full picture.

What Launched

Gemini 3.5 Flash is generally available the same day it was announced, no preview period, across the Gemini app, AI Mode in Google Search, Google Antigravity, the Gemini API (AI Studio and Android Studio), Vertex AI, and Gemini Enterprise.

Spec	Gemini 3.5 Flash
API model ID	`gemini-3.5-flash`
Context window	1M input tokens / 64K output tokens
Knowledge cutoff	January 2026
Modalities	Text, image, speech, video in; text out. Reasoning model with a thinking-budget mode.
Output speed	~280 tokens/sec (independently measured by Artificial Analysis, #2 of all models tracked)
API price	$1.50 input / $9.00 output / $0.15 cached input, per 1M tokens

A second model, Gemini 3.5 Pro, is not out yet. Google says it is "already being used internally" with a rollout expected next month. No specs or benchmarks for Pro have been published.

The Benchmarks, and What They Actually Mean

Google led the launch with four benchmark numbers. Here is what each one is, so the scores mean something:

Benchmark	What it tests	3.5 Flash score
Terminal-Bench 2.1	Real coding-agent quality, inspect a sandboxed environment, edit files, run commands, recover from errors, finish a task end-to-end	76.2%
MCP Atlas	Tool-use competency against real Model Context Protocol servers, with multi-hop reasoning (one tool's output feeds the next)	83.6%
CharXiv Reasoning	Reading and reasoning about real scientific charts and diagrams (not toy visual quizzes)	84.2%
GDPval-AA	Performance on economically valuable real-world work across many occupations, scored as an Elo rating	1656 Elo

In plain English: on the percentage benchmarks, 75 to 85% is the current frontier band for these hard 2026 agentic suites, so 76 to 84% is genuinely strong. The Elo number is the eye-catcher, 1656 on GDPval-AA puts 3.5 Flash in the top cluster of all models, just behind GPT-5.4 at extra-high effort.

Flash vs last-gen Pro, head to head

This is the news. Google's published numbers, reproduced by independent trackers:

Benchmark	3.5 Flash	3.1 Pro	Winner
Terminal-Bench 2.1 (coding agent)	76.2%	70.3%	Flash +5.9
MCP Atlas (tool use)	83.6%	78.2%	Flash +5.4
CharXiv Reasoning (multimodal)	84.2%	83.3%	Flash +0.9
Finance Agent v2	57.9%	43.0%	Flash +14.9
GDPval-AA (real-world work, Elo)	1656	1314	Flash +342

A Flash-tier model beating last year's flagship by 342 Elo points on real-world work, while running about 4x faster, is the kind of result that resets expectations.

Here is the part most launch-day coverage skipped. Google chose those four headline benchmarks deliberately, they are all agentic, tool-use, and multimodal tests. Google did not lead with the hardest pure-reasoning benchmarks. When you look at those, the story changes:

Benchmark	3.5 Flash	3.1 Pro	Winner
Humanity's Last Exam (hard reasoning)	40.2%	44.4%	3.1 Pro +4.2
ARC-AGI-2 (abstract reasoning)	72.1%	77.1%	3.1 Pro +5.0

So the accurate headline is not "Flash beats Pro at everything." It is: a Flash-tier model now beats last generation's flagship at *the work most people actually deploy agents for*, coding, tool use, multimodal understanding, economically valuable tasks, while still trailing it on the hardest abstract-reasoning tests.

That nuance is not a knock on Gemini 3.5 Flash. It is a more useful truth than the marketing line. If your workload is agentic coding or document processing, 3.5 Flash is a genuine upgrade over 3.1 Pro. If your workload is frontier-grade abstract reasoning, last year's Pro is still ahead, and so are this year's actual flagships.

Get the Weekly IT + AI Roundup

What changed this week in NinjaOne, ServiceNow, CrowdStrike, and AI. One email, every Monday.

No spam, unsubscribe anytime. Privacy Policy

Where It Actually Ranks

On the Artificial Analysis Intelligence Index, the most-cited independent composite "how smart is it" score, Gemini 3.5 Flash lands at 55. That is up 9 points from Gemini 3 Flash, and good for roughly 7th of about 147 models tracked. But it is not the top:

Model	AA Intelligence Index v4
GPT-5.5 (xhigh)	60
Claude Opus 4.7 (max effort)	57
Gemini 3.1 Pro	~57
Gemini 3.5 Flash	55
Grok 4.3 (high)	53

So Gemini 3.5 Flash is flagship-*adjacent*, not flagship-beating. GPT-5.5 and Claude Opus 4.7 are still smarter on the composite. The real story is the *combination*: 3.5 Flash lands within 2 to 5 points of the flagships while running several times faster and costing a fraction as much. Artificial Analysis describes it as sitting on the "speed-intelligence Pareto frontier", you cannot get more intelligence at that speed, and you cannot get more speed at that intelligence.

When Google said the model "tops the right quadrant" of the Artificial Analysis index, that is the intelligence-vs-speed chart they mean. High intelligence, very high speed.

The Price Story: "Flash" Does Not Mean "Cheap" Anymore

Google's announcement said Gemini 3.5 Flash costs "less than half the cost of other frontier models." That holds up, against US flagships:

Model	Input / 1M	Output / 1M
Gemini 3.5 Flash	$1.50	$9.00
GPT-5.5	$5.00	$30.00
Claude Opus 4.7	$5.00	$25.00
Claude Sonnet 4.6	$3.00	$15.00
Grok 4.3	$1.25	$2.50
Gemini 3 Flash (prior gen)	$0.50	$3.00

Against GPT-5.5 and Claude Opus 4.7, Gemini 3.5 Flash is roughly 65 to 70% cheaper. On a blended basis it runs about one-third the cost of those flagships. That is real, and for high-volume workloads it matters.

But notice the bottom two rows. Gemini 3.5 Flash is 3x the price of the previous Gemini 3 Flash ($1.50/$9.00 vs $0.50/$3.00). And Grok 4.3's output tokens cost $2.50 against Flash's $9.00.

Independent developer Simon Willison flagged this on launch day, noting that "all three of the major AI labs are starting to probe the price tolerance of their API customers." The top-voted Hacker News thread was titled, bluntly, *"It's discouraging to see Google price Gemini 3.5 Flash at 3x the cost of Gemini 3 Flash."* Artificial Analysis found that running its full benchmark suite on 3.5 Flash cost about 5.5x more than on Gemini 3 Flash, because the model both costs more per token *and* burns more reasoning tokens.

The takeaway for buyers: "Flash" used to be the budget tier. It is now a mid-priced tier that happens to be fast. Cheap only when your comparison is a flagship.

Who Should Use Gemini 3.5 Flash

This is a buyer's-guide question, and there is no single right answer. Pick by workload:

●Choose Gemini 3.5 Flash for high-volume agentic and multimodal workloads, coding agents, document and chart processing, tool-use pipelines, where speed and cost matter and you do not need the absolute top of the intelligence index. Its ~280 tokens/sec and 1M context make it strong for long autonomous runs.
●Choose GPT-5.5 when you need the highest raw intelligence on the hardest reasoning tasks.
●Choose Claude Opus 4.7 for the deepest production coding, it still leads SWE-bench.
●Choose Grok 4.3 when output cost is the dominant constraint, its $2.50 output price undercuts everyone.
●Choose Gemini 3 Flash (the old one) if it is still available and your task is simple, it is one-third the price and may be all you need.

The honest summary: no model wins everything. Gemini 3.5 Flash wins the "fast, capable, and not flagship-priced" quadrant, and it wins it convincingly.

What's Next

Gemini 3.5 Pro lands next month. Google has not published specs, but it shares the coding-and-agentic focus of 3.5 Flash. If 3.5 Flash already matches last year's Pro, the question for 3.5 Pro is whether it can take the Artificial Analysis intelligence crown back from GPT-5.5, which would make it the first Google model to do so this cycle.

For now, Gemini 3.5 Flash is exactly what it claims to be, with the asterisks intact: a fast, mid-priced model that does the agentic work of last year's flagship, ships in everything Google makes, and quietly costs three times what the last Flash did. Read the benchmark menu, check the price against your actual alternative, and it is an easy model to recommend for the workloads it is built for.

Sources

Share this article

Find Your Perfect AI Match

Not sure which AI tools are right for you? Take our free 3-minute quiz.

Take the Quiz

Gemini 3.5 Flash: The Fast Model That Beats Last Year's Flagship

In This Article

What Launched

The Benchmarks, and What They Actually Mean

Flash vs last-gen Pro, head to head

The Asterisk: Read the Benchmark Menu

Get the Weekly IT + AI Roundup

Where It Actually Ranks

The Price Story: "Flash" Does Not Mean "Cheap" Anymore

Who Should Use Gemini 3.5 Flash

What's Next

Sources

Read Next

Claude Fable 5: Anthropic's Most Powerful Public Model Lands at #1, Benchmarks and Pricing

Claude Opus 4.8 Is the New #1 AI Model: Benchmarks, Pricing, and What Changed

Claude Opus 4.7 Released: The Benchmarks, Pricing, and What's New

Find Your Perfect AI Match