Google DeepMind Unveils Gemini 3 With Sub-Second Long-Context Reasoning Across 10m Tokens

By Tom Whitmore · Published May 9, 2026 · 2 min read

Google DeepMind has formally launched Gemini 3, the latest generation of the company's flagship foundation-model family, with the headline performance claim of sub-second long-context reasoning across a 10-million-token context window, an architectural step change that materially shifts the practical use-case envelope for a significant share of enterprise AI workloads.
The 10-million-token context window, equivalent to roughly 7,500 pages of dense business documents, substantially extends what was previously possible across the proprietary frontier-model landscape. The more important architectural advance is the latency profile across that window: full-context reasoning that previously required tens of seconds has been compressed to under a second on the production Gemini 3 stack. The improvement comes from a combination of architectural changes in the attention mechanism, custom inference-stack optimisations on Google's TPU v6 hardware, and a substantially improved retrieval-and-attention coordination layer.
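The pages-to-tokens equivalence above can be sanity-checked with common rule-of-thumb conversion factors; the constants below are standard industry approximations, not figures published by Google.

```python
# Back-of-envelope check of the "10M tokens ~ 7,500 dense pages" claim.
# Assumed rules of thumb: ~0.75 English words per token, and ~1,000
# words per dense business-document page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_DENSE_PAGE = 1_000

def tokens_to_pages(tokens: int) -> float:
    """Convert a token count to an approximate dense-page count."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_DENSE_PAGE

print(tokens_to_pages(10_000_000))  # 7500.0
```

Under these assumptions a 10-million-token window works out to exactly the roughly 7,500 pages the article cites.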
The model family release covers four variants: Gemini 3 Pro (the flagship reasoning model), Gemini 3 Flash (a latency-optimised variant for agentic workloads), Gemini 3 Code (specialised for software-engineering tasks), and a multimodal Gemini 3 Vision variant that extends the previous generation's visual-and-video reasoning capability. Benchmark results place Gemini 3 Pro at the top of essentially every major industry-standard reasoning evaluation, with particularly notable leads on the practical-software-engineering and scientific-reasoning suites.
The deployment infrastructure has been simultaneously upgraded. Google Cloud's Vertex AI platform now offers Gemini 3 Pro inference at $1.20 per million input tokens and $4.80 per million output tokens, pricing that meaningfully undercuts the equivalent OpenAI GPT-5 offering and signals Google's continuing willingness to use its hyperscaler-scale TPU infrastructure to compete aggressively on the per-token economics of the inference market.
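At those listed rates, the per-request economics of a full-context call are easy to work out. The sketch below uses the article's prices; the request shape (a full 10-million-token context plus 2,000 output tokens) is a hypothetical illustration, not a quoted workload.

```python
# Illustrative cost of one Gemini 3 Pro request at the listed
# Vertex AI rates ($1.20/M input tokens, $4.80/M output tokens).
INPUT_PRICE_PER_M = 1.20   # USD per million input tokens
OUTPUT_PRICE_PER_M = 4.80  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single inference request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A hypothetical request consuming the entire 10M-token window:
print(round(request_cost(10_000_000, 2_000), 4))  # 12.0096
```

In other words, a single maximal-context request would cost on the order of $12, which is why the per-token pricing, not just the window size, matters for the enterprise use cases discussed below.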
For the wider AI-applications landscape, the Gemini 3 release substantially shifts the practical envelope for enterprise deployment. Use cases that were previously gated by context-window or latency limitations, such as comprehensive financial-document analysis, full-codebase software-engineering tasks, longitudinal patient-record reasoning in healthcare, and multi-document legal-discovery automation, are now operationally feasible at production-grade latency for the first time. The competitive response from OpenAI, Anthropic, and the broader frontier-model landscape will be the principal forward variable across the rest of the calendar year.

Written by
Tom Whitmore
Senior correspondent · Technology & Energy
Tom trained as an electrical engineer, which makes him unusually patient with infrastructure stories. He reports on AI, cloud, the energy transition, and the businesses turning frontier engineering into real cash flow. Previously he covered the chip supply chain from Taipei. Skeptical of slide decks; comfortable in a substation. Based in Singapore. Reach out at tom.whitmore@theplatinumcapital.com.




