Google DeepMind Unveils Gemini 3 With Sub-Second Long-Context Reasoning Across 10m Tokens

By Tom Whitmore · Published May 9, 2026 · 2 min read

Google DeepMind has formally launched Gemini 3, the latest generation of the company's flagship foundation-model family, with the headline performance claim of sub-second long-context reasoning across a 10-million-token context window, an architectural step change that materially shifts the practical use-case envelope for a significant share of enterprise AI workloads.
The 10-million-token context window, equivalent to roughly 7,500 pages of dense business documents, substantially extends what was previously possible across the proprietary frontier-model landscape. The more important architectural advance is the latency profile across that window: full-context reasoning that previously required tens of seconds has been compressed to under a second on the production Gemini 3 stack. The improvement comes from a combination of architectural changes in the attention mechanism, custom inference-stack optimisations on Google's TPU v6 hardware, and a substantially improved retrieval-and-attention coordination layer.
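The pages-to-tokens equivalence above can be sanity-checked with common rule-of-thumb conversion factors; the constants below are standard industry approximations, not figures published by Google.

```python
# Back-of-envelope check of the "10M tokens ~ 7,500 dense pages" claim.
# Assumed rules of thumb: ~0.75 English words per token, and ~1,000
# words per dense business-document page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_DENSE_PAGE = 1_000

def tokens_to_pages(tokens: int) -> float:
    """Convert a token count to an approximate dense-page count."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_DENSE_PAGE

print(tokens_to_pages(10_000_000))  # 7500.0
```

Under these assumptions a 10-million-token window works out to exactly the roughly 7,500 pages the article cites.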
The model family release covers four variants: Gemini 3 Pro (the flagship reasoning model), Gemini 3 Flash (a latency-optimised variant for agentic workloads), Gemini 3 Code (specialised for software-engineering tasks), and a multimodal Gemini 3 Vision variant that extends the previous generation's visual-and-video reasoning capability. Benchmark results place Gemini 3 Pro at the top of essentially every major industry-standard reasoning evaluation, with particularly notable leads on the practical-software-engineering and scientific-reasoning suites.
The deployment infrastructure has been simultaneously upgraded. Google Cloud's Vertex AI platform now offers Gemini 3 Pro inference at $1.20 per million input tokens and $4.80 per million output tokens, pricing that meaningfully undercuts the equivalent OpenAI GPT-5 offering and signals Google's continuing willingness to use its hyperscaler-scale TPU infrastructure to compete aggressively on the per-token economics of the inference market.
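At those listed rates, the per-request economics of a full-context call are easy to work out. The sketch below uses the article's prices; the request shape (a full 10-million-token context plus 2,000 output tokens) is a hypothetical illustration, not a quoted workload.

```python
# Illustrative cost of one Gemini 3 Pro request at the listed
# Vertex AI rates ($1.20/M input tokens, $4.80/M output tokens).
INPUT_PRICE_PER_M = 1.20   # USD per million input tokens
OUTPUT_PRICE_PER_M = 4.80  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single inference request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A hypothetical request consuming the entire 10M-token window:
print(round(request_cost(10_000_000, 2_000), 4))  # 12.0096
```

In other words, a single maximal-context request would cost on the order of $12, which is why the per-token pricing, not just the window size, matters for the enterprise use cases discussed below.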
For the wider AI-applications landscape, the Gemini 3 release substantially shifts the practical envelope for enterprise deployment. Use cases that were previously gated by context-window or latency limitations, such as comprehensive financial-document analysis, full-codebase software-engineering tasks, longitudinal patient-record reasoning in healthcare, and multi-document legal-discovery automation, are now operationally feasible at production-grade latency for the first time. The competitive response from OpenAI, Anthropic, and the broader frontier-model landscape will be the principal forward variable across the rest of the calendar year.

Written by
Tom Whitmore
Senior correspondent · Technology & Energy
Tom trained as an electrical engineer, which makes him unusually patient with infrastructure stories. He reports on AI, cloud, the energy transition, and the businesses turning frontier engineering into real cash flow. Previously he covered the chip supply chain from Taipei. Skeptical of slide decks; comfortable in a substation. Based in Singapore. Reach out at tom.whitmore@theplatinumcapital.com.




