Google's Gemini 3 Surpasses ChatGPT in AI Benchmark Tests

In a significant breakthrough for the tech giant, Google's Gemini large language model has surged past its competitors, including ChatGPT, to claim the top spot on widely accepted industry benchmark tests. This week's release of the model's third version, Gemini 3, marks a pivotal moment for Google, which has been striving to gain an edge in artificial intelligence since ChatGPT's debut three years ago.

A Long-Awaited Victory for Google

For months, Google employees conducted their own informal 'vibe checks' on the developing model, testing it with everything from joke generation to complex math problems. Confidence inside the company grew steadily. Tulsee Doshi, Gemini’s senior director of product management, shared a telling example. She asked the model to write in Gujarati, a language widely spoken in India but not heavily represented online. The results were far superior to those from previous models, a sign of life that convinced the team they had a winner.

This internal optimism was validated by external early testers. Aaron Levie, CEO of Box, who received early access, described a 'double-digit points' leap in performance when analyzing complex documents. The improvement was so substantial that his team initially questioned their evaluation methods. This performance boost has handed Google an elusive victory, putting it well ahead, for the first time in years, in the race to develop advanced artificial intelligence.

Benchmark Dominance and Market Impact

The data supporting Gemini 3's supremacy is compelling. The model outperformed rivals on more than a dozen benchmark tests covering categories like expert-level knowledge, logic, math, and image recognition. It placed second on only one test, a coding benchmark where it trailed Anthropic's Claude Sonnet 4.5. A particularly impressive result came on the Vending Bench evaluation, which tests a model's ability to plan and use tools by simulating the operation of a vending machine.

This technical success has direct market consequences. The launch of Gemini 3 poses a significant challenge to AI startups like OpenAI and Anthropic. While ChatGPT remains the most popular chatbot with 800 million weekly users compared to Gemini's 650 million monthly users, Gemini 3's advanced capabilities could cement it as a preferred tool for diverse tasks. Analyst Michael Nathanson of MoffettNathanson stated, 'They are AI winners, that’s pretty clear. I feel pretty good about their hand right now.'

Driving Growth and Regaining Confidence

Google's journey to this point has been one of strategic overhaul. Ever since the launch of ChatGPT sparked investor fears about the future of its iconic search engine, the company has been scrambling to respond. Under CEO Sundar Pichai, Google broke down internal silos, streamlined leadership, and consolidated AI development work, with co-founder Sergey Brin returning to a day-to-day role overseeing these efforts.

The turnaround is now evident. The August debut of the Nano Banana image-generation tool powered a massive surge in Gemini usage, with monthly users jumping from 450 million in July to 650 million. This growth contributed to parent company Alphabet reporting record quarterly revenue, driven by cloud computing and advertising. Its shares are up more than 50% this year, and its market value hit $3.6 trillion, surpassing Microsoft for the first time in seven years.

As part of the Gemini 3 rollout, Google has integrated the new model into its AI Mode search feature from day one. Robby Stein, vice president of product for search, experienced its potential firsthand. When he asked AI Mode to help explain airplane lift force to his 7-year-old son, it generated an interactive simulation with moving currents and a sliding wing, far exceeding the simple written answer he had expected. This 'aha' moment underscores the model's potential to revolutionize how information is delivered and understood.