When OpenAI launched ChatGPT in November 2022, the world believed Google had been caught off guard by the AI revolution. But behind the scenes, Google had been quietly building custom chips since 2016 that may now change everything.
Introduction: The Moment Everyone Wrote Google Off
November 2022 was a humbling month for Google. OpenAI's ChatGPT stunned the world with its conversational abilities, leading many to declare that Google—the company that defined modern internet search and called itself AI-first—had been caught sleeping. Ironically, Google had published the seminal 2017 paper on the Transformer architecture that made ChatGPT possible. Yet a startup backed by a $1 billion investment from Microsoft shipped first. Inside Google, a "Code Red" was declared. Co-founders Larry Page and Sergey Brin were pulled back in, and Bard was fast-tracked. From the outside, it looked like panic.
But CEO Sundar Pichai later revealed a different story. In an interview, he said: "It was obviously very inward-focused in that moment. To me, it was very clear, 'Hey, the Overton window shifted.' I felt like the company was built for that moment... We were in the seventh version of TPUs. I remember it might have been 2016 Google I/O where we announced the TPUs and spoke about building AI data centers. This was 2016. The company was operating in an AI-first way."
Part I: The Chip Game
While the tech world rushed to buy Nvidia's GPUs, Google played a different game. It had unveiled its Tensor Processing Units (TPUs) at Google I/O 2016: chips purpose-built for neural network operations and, unlike Nvidia's general-purpose GPUs, designed from the ground up for AI workloads. By the time ChatGPT launched, Google had already shipped multiple TPU generations, accumulating years of iterative refinement that no amount of money could buy overnight. When the Gemini models arrived, they ran on infrastructure Google owned and had been perfecting for nearly a decade.
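To make "purpose-built for neural network operations" concrete, here is a minimal JAX sketch (JAX being the TPU-facing numerical library Google maintains; the layer shapes and variable names below are illustrative assumptions, not anything from Google's stack). On a TPU, XLA compiles the jnp.dot below onto the chip's dense matrix units, and bfloat16 is the number format those units natively consume:

```python
import jax
import jax.numpy as jnp

@jax.jit  # XLA traces this once and emits a single optimized TPU program
def dense_layer(x, w, b):
    # One neural-network building block: matrix multiply, bias, nonlinearity.
    # The jnp.dot is exactly the work a TPU's matrix units are built for.
    return jax.nn.relu(jnp.dot(x, w) + b)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (128, 1024), dtype=jnp.bfloat16)   # a batch of activations
w = jax.random.normal(key, (1024, 4096), dtype=jnp.bfloat16)  # layer weights
b = jnp.zeros((4096,), dtype=jnp.bfloat16)

print(dense_layer(x, w, b).shape)  # (128, 4096)
print(jax.devices())               # lists TpuDevice entries on a TPU VM, CPU otherwise
```

The same script runs unchanged on CPU, GPU, or TPU; the point is that the dominant operation is the dense matrix math the TPU hardware is organized around.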
Part II: The Inference Problem
The AI industry focused on training large models, but inference, which means running a model in real time for every user request, proved equally challenging. Inference demands speed, efficiency, and scale. Google's TPUs excelled at inference because their specialized matrix multiplication units map directly onto the operations that dominate a model's forward pass. This allowed Google's models to deliver fast responses across its products. Moreover, Google opened TPU access via Google Cloud, turning its internal advantage into a commercial business.
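As a rough sketch of why that matters at serving time (a toy model under stated assumptions, not Google's serving stack): an inference step is a forward pass with frozen weights, so after one XLA compilation the accelerator simply replays a fixed sequence of matrix multiplies for every request:

```python
import jax
import jax.numpy as jnp

@jax.jit
def serve(params, x):
    # Pure forward pass: no gradients, no optimizer state. Latency per
    # request and requests per second are the only numbers that matter.
    h = jax.nn.gelu(x @ params["w1"])
    return jnp.argmax(h @ params["w2"], axis=-1)

key = jax.random.PRNGKey(0)
params = {
    "w1": jax.random.normal(key, (512, 2048), dtype=jnp.bfloat16),
    "w2": jax.random.normal(key, (2048, 512), dtype=jnp.bfloat16),
}
request = jax.random.normal(key, (8, 512), dtype=jnp.bfloat16)

serve(params, request)                             # first call pays XLA compilation
out = serve(params, request).block_until_ready()   # later calls are pure accelerator execution
```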
Part III: The Day Google Scared Nvidia
When Meta signed a deal to use Google TPUs for certain workloads, Nvidia's market capitalization dropped by billions of dollars in a single day. Nvidia responded by touting its versatility: "NVIDIA is a generation ahead of the industry—it's the only platform that runs every AI model and does it everywhere." It also acquired Groq, an inference-focused chip startup, a further sign of how seriously it took the competitive threat. The message was clear: Google's TPUs were no longer an internal experiment but a serious rival.
Part IV: Google Strikes Back with TPU v8
Google answered by announcing TPU v8 in two configurations: TPU 8t for massive-scale training and TPU 8i for high-performance, low-latency agentic inference. The pair addresses both sides of AI compute: training frontier models and running autonomous AI agents. Together, they cement Google's position as a full-stack AI company that owns the research, the models, the chips, the data centers, the cloud, and the consumer products. Pichai's "Overton window" comment captures this posture: Google wasn't behind; it was waiting for the world to be ready. Now its Gemini models are competitive, its TPU infrastructure is drawing customers away from Nvidia, and its cloud business is growing. The chips built quietly for years now sit at the center of the AI era's most consequential competition.
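The training/inference split mirrors two distinct workload shapes in code. Here is a hypothetical contrast in JAX that assumes nothing about the chips themselves: a training step runs forward and backward passes and updates weights (the throughput-bound profile a part like TPU 8t targets), while an agentic inference step runs only the forward pass, repeatedly and latency-sensitively (the profile a part like TPU 8i targets):

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Simple squared-error loss over a linear model, for illustration only.
    return jnp.mean((x @ w - y) ** 2)

@jax.jit
def train_step(w, x, y, lr=1e-2):
    # Training: forward pass, backward pass, weight update. Work per step is
    # large and regular; total throughput across a cluster is what counts.
    return w - lr * jax.grad(loss)(w, x, y)

@jax.jit
def agent_step(w, x):
    # Agentic inference: forward pass only, called many times in a loop as
    # the agent acts and observes, so per-call latency dominates.
    return x @ w

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (64, 64))
x = jax.random.normal(key, (32, 64))
y = jax.random.normal(key, (32, 64))

w = train_step(w, x, y)    # one optimization step (training profile)
action = agent_step(w, x)  # one low-latency inference call (agent profile)
```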