Nvidia Reportedly Developing New AI Inference Chip for OpenAI, Marking Strategic Shift
Nvidia is said to be working on a new processor designed specifically for AI inference computing, the type of processing that lets AI models respond to user queries, with OpenAI reportedly set to be one of its largest customers. According to sources, the announcement is expected at Nvidia's GTC developer conference in San Jose next month.
What Is the New Chip and How Is It Different?
According to a report by The Wall Street Journal, the move would be one of the most significant shifts in Nvidia's business strategy since the start of the AI boom. Nvidia has long dominated the market for GPUs, the specialized chips used to train AI models; analysts estimate that its Hopper, Blackwell, and Rubin series give it control of more than 90% of that market. GPUs, however, were designed primarily with training in mind, and as the AI industry shifts from building models to actually running them, their limited efficiency on inference tasks has become a liability.
The new processor is designed around inference computing rather than training. Nvidia will incorporate technology from Groq, a chip startup it acquired in a roughly $20 billion deal late last year. Groq's chips, known as language processing units (LPUs), use a different architecture from Nvidia's GPUs and are built to handle inference tasks with significantly greater efficiency. They compete most directly with Google's TPUs, which have seen surging demand for similar workloads.
Why Does Inference Computing Matter?
AI inference is the process by which a trained model actually responds to a question or completes a task. It involves two main steps: pre-fill, where the model interprets the user's prompt, and decode, where it generates a response one token (roughly one word) at a time. Many companies building and running AI agents have found Nvidia's GPUs expensive, power-hungry, and not ideally suited to inference workloads, a mismatch that has become more pressing as agentic AI has grown rapidly.
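For readers who want a concrete picture of those two phases, here is a minimal, purely illustrative Python sketch. Everything in it is invented for illustration: the function names, the stand-in "cache," and the canned token predictions are not from any real model or API. The point is the control flow that inference-focused chips are built around: one parallel pass over the whole prompt, then a sequential loop that emits one token at a time.

```python
# Toy illustration of the two inference phases. The "model" here is a stand-in
# that returns canned predictions, not a real neural network.

from typing import List, Tuple


def prefill(prompt_tokens: List[str]) -> Tuple[list, str]:
    """Phase 1: process the entire prompt in one pass and build the cache.

    In a real system this is one large, highly parallel matrix computation;
    here the 'cache' is just the list of tokens seen so far.
    """
    cache = list(prompt_tokens)   # stand-in for a real model's KV cache
    first_token = "Paris"         # stand-in for the model's first prediction
    return cache, first_token


def decode_step(cache: list, last_token: str) -> str:
    """Phase 2 (one step): predict the next token from the cache.

    Real decode steps run sequentially and are memory-bandwidth-bound,
    which is why they are the target of inference-focused hardware.
    """
    cache.append(last_token)
    canned = {"Paris": "is", "is": "the", "the": "answer.", "answer.": "<eos>"}
    return canned.get(last_token, "<eos>")


def generate(prompt: str, max_tokens: int = 8) -> str:
    cache, token = prefill(prompt.split())   # pre-fill: parallel, runs once
    output = []
    for _ in range(max_tokens):               # decode: one token per iteration
        if token == "<eos>":                  # stop at the end-of-sequence marker
            break
        output.append(token)
        token = decode_step(cache, token)
    return " ".join(output)


print(generate("What is the capital of France?"))
# -> "Paris is the answer."
```

The asymmetry the sketch exposes is the crux of the hardware debate: pre-fill resembles training (large parallel computation, a natural fit for GPUs), while decode is a long chain of small sequential steps, which is the workload inference-specific chips such as LPUs and TPUs are designed to accelerate.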
This has exposed a gap in Nvidia's product lineup that rivals have moved quickly to fill. Google and Amazon have both developed their own inference-focused chips, and last month OpenAI signed a multibillion-dollar computing partnership with Cerebras, whose CEO claims its inference chip outperforms Nvidia's GPUs on speed. The report also comes as Nvidia joins a broader $110 billion funding round, contributing $30 billion itself, underscoring how competitive the AI hardware landscape has become.
