OpenAI Launches New Voice AI Models for Natural Conversations

OpenAI, the creator of ChatGPT, has introduced a new suite of voice intelligence models in its API, designed to make AI-powered voice interactions more natural, responsive, and capable of handling complex tasks. The lineup includes GPT-Realtime-2, a live voice model with GPT-5-class reasoning; GPT-Realtime-Translate, which enables real-time multilingual conversations; and GPT-Realtime-Whisper, a streaming speech-to-text model for instant transcription.

Voice as a Natural Interface

Voice is becoming one of the most natural ways for people to use software. A voice agent needs to understand what someone means, keep track of context, recover when a request changes, use tools while the conversation continues, and respond in a way that feels appropriate to the moment. "Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds," the company said.

Key Features of the New Models

The new models expand voice AI beyond simple call-and-response. Key capabilities include:

  • Voice-to-action: Users can describe tasks, and the system reasons through requests to complete them.
  • Systems-to-voice: Software can proactively provide spoken guidance, such as travel apps updating passengers about delays.
  • Voice-to-voice: Real-time translation allows seamless multilingual conversations, maintaining context across languages.

Companies including Zillow, Deutsche Telekom, and Vimeo are already testing these features, which points to their potential in customer support and global communication.

OpenAI reports that GPT-Realtime-2 delivers stronger reasoning, scoring 15.2% higher on Big Bench Audio and 13.8% higher on Audio MultiChallenge than earlier versions. The model supports a 128K context window, enabling longer, coherent conversations. Developers can also adjust the reasoning effort level to trade latency against task complexity, while enhanced tone control allows empathetic, calm, or upbeat delivery depending on context.

Responsible Use and Security

To ensure responsible use, OpenAI has integrated active classifiers that detect harmful content and halt sessions when necessary. Developers can add custom guardrails via the Agents SDK. The API also supports EU data residency and enterprise-grade privacy commitments, making it suitable for regulated industries such as finance and healthcare.

Pricing and Availability

The new models are available now in the Realtime API. Pricing is as follows:

  • GPT-Realtime-2: $32 per 1M audio input tokens, $64 per 1M audio output tokens.
  • GPT-Realtime-Translate: $0.034 per minute.
  • GPT-Realtime-Whisper: $0.017 per minute.
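
To make the listed rates concrete, here is a minimal cost-estimation sketch in Python. It is only arithmetic on the prices quoted above; the function and constant names are hypothetical, and it does not call any OpenAI API. The article does not say how many audio tokens correspond to a minute of speech, so token counts must be supplied by the caller (for example, from usage figures reported for a session).

```python
from decimal import Decimal

# Prices in USD, copied from the list above.
REALTIME_2_INPUT_PER_MILLION = Decimal("32")    # per 1M audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = Decimal("64")   # per 1M audio output tokens
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")

ONE_MILLION = Decimal(1_000_000)


def realtime_2_cost(audio_input_tokens: int, audio_output_tokens: int) -> Decimal:
    """Estimate GPT-Realtime-2 audio cost from token counts (hypothetical helper)."""
    input_cost = Decimal(audio_input_tokens) * REALTIME_2_INPUT_PER_MILLION / ONE_MILLION
    output_cost = Decimal(audio_output_tokens) * REALTIME_2_OUTPUT_PER_MILLION / ONE_MILLION
    return input_cost + output_cost


def per_minute_cost(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Estimate cost for the per-minute models (Translate, Whisper)."""
    return Decimal(minutes) * rate_per_minute


if __name__ == "__main__":
    # Example inputs are illustrative, not measured usage.
    print("Realtime-2:", realtime_2_cost(500_000, 250_000))
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Whisper, 60 min:", per_minute_cost(60, WHISPER_PER_MINUTE))
```

For instance, 500,000 audio input tokens and 250,000 audio output tokens on GPT-Realtime-2 come to $16 + $16 = $32, while an hour of audio costs $2.04 with GPT-Realtime-Translate and $1.02 with GPT-Realtime-Whisper. `Decimal` is used instead of floats so that small per-minute rates add up without rounding drift.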