
OpenAI has announced a new set of voice intelligence capabilities for its API, aimed at helping developers build applications that can speak, transcribe, and translate conversations in real time. The update is intended to make voice-based interaction in AI-powered apps more natural and responsive.
As part of the launch, OpenAI introduced three new audio models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 is built to support more natural voice conversations and comes with GPT-5-level reasoning abilities, allowing it to handle more advanced requests and maintain context during longer interactions.
The GPT-Realtime-Translate model focuses on live multilingual communication. According to reports, it can translate conversations from more than 70 input languages into 13 output languages, making it useful for industries such as customer support, education, and global communication services.
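The article does not publish the request schema for the translation model, but a live-translation session would plausibly be configured with a session-update message over the Realtime API's WebSocket connection. The sketch below is illustrative only: the model identifier, voice name, and instruction wording are assumptions, not documented parameters.

```python
import json

def build_translate_session(output_language: str, voice: str = "alloy") -> dict:
    """Build a hypothetical session-update event asking the model to act
    as a live translator into the given output language."""
    return {
        "type": "session.update",
        "session": {
            # Model name as reported in coverage; the exact API identifier
            # is an assumption.
            "model": "gpt-realtime-translate",
            "voice": voice,
            "instructions": f"Translate the speaker's audio into {output_language}.",
        },
    }

event = build_translate_session("German")
payload = json.dumps(event)  # what a client would send over the WebSocket
print(event["session"]["instructions"])
```

The design mirrors how realtime voice sessions are typically configured: one JSON event sets the model, voice, and behavior for the rest of the connection.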
Meanwhile, GPT-Realtime-Whisper has been developed for real-time speech-to-text transcription. The model is designed to provide fast and accurate live transcriptions for use cases such as meeting captions, workflow documentation, and conversational AI tools.
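For live captioning, a client would typically consume a stream of incremental transcript events and assemble them into caption lines. The event names below follow the common "delta"/"completed" streaming pattern; the actual event types for the new model are assumptions, since no schema accompanies the announcement.

```python
def build_captions(events: list[dict]) -> list[str]:
    """Accumulate streamed transcript fragments into finished caption lines.

    Hypothetical event shapes:
      {"type": "transcript.delta", "text": "..."}   -- partial text
      {"type": "transcript.completed"}              -- end of an utterance
    """
    lines: list[str] = []
    current: list[str] = []
    for ev in events:
        if ev["type"] == "transcript.delta":
            current.append(ev["text"])
        elif ev["type"] == "transcript.completed":
            lines.append("".join(current))
            current = []
    return lines

events = [
    {"type": "transcript.delta", "text": "Hello"},
    {"type": "transcript.delta", "text": ", world"},
    {"type": "transcript.completed"},
]
print(build_captions(events))  # -> ['Hello, world']
```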
OpenAI stated that the new voice models are part of its broader Realtime API initiative, which enables developers to create more conversational and responsive AI agents. Companies including Zillow, Priceline, and Deutsche Telekom are already testing the technology to build voice assistants that can hold a live conversation while carrying out tasks on the user's behalf.
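Handling a task mid-conversation usually means the model emits a tool call, the client runs it, and the result is fed back into the session. The dispatch loop below sketches that pattern; the event shape and the `check_listing` tool are hypothetical examples, not part of any published Realtime API schema.

```python
from typing import Callable, Optional

def handle_event(ev: dict, tools: dict[str, Callable]) -> Optional[dict]:
    """Dispatch a hypothetical tool-call event to a registered function
    and wrap its result for sending back to the model."""
    if ev["type"] == "tool_call":
        fn = tools[ev["name"]]
        return {
            "type": "tool_result",
            "name": ev["name"],
            "output": fn(**ev["arguments"]),
        }
    return None  # non-tool events (audio, transcripts) handled elsewhere

# Illustrative tool a real-estate voice assistant might register.
tools = {"check_listing": lambda city: f"3 listings found in {city}"}

result = handle_event(
    {"type": "tool_call", "name": "check_listing", "arguments": {"city": "Seattle"}},
    tools,
)
print(result["output"])  # -> 3 listings found in Seattle
```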
The company also noted that safety measures have been added to reduce misuse of the technology, including protections against spam, fraud, and harmful interactions. OpenAI said the system can pause conversations if it detects violations related to harmful content or abuse.
