OpenAI rolls out ChatGPT’s hyperrealistic voice to some paying users
OpenAI began rolling out ChatGPT’s Advanced Voice Mode on Tuesday, giving users their first access to GPT-4o’s hyperrealistic audio responses. The alpha version will be available to a small circle of ChatGPT Plus users today, and OpenAI says the feature will gradually roll out to all Plus users in the fall of 2024.
OpenAI first showcased GPT-4o’s voice in May, and on that occasion the feature shocked audiences with quick responses and an uncanny resemblance to a real human’s voice – one in particular. The voice, called Sky, resembled that of Scarlett Johansson, the actress behind the artificial assistant in the movie “Her.” Soon after OpenAI’s demo, Johansson said she had refused multiple inquiries from OpenAI CEO Sam Altman to use her voice, and after seeing the demo, she hired legal counsel to defend her likeness. OpenAI denied using Johansson’s voice, but later removed the voice shown in its demo. In June, OpenAI announced it would delay the feature’s release to improve its safety measures.
One month later, the wait is over. OpenAI says the video and screen-sharing capabilities showcased during its Spring Update will not be part of this alpha and will launch at a “later date.” For now, the GPT-4o demo that blew everyone away is still just a demo, but some premium users will now have access to ChatGPT’s voice feature shown there.
You may have already tried the Voice Mode currently available in ChatGPT, but OpenAI says Advanced Voice Mode is different. ChatGPT’s old audio solution used three separate models: one to convert your voice to text, GPT-4 to process your prompt, and a third to convert ChatGPT’s text back into voice. GPT-4o, by contrast, is multimodal and handles these tasks itself, without auxiliary models, which makes for significantly lower-latency conversations. OpenAI also claims GPT-4o can sense emotional intonations in your voice, including sadness, excitement or singing.
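The difference is easier to see in pseudocode. The sketch below is purely illustrative, assuming hypothetical stand-in functions (speech_to_text, generate_reply, text_to_speech, gpt4o_audio) rather than OpenAI’s actual API, but it shows why chaining three models adds latency and discards vocal tone along the way:

```python
# Purely illustrative: these function names are hypothetical stand-ins,
# not OpenAI's real API.

def speech_to_text(audio: bytes) -> str:
    return "transcribed prompt"           # stand-in for a transcription model

def generate_reply(prompt: str) -> str:
    return f"reply to: {prompt}"          # stand-in for GPT-4 on plain text

def text_to_speech(text: str) -> bytes:
    return text.encode()                  # stand-in for a synthesis model

def gpt4o_audio(audio: bytes) -> bytes:
    return b"spoken reply"                # stand-in for one multimodal model

def cascaded_voice_reply(audio_in: bytes) -> bytes:
    """Old Voice Mode: three models chained in sequence.

    Each hop adds latency, and emotional cues in the user's voice are
    lost at the first step, since only a transcript reaches the LLM.
    """
    text_prompt = speech_to_text(audio_in)
    text_reply = generate_reply(text_prompt)
    return text_to_speech(text_reply)

def multimodal_voice_reply(audio_in: bytes) -> bytes:
    """GPT-4o: a single model maps audio directly to audio, so tone,
    pacing and singing are available to the model end to end."""
    return gpt4o_audio(audio_in)

if __name__ == "__main__":
    print(cascaded_voice_reply(b"hello"))
    print(multimodal_voice_reply(b"hello"))
```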
Some ChatGPT Plus users will get to see firsthand how hyperrealistic OpenAI’s Advanced Voice Mode really is. OpenAI said it is releasing ChatGPT’s new voice gradually to closely monitor its usage. People in the alpha group will get an alert in the ChatGPT app, followed by an email with instructions on how to use it.
In the time since OpenAI’s demo, the company says it has tested GPT-4o’s voice capabilities with more than 100 external red teamers who speak 45 different languages, and a report on these safety efforts is coming in early August.
OpenAI said Advanced Voice Mode will be released with four preset voices – Juniper, Breeze, Cove and Ember – made in collaboration with paid voice actors. The Sky voice shown in OpenAI’s May demo is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum said, “ChatGPT cannot impersonate other people’s voices, both individuals and public figures, and will block outputs that differ from one of these preset voices.”
OpenAI also said it has introduced new filters to block certain requests to generate music or other copyrighted audio.