OpenAI releases ChatGPT’s hyperrealistic voice to some paying users

OpenAI began rolling out ChatGPT’s Advanced Voice Mode on Tuesday, giving users their first access to GPT-4o’s hyperrealistic audio responses. The alpha version will be available to a small group of ChatGPT Plus users today, and OpenAI says the feature will gradually roll out to all Plus users in the fall of 2024.

When OpenAI first showcased GPT-4o’s voice in May, the feature shocked audiences with quick responses and an uncanny resemblance to a real human’s voice – one in particular. The voice, Sky, resembled that of Scarlett Johansson, the actress behind the artificial assistant in the movie “Her.” Soon after OpenAI’s demo, Johansson said she refused multiple inquiries from CEO Sam Altman to use her voice, and after seeing GPT-4o’s demo, hired legal counsel to defend her likeness. OpenAI denied using Johansson’s voice, but later removed the voice shown in its demo. In June, OpenAI said it would delay the release of Advanced Voice Mode to improve its safety measures.

One month later, and the wait is over (sort of). OpenAI says the video and screensharing capabilities showcased during its Spring Update will not be part of this alpha, launching at a “later date.” For now, the GPT-4o demo that blew everyone away is still just a demo, but some premium users will now have access to ChatGPT’s voice feature shown there.

ChatGPT can now talk and listen

You may have already tried out the Voice Mode currently available in ChatGPT, but OpenAI says Advanced Voice Mode is different. ChatGPT’s old solution to audio used three separate models: one to convert your voice to text, GPT-4 to process your prompt, and then a third to convert ChatGPT’s text into voice. But GPT-4o is multimodal, capable of processing these tasks without the help of auxiliary models, creating significantly lower latency conversations. OpenAI also claims GPT-4o can sense emotional intonations in your voice, including sadness, excitement or singing.

In this pilot, ChatGPT Plus users will get to see first hand how hyperrealistic OpenAI’s Advanced Voice Mode really is. TechCrunch was unable to test the feature before publishing this article, but we will review it when we get access.

OpenAI says it’s releasing ChatGPT’s new voice gradually to closely monitor its usage. People in the alpha group will get an alert in the ChatGPT app, followed by an email with instructions on how to use it.

In the months since OpenAI’s demo, the company says it tested GPT-4o’s voice capabilities with more than 100 external red teamers who speak 45 different languages. OpenAI says a report on these safety efforts is coming in early August.

The company says Advanced Voice Mode will be limited to ChatGPT’s four preset voices – Juniper, Breeze, Cove and Ember – made in collaboration with paid voice actors. The Sky voice shown in OpenAI’s May demo is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum says “ChatGPT cannot impersonate other people’s voices, both individuals and public figures, and will block outputs that differ from one of these preset voices.”

OpenAI is trying to avoid deepfake controversies. In January, AI startup ElevenLabs’s voice cloning technology was used to impersonate President Biden, deceiving primary voters in New Hampshire.

OpenAI also says it introduced new filters to block certain requests to generate music or other copyrighted audio. In the last year, AI companies have landed themselves in legal trouble for copyright infringement, and audio models like GPT-4o unleash a whole new category of companies that can file a complaint. Particularly, record labels, who have a history for being litigious, and have already sued AI song-generators Suno and Udio.