OpenAI had originally planned to offer the new features to some paying subscribers in late June but is postponing that initial release by a month, the company said in a statement on X. The features will be available to all paying users in the fall, the company said, though it added the caveat that “exact timelines depend on meeting our high safety and reliability bar.”
OpenAI first added the ability for ChatGPT to speak in one of several synthetic voices, or “personas,” late last year. The demo in May used one of those voices to show off a newer, more capable AI system called GPT-4o, in which the chatbot spoke in expressive tones, responded to a person’s tone of voice and facial expressions, and held more complex conversations. One of the voices, which OpenAI called Sky, resembles the voice of an AI bot played by Johansson in the 2013 movie “Her,” about a lonely man who falls in love with his AI assistant.
OpenAI CEO Sam Altman has denied that the company trained the bot on Johansson’s voice. The Washington Post reported last month that the company hired a different actor to provide training audio, according to internal records and interviews with casting directors and the actor’s agent.
As the world’s largest tech companies and upstarts like OpenAI race to compete in generative AI, several projects have run into unexpected hurdles. Last month, Google scaled back how often it shows AI-generated answers at the top of search results, after the tool made strange errors such as telling people to put glue on their pizza. In February, the search company withdrew an AI image generator criticized for creating images such as a female pope. Microsoft last year made changes to its own AI chatbot after it sometimes gave bizarre and aggressive answers.
OpenAI said Tuesday that it needed extra time to make the new voice version of its chatbot better at noticing and blocking some content, without disclosing specifics. Many AI tools have been criticized for making up false information, spouting racist or sexist content, or displaying biases in their outputs. Designing a chatbot that attempts to interpret and mimic emotions increases the complexity of its interactions, opening new ways for things to go wrong.
“ChatGPT’s advanced Voice Mode can understand and respond with emotions and nonverbal cues, moving us closer to real-time, natural conversations with AI,” OpenAI said in its statement. “Our mission is to bring these new experiences to you thoughtfully.”