When the generative artificial intelligence startup OpenAI released a demo of its new ChatGPT 4o model last week, it included extensive video of its “Voice Mode,” which features an emotive voice answering user questions.
While several voices are available, viewers noticed that one of them, “Sky,” sounded suspiciously like actress Scarlett Johansson, who voiced an emotive AI in the 2013 film Her (in fact, OpenAI co-founder and CEO Sam Altman posted “her” on X during the demo).
Now, OpenAI says that it is “pausing” the use of the Sky voice as it seeks to address the concerns from users about such a familiar voice being used.
“We’ve heard questions about how we chose the voices in ChatGPT, especially Sky,” the company posted Monday morning. “We are working to pause the use of Sky while we address them.”
In a statement of her own released Monday evening, Johansson says that Altman approached her last September and asked her to consider being one of the voices for ChatGPT.
“He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI. He said he felt that my voice would be comforting to people,” Johansson said. “After much consideration and for personal reasons, I declined the offer. Nine months later, my friends, family and the general public all noted how much the newest system named ‘Sky’ sounded like me.”
She added that two days before the ChatGPT 4o demo, Altman contacted her agent, asking her to reconsider.
Johansson said that, following the release of the “Sky” voice, her lawyers sent letters to OpenAI.
“As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the ‘Sky’ voice,” Johansson said. “Consequently, OpenAI reluctantly agreed to take down the ‘Sky’ voice.”
In a blog post, the company acknowledged the concerns and explained how it created the voices, noting that it ran an extensive casting process.
“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice — Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” the blog post said. “To protect their privacy, we cannot share the names of our voice talents.”
OpenAI says that it began working with “well-known, award-winning” casting directors and producers in early 2023 to identify voice actors who could become the voices in the product, and that it received over 400 submissions. That list was whittled down to 14.
“We spoke with each actor about the vision for human-AI voice interactions and OpenAI, and discussed the technology’s capabilities, limitations, and the risks involved, as well as the safeguards we have implemented. It was important to us that each actor understood the scope and intentions of Voice Mode before committing to the project,” the blog post continued, adding that the company eventually settled on five final voices.
Those actors flew to San Francisco, where the company led recording sessions, before releasing the voices into ChatGPT last fall.
The tech company says that it will add new voices to the platform over time.
“We support the creative community and worked closely with the voice acting industry to ensure we took the right steps to cast ChatGPT’s voices,” it said in the blog post. “Each actor receives compensation above top-of-market rates, and this will continue for as long as their voices are used in our products.”
Read Johansson’s full statement below.
“Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4.0 system. He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI. He said he felt that my voice would be comforting to people.
After much consideration and for personal reasons, I declined the offer. Nine months later, my friends, family and the general public all noted how much the newest system named “Sky” sounded like me.
When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference. Mr. Altman even insinuated that the similarity was intentional, tweeting a single word “her” – a reference to the film in which I voiced a chat system, Samantha, who forms an intimate relationship with a human.
Two days before the ChatGPT 4.0 demo was released, Mr. Altman contacted my agent, asking me to reconsider. Before we could connect, the system was out there.
As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the “Sky” voice. Consequently, OpenAI reluctantly agreed to take down the “Sky” voice.
In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity. I look forward to resolution in the form of transparency and the passage of appropriate legislation to help ensure that individual rights are protected.”