In late March last year, OpenAI announced a “small preview” of its AI service, Voice Engine. The company claimed the tool could clone a human voice from just 15 seconds of audio. Almost a year later, Voice Engine remains in preview, with no indication of when it will launch, or whether it ever will.
The company’s reluctance to deploy the service broadly may reflect fear of misuse, but it may also reflect an effort to avoid inviting regulatory scrutiny. OpenAI has historically been accused of prioritizing “shiny products” at the expense of safety, and of rushing releases out the door to beat rivals to market.
In a statement, an OpenAI spokesperson told TechCrunch that the company continues to test Voice Engine with a limited set of “trusted partners.”
“[We’re] learning from how [our partners are] using the technology so we can improve the model’s usefulness and safety,” the spokesperson said. “We’re excited to see the different ways it’s being used, from speech therapy and language learning to customer support, video game characters, and AI avatars.”
Pushed back
Voice Engine, which powers the voices available in OpenAI’s text-to-speech API and ChatGPT’s voice mode, generates natural-sounding speech that closely resembles the original speaker. The tool converts written characters into speech and is constrained only by certain content guardrails. But it has been subject to delays and shifting release windows from the start.
As OpenAI explained in its June 2024 blog post, Voice Engine models learn to predict the most likely sounds a speaker will make for a given text transcript, accounting for a range of voices, accents, and speaking styles. From there, the model can generate not only spoken versions of a text but also “spoken utterances” reflecting how different types of speakers would read the text aloud.
According to a draft blog post seen by TechCrunch, OpenAI originally intended to bring Voice Engine, then called Custom Voice, to its API. The plan was to give up to 100 “trusted developers” access ahead of a broader debut, prioritizing devs building apps with “social benefit” or that demonstrated “innovative and responsible” uses of the technology. OpenAI registered a trademark and set pricing: $15 per million characters for “standard” voices and $30 per million characters for “HD quality” voices.
Then, at the eleventh hour, the company postponed the announcement. OpenAI ended up unveiling Voice Engine a few weeks later, without any sign-up option; access to the tool would remain limited to a cohort of around 10 developers the company had begun working with in late 2023, OpenAI said.
“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” OpenAI wrote in a blog post on Voice Engine in late March 2024.
Long in the works
According to OpenAI, Voice Engine has been in the works since 2022. The company says it demoed the tool to high-level global policymakers in the summer of 2023 to showcase its potential and risks.
A few partners have access to Voice Engine today, such as the startup Livox, which builds devices that help people with disabilities communicate more naturally. CEO Carlos Pereira told TechCrunch that Livox ultimately couldn’t build Voice Engine into its products because of the tool’s online requirement (many of Livox’s customers don’t have internet access).
“The quality of the voice, and the possibility of speaking with expression in different languages, is unique, especially for our customers, people with disabilities,” Pereira told TechCrunch via email. “It’s really the most impressive and easy-to-use [tool to] create voices that I’ve seen […] I hope OpenAI develops an offline version soon.”
Pereira says he has received no guidance from OpenAI on when Voice Engine might launch, and no indication that the company will begin charging for the service. So far, Livox hasn’t had to pay for its use.
In the aforementioned June 2024 post, OpenAI suggested that one consideration in its slow rollout of Voice Engine during last year’s U.S. election cycle was the potential for misuse. Informed by discussions with stakeholders, Voice Engine has several safety mitigations, including watermarking to trace the origin of generated audio.
Developers must obtain “explicit consent” from the original speaker before using Voice Engine, according to OpenAI, and must make “clear disclosures” to audiences that the voices are AI-generated. But the company hasn’t said how it is enforcing these policies. Doing so at scale could prove enormously challenging, even for a company with OpenAI’s resources.
In a blog post, OpenAI has implied that it wants to test a “voice authentication experience” to verify speakers, as well as a “no-go” list that would block the creation of voices that sound too similar to prominent figures. Both are technically ambitious undertakings, and missteps would not reflect well on a company already facing accusations of sidelining safety.
Effective filtering and identity verification are quickly becoming baseline requirements for responsibly releasing voice cloning technology. AI voice cloning was the third-fastest-growing scam of 2024, according to one source. Voice clones have been used to bypass fraud checks and bank security measures while privacy and copyright laws struggle to catch up. Malicious actors have used voice cloning to create deepfakes of celebrities and politicians, and those deepfakes have spread like wildfire across social media.
OpenAI may release Voice Engine next week, or it may never release it at all. The company has repeatedly indicated that it is deliberately keeping the service’s scope small. But one thing is clear: whether for optics, for safety, or both, Voice Engine’s limited preview has become one of the longest in OpenAI’s history.