OpenAI unveils voice cloning AI model Voice Engine

Not content to disrupt merely text generation, imagery, and video with its various AI models, ChatGPT-maker OpenAI is also getting into the last major form of legacy digital media: audio. Specifically, voice cloning.

The company is today announcing its newest AI model, “Voice Engine,” which it says has been in development since 2022 and currently powers OpenAI’s text-to-speech API and the new ChatGPT Voice and Read Aloud features unveiled earlier this month.

As it turns out, the model can also perform voice cloning. Here’s how it works: a human speaker records a 15-second clip of their voice through a phone or computer microphone, and OpenAI’s Voice Engine generates “natural-sounding speech that closely resembles the original speaker,” which can then be used to speak aloud any text that a human user types in.

Enormous implications for spoken audio market

The tech clearly has huge implications for those who record themselves speaking often, be they podcasters, voice-over artists, spoken word performers, audiobook and advertising narrators, gamers, streamers, customer service agents, salespeople, or members of many other occupations and disciplines.

It also puts pressure on other companies dedicated to this type of tech, such as well-funded AI startup ElevenLabs, Captions, Meta, WellSaid Labs, MyShell, and others.

OpenAI further highlights Voice Engine’s capability to support non-verbal individuals by providing them with unique, non-robotic voices, and to aid therapeutic and educational programs for those with speech impairments or learning needs.

Initial use cases

OpenAI said in its blog post announcing Voice Engine today that so far, it has only made the tech available to a “small group of trusted partners.” Among those highlighted and named are:

  1. Age of Learning, an education technology company that uses Voice Engine and GPT-4 to generate pre-scripted and real-time personalized voice content, expanding learning support and interactivity for a diverse student audience.
  2. HeyGen, an AI visual storytelling platform that enables creators and businesses to translate their content into multiple languages, employs Voice Engine for video translation, creating custom human-like avatars with multilingual voices and preserving the original speaker’s accent to reach a global audience.
  3. Dimagi, a software company making tools for community health workers, uses Voice Engine and GPT-4 to provide interactive feedback in various languages for those workers, improving essential service delivery in remote settings.
  4. Livox, an AI app for Augmentative and Alternative Communication (AAC) devices used by those with speech and hearing difficulties, integrates Voice Engine to provide unique, non-robotic voices across languages for non-verbal individuals.
  5. The Norman Prince Neurosciences Institute at Lifespan, a nonprofit medical and teaching organization at Brown University dedicated to helping those with neurological diseases and disorders, is using Voice Engine to help those with speech impairments use an AI version of their voice. Two doctors there, Rohaid Ali and pediatric neurosurgeon Konstantina Svokos, have already successfully restored a brain tumor patient’s speech using an audio sample from one of her school project videos.

The company uploaded to its blog, and emailed to VentureBeat under embargo, several audio samples showing the tech’s humanlike speaking capabilities. For example, here’s the original “source voice” of Lifespan’s patient:

And here’s the cloned voice using OpenAI Voice Engine:

Limited user base by design

Yet for now, the tech is limited. As with Sora, its powerful, highly realistic and vivid video generation AI model, OpenAI is not currently allowing the public to use Voice Engine. Instead, today OpenAI is only sharing the existence of the tool and “preliminary insights and results from a small-scale preview” conducted with “a small group of trusted partners” who have been given access.

As OpenAI states in its blog post today announcing the tech:

“We’re taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities. Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

The cautious, slow-and-steady, limited-access approach to releasing Voice Engine makes sense, especially in light of U.S. President Joseph R. Biden’s recent call to “ban AI voice impersonation.”

Central to OpenAI’s deployment strategy is stringent adherence to safety and ethical guidelines. Partners involved in testing Voice Engine are bound by usage policies that prohibit unauthorized impersonation and require informed consent from voice donors.

Moreover, OpenAI has implemented safety measures such as watermarking and proactive monitoring to ensure the technology’s responsible use.
