ElevenLabs’s new open-source tool can add sound effects to any video

Weeks after AI voice startup ElevenLabs launched Sound Effects, its text-to-sound AI offering, the company is releasing an open-source tool to showcase what it can do. In “about 15 seconds,” the tool lets creators generate sample sound effects for their videos by analyzing the imported clip and offering several options.

While developers can access the app’s code on GitHub, ElevenLabs has also published a website where the public can try out its Sound Effects API.

When you upload a video, the so-called Video to Sound Effects app extracts four frames at one-second intervals on the client side. It then sends these frames, along with a prompt, to OpenAI’s GPT-4o, which writes a custom text-to-sound-effects prompt. That prompt is used to generate a sound effect through ElevenLabs’s Sound Effects API. Finally, the video and audio are combined on the client side into a single downloadable file that can be up to 22 seconds long.
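
The client-side planning part of that pipeline can be sketched in a few lines. This is a minimal illustration, not ElevenLabs' code: the four frames, the one-second spacing and the 22-second cap come from the article, while the function names, the prompt wording, the choice to start sampling at t=0, and the request field names `text` and `duration_seconds` are assumptions (the field names are modeled on ElevenLabs' public sound-generation docs and should be checked before use). The GPT-4o call, the HTTP request and the final audio/video merge are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Figures from the article; everything else is an illustrative assumption.
FRAME_COUNT = 4          # frames sampled per uploaded clip
FRAME_INTERVAL_S = 1.0   # spacing between sampled frames, in seconds
MAX_OUTPUT_S = 22.0      # maximum length of the downloadable file


@dataclass
class SfxPlan:
    frame_times: list[float]  # timestamps (s) of frames to send to the vision model
    output_duration: float    # length (s) of the combined video + audio file
    prompt: str               # hypothetical instruction sent alongside the frames


def plan_sound_effect(video_duration_s: float) -> SfxPlan:
    """Hypothetical client-side planning step for a video-to-sound-effects app."""
    if video_duration_s <= 0:
        raise ValueError("video duration must be positive")
    # Assumption: sampling starts at t=0 and skips timestamps past the end of the clip.
    frame_times = [
        i * FRAME_INTERVAL_S
        for i in range(FRAME_COUNT)
        if i * FRAME_INTERVAL_S < video_duration_s
    ]
    # Cap the output at the 22-second limit mentioned in the article.
    output_duration = min(video_duration_s, MAX_OUTPUT_S)
    prompt = (
        "Write a short text-to-sound-effects prompt describing what these "
        f"{len(frame_times)} video frames should sound like."
    )
    return SfxPlan(frame_times, output_duration, prompt)


def build_sfx_request(description: str, duration_s: Optional[float] = None) -> dict:
    """Hypothetical request body for the sound-effects endpoint (field names assumed)."""
    body: dict = {"text": description}
    # Omitting the duration leaves it to the API (the "automatic duration" option).
    if duration_s is not None:
        body["duration_seconds"] = duration_s
    return body
```

For a 10-second clip, `plan_sound_effect(10.0)` samples frames at 0, 1, 2 and 3 seconds; a 30-second clip is capped at 22 seconds of output.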

“We view it as a proof of concept of what people will be able to do with our SFX API,” Ammaar Reshi, ElevenLabs’ design lead, tells VentureBeat. “AI video creators are often looking for the perfect sound effect, and we felt like we could speed up the workflow intelligently by understanding the frames of their videos and then suggesting the best output.” He says the company is excited about the different kinds of dynamic experiences people might build with this API, highlighting immersive video games as one example where sounds could be generated based on a player’s interactions.

The API lets developers build fully custom AI sound effects from a short text description. ElevenLabs charges 100 characters of a user’s quota per generation when the duration is set automatically, or 25 characters per second when a fixed duration is specified.
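
The two pricing modes are easy to compare with a small calculation. The sketch below uses only the two rates quoted above; the function name is hypothetical, and because the article does not say how fractional seconds are billed, this version simply bills them pro rata.

```python
from typing import Optional

AUTO_DURATION_COST = 100  # characters charged per generation with automatic duration
COST_PER_SECOND = 25      # characters charged per second with a fixed duration


def sfx_cost(duration_seconds: Optional[float] = None) -> float:
    """Character cost of one sound-effect generation at the rates quoted in the article."""
    if duration_seconds is None:
        return AUTO_DURATION_COST
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    # Assumption: fractional seconds are billed pro rata (not stated in the article).
    return COST_PER_SECOND * duration_seconds
```

Since 25 × 4 = 100, the two modes break even at a fixed duration of 4 seconds: shorter fixed clips cost less than an automatic-duration generation, and a full 22-second clip costs 25 × 22 = 550 characters.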

In a brief test, the video-to-sound-effects app seemed straightforward. After we uploaded a silent clip of a vehicle navigating all-terrain surroundings, ElevenLabs’ AI generated four options, all sounding like a car driving over a gravel road. But while it’s fun to add sound effects to clips, the real potential may lie in integrating this capability into a larger system.

And as the AI video generation space heats up, ElevenLabs may be looking to stay ahead of the competition by developing new audio features it knows will be in demand among developers, filmmakers and creators.