London Daily Post

Google’s new AI can hear a snippet of song—and then keep on playing

By Editorial Board
October 7, 2022
in Tech News


A new AI system can generate natural-sounding speech and music after being prompted with just a few seconds of audio.

AudioLM, developed by Google researchers, generates audio that matches the style of the prompt, including complex sounds like piano music or people talking, in a way that is almost indistinguishable from the original recording. The technique holds promise for speeding up the process of training AI to generate audio, and it could eventually be used to automatically generate music to accompany videos.

(You can listen to all the examples here.)

AI-generated audio is already common: voices from home assistants like Alexa use natural language processing. AI music systems like OpenAI’s Jukebox have produced impressive results, but most existing techniques require people to prepare transcriptions and label text-based training data, which is time-consuming and labor-intensive. Jukebox, for example, uses text-based data to generate song lyrics.

AudioLM, described in a non-peer-reviewed paper last month, is different: it requires no transcription or tagging. Instead, sound databases are fed into the program, and machine learning is used to compress the audio files into chunks of sound, called “tokens,” without losing too much information. This tokenized training data is fed into a machine learning model that uses natural language processing to learn sound patterns.

To generate audio, a few seconds of sound are fed into AudioLM, which then predicts what comes next. The process is similar to the way language models such as GPT-3 predict which words and phrases typically follow one another.
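
To make that concrete, here is a minimal Python sketch of the tokenize-then-predict idea described in the last two paragraphs. Every name in it (SoundTokenizer, ToyTokenModel, continue_audio) is a hypothetical stand-in invented for illustration; the real AudioLM learns its audio compressor and its sequence model from data rather than using anything this crude.

```python
# Toy stand-in for the pipeline described above: compress audio into discrete
# "tokens", then continue a prompt token by token. All class and function
# names here are hypothetical illustrations, not the AudioLM API.
import numpy as np

class SoundTokenizer:
    """Crude placeholder for the learned audio compressor: maps short
    frames of waveform to discrete token ids based on frame energy."""
    def __init__(self, frame_size=320, codebook_size=1024):
        self.frame_size = frame_size
        self.codebook_size = codebook_size

    def encode(self, waveform):
        tokens = []
        for i in range(len(waveform) // self.frame_size):
            frame = waveform[i * self.frame_size:(i + 1) * self.frame_size]
            energy = float(np.mean(frame ** 2))            # summarize the frame
            tokens.append(min(self.codebook_size - 1, int(energy * 100)))
        return tokens

class ToyTokenModel:
    """Placeholder for the sequence model: it just echoes the most common
    recent token, purely so the loop below runs end to end."""
    def predict_next(self, tokens):
        recent = tokens[-50:] or [0]
        return max(set(recent), key=recent.count)

def continue_audio(prompt_waveform, tokenizer, model, n_new_tokens=500):
    """Feed a few seconds of sound in, then autoregressively predict what
    comes next -- the same loop language models like GPT-3 use for text."""
    tokens = tokenizer.encode(prompt_waveform)
    for _ in range(n_new_tokens):
        tokens.append(model.predict_next(tokens))
    return tokens  # a real system would decode these back into a waveform

# Example: "continue" one second of noise standing in for a real recording.
prompt = np.random.default_rng(0).standard_normal(16000) * 0.1
print(len(continue_audio(prompt, SoundTokenizer(), ToyTokenModel())))
```

The hard part in the real system is, of course, learning a tokenizer that compresses audio without losing too much information and a sequence model that captures structure over time; the sketch only mirrors the shape of the loop.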

The audio clips released by the team sound quite natural. In particular, piano music generated with AudioLM sounds smoother than piano music generated using existing AI techniques, which tends to sound chaotic.

Roger Dannenberg, who researches computer-generated music at Carnegie Mellon University, says AudioLM already has much better sound quality than previous music generation programs. In particular, he says, AudioLM is surprisingly good at recreating some of the repetitive patterns inherent in human-made music. To generate realistic piano music, AudioLM must capture many of the subtle vibrations contained in each note when the piano keys are played. Music must also maintain its rhythms and harmonies over a period of time.

“This is really impressive, in part because it indicates that they are learning some kind of structure at multiple levels,” says Dannenberg.

AudioLM isn’t limited to music. Because it was trained on a library of recordings of people speaking sentences, the system can also generate speech that continues in the accent and cadence of the original speaker, although at this point those continuations may still sound like non sequiturs that don’t make any sense. AudioLM is trained to learn which types of sound fragments frequently occur together, and it uses that process in reverse to produce sentences. It also has the advantage of being able to learn the pauses and exclamations that are inherent in spoken language but do not translate easily into text.

Rupal Patel, who researches information and speech science at Northeastern University, says previous work with AI to generate audio could only capture these nuances if they were explicitly noted in the training data. Instead, AudioLM learns these features from the input data automatically, which increases the realistic effect.

“There’s a lot of what we might call linguistic information that’s not in the words you say, but it’s another way of communicating based on the way you say things to express a specific intention or a specific emotion,” says Neil Zeghidour, a co-creator of AudioLM. For example, someone might laugh after saying something to indicate that it was a joke. “All of this makes the speech natural,” he says.

Eventually, AI-generated music could be used to provide more natural background soundtracks for videos and slideshows. More natural-sounding speech-generation technology could help improve internet accessibility tools and robots that work in healthcare settings, Patel says. The team also hopes to create more sophisticated sounds, such as a band with different instruments or sounds that mimic a rainforest recording.

However, the ethical implications of the technology must be considered, says Patel. In particular, it is important to determine whether the musicians who produce the clips used as training data will receive attribution or copyright of the final product, an issue that has arisen with text-to-image AI. AI-generated speech that is indistinguishable from reality could also become so convincing that it allows disinformation to spread more easily.

In the paper, the researchers write that they are already considering and working to mitigate these issues, for example by developing techniques to distinguish natural sounds from sounds produced with AudioLM. Patel also suggests embedding audio watermarks in AI-generated products to make them easier to distinguish from natural audio.
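
As a rough illustration of what such a watermark could look like, the sketch below embeds a faint, key-derived pseudorandom signature into an audio clip and later checks for it by correlation. This is a generic spread-spectrum example, not a technique from the AudioLM paper, and the strength and threshold values are arbitrary.

```python
# Toy spread-spectrum audio watermark: add a faint key-derived +/-1 signature,
# then detect it by correlating the audio with the same signature.
# Illustrative only; not a scheme proposed by the AudioLM authors.
import numpy as np

def embed_watermark(audio, key, strength=0.005):
    rng = np.random.default_rng(key)                 # the key fixes the signature
    signature = rng.choice([-1.0, 1.0], size=audio.shape)
    return audio + strength * signature              # faint, ideally inaudible

def detect_watermark(audio, key, threshold=0.002):
    rng = np.random.default_rng(key)
    signature = rng.choice([-1.0, 1.0], size=audio.shape)
    # Correlation is roughly `strength` when the watermark is present and
    # near zero otherwise, so a simple threshold separates the two cases.
    return float(np.mean(audio * signature)) > threshold

# Example on a second of synthetic "audio" (noise standing in for a clip).
clean = np.random.default_rng(1).standard_normal(16000) * 0.1
marked = embed_watermark(clean, key=42)
print(detect_watermark(marked, key=42))   # True
print(detect_watermark(clean, key=42))    # almost certainly False
```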


