AI's sound reproduction is now so accurate it is hard to tell the real Bill Gates from the fake

Sean Vasquez and Mike Lewis of Facebook AI Research said this week that they had been trying to imitate human speech for some time. Imitating human speech, however, is obviously a difficult task: anyone who has heard the voice of Stephen Hawking's famous speech machine knows that it still sounds very unlike a human being.

But now the researchers seem to have made real progress. Listen to the cloned Gates voice and I think you will agree: it sounds so much like Bill Gates that you can hardly tell it apart from his real voice.

The researchers demonstrated their work with a clip in which the machine imitates Gates saying, "Please send a loving message to your dear friend." One of the most striking things about the clip is that when it says "cherish," it accurately captures Gates's rising intonation.

The technology, called MelNet, can replicate human intonation. So far, the voices of Gates and many others have been reproduced almost perfectly. Vasquez and Lewis said the cloned audio was drawn from various TED Talks.

The two researchers also said that, until recently, text-to-speech software did not work well because it recorded sound as waveforms, pictures that trace how a sound's amplitude changes over a span of seconds. If you've heard Gates say the word "cherish," you know that his tone changes dramatically. A deep learning machine trying to imitate a person has to predict all of these subtle changes, which is not easy.
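To get a feel for why predicting a raw waveform is so hard, consider how many values even a short clip contains. The sketch below is a rough illustration only, not the researchers' setup; the 22,050 Hz sample rate, the three-second clip length, and the placeholder sine wave are all assumptions made for the sake of the arithmetic.

```python
import numpy as np

# Rough scale of raw audio (illustrative assumptions: 22,050 Hz sample
# rate and a 3-second spoken phrase; the article gives no exact figures).
sr = 22050                          # samples per second
duration = 3.0                      # seconds of speech
t = np.linspace(0.0, duration, int(sr * duration), endpoint=False)
waveform = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # placeholder signal

# An autoregressive model over raw audio must predict each of these
# values, one at a time, conditioned on all the values before it.
print(f"{len(waveform):,} time steps for {duration:.0f} s of audio")  # 66,150
```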

Vasquez and Lewis say they succeeded in cloning voices by training the machine on something called a spectrogram instead.


"The time axis of a spectrogram is several orders of magnitude more compact than that of a waveform, which means that dependencies spanning tens of thousands of time steps in a waveform span only hundreds of time steps in a spectrogram," the researchers said. "This enables our spectrogram models to generate voice and music samples that stay consistent over several seconds."
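As a rough check on that claim, here is a minimal Python sketch using the librosa library. The STFT settings (1024-sample window, hop length of 256, 80 mel bands) and the synthetic five-second signal are illustrative assumptions; the article does not give MelNet's actual parameters.

```python
import numpy as np
import librosa

# Synthetic 5-second signal standing in for a speech clip (no audio file needed).
sr = 22050                                 # sample rate (Hz), assumed
t = np.linspace(0.0, 5.0, 5 * sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 220.0 * t)    # a 220 Hz tone as a placeholder waveform

# Mel spectrogram: short-time Fourier transform followed by a mel filterbank.
# n_fft, hop_length, and n_mels are illustrative choices, not MelNet's settings.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80)

print(f"waveform time steps:    {len(y):,}")      # 110,250 samples
print(f"spectrogram time steps: {S.shape[1]:,}")  # ~431 frames
# Tens of thousands of waveform samples collapse into a few hundred
# spectrogram frames, which is the compression the researchers describe.
```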

However, they also ran into some setbacks. The team says it is not difficult to copy a single sentence almost perfectly; what is hard is reproducing the complex intonation shifts that convey emotional change over tens of seconds or minutes. Nevertheless, when it comes to human-computer interaction, the team says that in situations involving only brief conversations, the technology could bring about revolutionary changes. (Source: SiliconANGLE; author: James Farrell; compiled by NetEase Intelligence; contributor: Yili)