In 1993 I worked with some of the world’s best OCR scanners and various text-to-speech systems, as a volunteer in a program to help blind graduate students.
By 1997 I was deploying industry-leading speech-to-text at a major clinical and research hospital.
The technology and subject matter have been familiar to me for three decades, particularly the pace of development and change.
The Verge, for example, clearly explained in 2016 that fake generative speech was not yet realistic.
Google’s DeepMind artificial intelligence has produced what could be some of the most realistic-sounding machine speech yet…although it was less convincing than actual human speech. […] WaveNet isn’t going to be showing up in something like Google Assistant right now — its step-by-step processing system requires a comparatively huge amount of computing power.
More to the point, the initial paper outlining the technology was only published in September of 2016.
In 2016 Adobe infamously did a gimmicky demo of voice-editing “Soundshop” (i.e. Photoshop for sound), and it was very unconvincing.
At a live demo in San Diego on Thursday, Adobe took a digitised recording of a man saying “and I kissed my dogs and my wife” and changed it to say “and I kissed Jordan three times”.
Obviously, reducing signal quality is like using a smudge tool on a photo. If your finger pushes color pixels around enough, a digital image of a brown cow might be made to look like a very blurry mud puddle instead, yet that is not true generative photography, let alone generative speech.
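The smudge analogy can be made concrete: a blur is just an averaging filter, and averaging only discards detail; it cannot produce new content. A minimal sketch (the `smudge` helper name is hypothetical, for illustration only, and is not any vendor’s actual editing code):

```python
import numpy as np

def smudge(signal, window=3):
    """Degrade a signal with a moving-average blur.

    Like a smudge tool, this only removes detail; the sharp
    edges cannot be recovered from the blurred copy alone.
    """
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# A sharp square pulse...
pulse = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)
blurred = smudge(pulse, window=3)

# ...comes out with its edges smeared into intermediate values.
print(blurred)
```

The point of the sketch is that this kind of processing is purely destructive, which is why blurring a recording is nothing like generating convincing new speech in someone’s voice.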
Adobe’s own product announcement in fact admitted a large gap between the stunt and reality:
…a spokeswoman stressed that this did not mean its release was imminent. “[It] may or may not be released as a product or product feature,” she told the BBC. “No ship date has been announced.”
Adobe had basically announced autotune software, which was like announcing nothing at all.
To be fair, the main-stage stunt inspired further research, such as more Google papers (e.g. Tacotron) and a master’s thesis in 2019, which clearly describes the state of the art expected in 2020.
Recent advances in deep learning have shown impressive results in the domain of text-to-speech. To this end, a deep neural network is usually trained using a corpus of several hours of professionally recorded speech from a single speaker. Giving a new voice to such a model is highly expensive, as it requires recording a new dataset and retraining the model.
Thus it was around 2020 when quality and performance improved noticeably. However, researchers still referenced expensive compute power and at least 50 hours of sample audio from a targeted speaker for proper fidelity.
All this goes to show that 2016 was definitely not yet a time when fake speech in a target’s voice could be easily generated. The mass popularization of fake speech came later, with attention-seeking projects like 15.ai.
Now that we have context and history set, take note of how the pathological liar Elon Musk is now making absurd claims about voice in court.
A plaintiff was set to depose Elon Musk regarding his very infamous comments in 2016 like this one:
A Model S and Model X, at this point, can drive autonomously with greater safety than a person. Right now.
Or the many written ones like these two statements fraudulently pushing people in 2016 to trust Autopilot as crash avoidance (an obvious and dangerous lie):
Despite overwhelming proof to the contrary, Musk had his lawyers argue that he should not be deposed because (1) he doesn’t have a very good memory and (2) he “like many public figures, is the subject of many ‘deepfake’ videos and audio recordings that purport to show him saying and doing things he never actually said or did.”
If you ever wondered why Musk uses an X in all his companies, this is it.
The amorphous, undefined, irresponsible X signifies a shallow childish vision of always being able to avoid shame, blame and accountability.
I didn’t really need to prove that the technical voice capabilities did not exist in 2016; I did it only because it is so easy to prove Musk a liar.
Musk makes false claims about technology capabilities over and over and over again. In this case, his false claims from 2016 are being covered up with fresh false claims in 2023.
In other words, we knew in 2016 he was completely divorced from reality and there are copious examples of him overpromising and underdelivering safety features.
When Musk said in various ways in 2016 that his car was safer than a human, I started a presentation series right then explaining how that was not true. I gave a keynote at a security conference on it.
Musk did not at any point contradict, or in any way indicate that he disagreed with, his ongoing VERY INFAMOUS widespread statements about Tesla safety attributed to him. Over and over he said outlandish and dangerously irresponsible things about Tesla.
I noticed and tracked the clear dangers to society because I was always waiting on him to stop misrepresenting safety, to become responsible.
Obviously, just like his cars, he only learns how to be worse.