Jan 19, 2021

Though photo, video, and audio manipulation have been practiced since the 19th century, it was not until the late 1990s that people began using AI to carry it out. One early project, completed in 1997, used machine learning (ML) techniques to correlate the sounds a speaker produces with the shape of the mouth and face. The final product? Footage of an individual mouthing the words to a song as if they had sung it. Given the limitations of the time, the frames and imagery surrounding the individual’s mouth and lower face were pixelated and blurry. There was much room for improvement, as it was relatively easy to distinguish the modified version from the original.

The concept was further developed, reinvented, and elaborated upon, but it was not until 20 years later that researchers at the University of Washington created nearly indistinguishable audio and video of former President Barack Obama.

These are all fake, AI-generated images of the former president speaking.

The biggest difference between this and the work done in 1997 is the core data synthesized to create the final product. In 1997, the researchers fed the algorithm only visual footage, resulting in a video in which the individual looked like they were saying the words, though the voice and the words themselves were not the individual’s. They simply layered an existing audio track over the video, creating a face-to-voice mismatch. In 2017, by contrast, more advanced algorithms, neural networks, and data collection methods aimed to collect auditory samples as well, resulting in a spoken pronunciation of any word that is indistinguishable from the real Barack Obama’s. “These types of results have never been shown before,” said Ira Kemelmacher-Shlizerman, an assistant professor at the UW’s Paul G. Allen School of Computer Science & Engineering. Provided there are enough hours of footage and audio available for the AI to process, the same can be done for anybody. This is Deepfake technology.

Now, before we go down a rabbit hole of scandals and falsification, let us first consider the benign applications of such tech. Deepfake AI has the potential to save billions of dollars in the film and business industries. It can retroactively update footage and audio to avoid reshoots, and it can instantaneously refresh and smooth face-to-face video conferences for higher-quality communication. Though development is still underway, Deepfake AI could also create conversational, virtual depictions of famous historical figures who are no longer around to speak their minds. Imagine having a live chat with Martin Luther King, Adolf Hitler, or Tupac Shakur. You could even talk to them all at once. Additionally, it could be incredibly beneficial in the political and public relations spheres. For example, if an influential figure needed to address the public but was unable to for whatever reason, Deepfakes could generate speech and animation of them saying whatever they wanted on screen. But this is also where things get spooky.

If advanced versions of Deepfake AI were to fall into the wrong hands, they could put the entire modern world in jeopardy. Fake declarations of war and fabricated attacks between countries could throw international relations out of control. AI-generated footage and audio could be used as fabricated evidence that a hostage is alive and conscious, thus fooling rescue teams and authorities. Even further, “The latest form of deepfake videos will go beyond simple face swapping, to whole-head synthesis [head puppetry], joint audiovisual synthesis [talking heads], and even whole-body synthesis,” says Siwei Lyu, a computer science professor who researches deepfake detection. Forged speech and swapped or generated bodily features could be used to frame an innocent individual for a crime they did not commit, paving the way for a completely new method of evidence falsification. The same techniques could be used to spark fake news and drama about any societally influential figure as well.

All of these outcomes, no matter how one looks at them, are undesirable and, unfortunately, unavoidable if Deepfake tech continues to develop and become more and more accessible to the average person. The real concern, at this point, is not what can be done to prevent such outcomes, but what can be done to identify them. If the danger of these terrifying uses lies in their ability to remain indistinguishable from the truth, then the problem is not the tech falling into the wrong hands, but the lack of detection AI that stays far enough ahead of the competition to identify falsified audio and video. In other words, we need AI dedicated to spotting Deepfake deployment using Deepfake analysis. This is what must be done to keep the public safe in our digitized world.

Kai Medina

Kai Medina is an undergraduate entrepreneur at USC's business school who specializes in operations and product management. He's currently building Leasify, a startup that connects lease hosts with new tenants via an Airbnb-like service.

Kai has a great interest in all things entrepreneurial and specifically wants to start a fintech company at the intersection of algorithmic trading and SaaS mobile applications in the near future.

What drew Kai to AI LA was the conviction that applying AI to existing industries will become the new status quo in the near future. With this in mind, he plans to break into the field of AI and big data analysis so that he can create new applications that benefit the world.