Our previous blog on synthetic voice covered what it is, how it works, and potential use cases for the technology. In this blog, the fifth in our Deepfake Voice series, we focus on deepfake voice fraud. The subject comes up more and more in discussions of deepfake voice, voice cloning, and synthetic voice, especially with the rise of scams and fake content.

In this blog, we’ll cover:

  • Why are AI-generated voices here to stay?
  • What happens when AI is used for audio deepfake scams?
  • Tackling the deepfake voice fraud problem
  • Built-in audio watermarking protections

Why are AI-generated voices here to stay?

While the technology is still emerging and its use cases are still being uncovered, many have surfaced in the last few years. Applications of synthetic voice include:

  • Using different voices to create variety for repetitive content such as weather, news, and sports reports in radio broadcasts
  • Creating synthetic replications of individual voice talent, so that talent such as Randy Hahn, sports commentator for the San Jose Sharks, can scale sponsorship and advertising opportunities even during the busy season
  • Translating audio content, such as podcast episodes, into other languages to reach a wider audience
  • Recreating voices of famous people from history or in pop culture for documentary or other entertainment purposes (always with their or their estate’s explicit permission)
  • Enabling the visually impaired to consume text materials

These are just a few of the use cases we’ve seen recently. More are primed to impact industries from education to healthcare, but the technology has made more headlines for its malicious uses, namely deepfakes.

What happens when AI is used for audio deepfake scams?

Deepfake voice fraud is a real issue and has accelerated the conversation on protections. In 2019, fraudsters cloned the voice of the chief executive of an energy firm based in the United Kingdom. Over 200K was wired, seemingly on the orders of the CEO, whose cloned voice sounded authentic in both accent and tone. This incident was the first known cybercrime in Europe that directly used AI.

Another incident occurred in 2020. A bank manager working in the United Arab Emirates answered a call from a voice he recognized. He thought he was talking to a company director he had spoken with before.

The director said that the company was acquiring another company and needed the bank to authorize transfers worth $35 million. Everything seemed legitimate, and the bank manager initiated the transfers. However, it was a deepfake voice scam involving possibly more than 17 individuals who transferred the funds across the globe.

Beyond making phone scams more sophisticated, the technology also raises the concern that it will be used to influence social, legal, and political discourse. Many of us have probably encountered something on social media that was deepfaked.

Justin Bieber recently fell for a deepfake when he came across a video of Tom Cruise playing the guitar on TikTok. However, he quickly learned that it was not Tom Cruise—it was a deepfake video. So how do we defend against deepfake fraud?

Tackling the deepfake voice fraud problem

As we covered in our first blog in this series, there are two ways to defend against deepfakes. The first approach is to create a detector that analyzes a prospective voice to determine if it was made using deepfake technology. Unfortunately, this method is not future-proof: because the technology will constantly evolve, the detector software will always have to keep pace with it, which will be difficult.

The second method is implementing an audio watermark—inaudible to the listener—that people cannot edit. It would essentially be a record of when the voice was created, edited, and used. Companies would have to implement this in their software, following an industry standard that would make it easier for people to know if a voice was synthetically created or not.
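To make the idea of a record "that people cannot edit" more concrete, here is a minimal Python sketch of a tamper-evident event log. It is an illustration only: it does not embed anything in audio, it is not how MARVEL.ai or any industry standard works, and the names `append_event` and `verify_log`, the event fields, and the demo key are all hypothetical. It assumes the software vendor holds a secret key and chains each record (created, edited, used) to the previous one with an HMAC, so that altering any earlier record invalidates the chain.

```python
import hashlib
import hmac
import json


def _tag(key: bytes, prev_tag: str, event: dict) -> str:
    """Return an HMAC-SHA256 tag over the previous tag plus this event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    message = prev_tag.encode("utf-8") + payload
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_event(log: list, key: bytes, action: str, timestamp: str) -> None:
    """Append a record (hypothetical fields: action, timestamp) to the log."""
    prev_tag = log[-1]["tag"] if log else ""
    event = {"action": action, "timestamp": timestamp}
    log.append({"event": event, "tag": _tag(key, prev_tag, event)})


def verify_log(log: list, key: bytes) -> bool:
    """Recompute every tag in order; any edited record breaks the chain."""
    prev_tag = ""
    for entry in log:
        expected = _tag(key, prev_tag, entry["event"])
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev_tag = entry["tag"]
    return True


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # assumption: secret held by the vendor
    log = []
    append_event(log, key, "created", "2022-01-01T00:00:00Z")
    append_event(log, key, "edited", "2022-01-02T00:00:00Z")
    append_event(log, key, "used", "2022-01-03T00:00:00Z")
    print(verify_log(log, key))

    log[1]["event"]["action"] = "created"  # simulate tampering with the record
    print(verify_log(log, key))
```

In a real audio watermark, data like this would have to be carried inaudibly inside the signal itself and survive compression and re-recording, which is a much harder problem than the record-keeping sketched here.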

Built-in audio watermarking protections

Veritone has put the ethical use of synthetic voices at the forefront of MARVEL.ai, our voice-as-a-service application and services. Within the application, only approved voices are made available for text-to-speech use cases. For custom voices, used for speech-to-speech or text-to-speech synthesis, the underlying technology of MARVEL.ai helps fight the improper use of one’s voice.

We’ve adopted inaudible watermarks to guarantee the authenticity of a voice and prevent others from using it illicitly. In addition, custom voices and the associated data are protected with the highest standards possible, ensuring that the voice owners have complete control of how and where third parties use their voice.

Learn more about MARVEL.ai