As a cyberattack investigator, Nick Giacopuzzi’s work now includes responding to growing attacks against businesses that involve deepfaked voices — and has ultimately left him convinced that in today's world, "we need to question everything."
In particular, Giacopuzzi has investigated multiple incidents where an attacker deployed fabricated audio, created with the help of AI, that purported to be an executive or a manager at a company. You can guess how it went: The fake boss asked an employee to urgently transfer funds. And in some cases, it’s worked, he said.
"It's your boss's voice. It sounds like the person you talk to every day," said Giacopuzzi, who is senior consultant for cyber investigations, intel and response at StoneTurn. "Sometimes it can be very successful."
It’s a new spin on the impersonation tactics that have long been used in social engineering and phishing attacks, but most people aren’t trained to disbelieve their ears.
And researchers say there's an even larger threat coming: attackers calling you up and speaking through a cloned voice in real time.
While many businesses may think that cyberattacks involving deepfakes of various kinds are still a future threat, a growing number are learning the hard way that the threat is already here, experts told Protocol.
Among cybersecurity professionals who focus on responding to cyberattacks, two-thirds of those recently surveyed by VMware reported that deepfakes — including both audio and video fabrications — were a component in attacks they’d investigated over the past year. That was up 13% from the previous year's study.
"It's your boss's voice. It sounds like the person you talk to every day."
The survey of 125 cybersecurity professionals didn't tally what portion of the deepfake attacks ended up succeeding, and VMware didn't disclose details on specific incidents. But Rick McElroy, principal cybersecurity strategist at VMware, said he's spoken with two corporate security chiefs whose companies have fallen prey to deepfake audio attacks in recent months.
In both cases, the attacks prompted the transfer of six-figure sums, McElroy said. Other publicly reported cases include a Hong Kong bank manager who was reportedly duped by deepfaked audio into transferring $35 million to attackers in 2020.
As of right now, responding to deepfakes is not a part of most security awareness training, McElroy noted.
"Generally speaking, [deepfakes] are probably being treated as something 'funny' — unless you've actually been attacked," he said.
The VMware study doesn't claim to give a sense of the overall pervasiveness of audio deepfake attacks against businesses. But it does offer evidence that the attacks are at least a growing problem for large enterprises, which are the biggest targets and typically the only companies that would employ or call in an incident response team, McElroy said.
Voice cloning
With a short audio sample of a person speaking and a publicly available tool on GitHub, a human voice can be cloned today without the need for AI expertise. And it may not be long before faking someone else's voice in real time becomes possible as well.
"Real-time deepfakes are the biggest threat on the horizon" in this arena, said Yisroel Mirsky, head of the Offensive AI Research Lab at Ben-Gurion University.
Mirsky — who previously led a study into the potential for deepfaked medical imaging to lead to misdiagnosis — told Protocol that his attention has shifted lately to the threat of voice deepfakes.
The aforementioned GitHub tool, which has been available since 2019, uses deep learning to clone a voice from just a few seconds of audio. It then lets the cloned voice "speak" typed phrases using text-to-speech technology.
"Real-time deepfakes are the biggest threat on the horizon."
Mirsky provided an audio deepfake to Protocol that he created with the tool, using three seconds of a person's voice. The tool is too slow to use in a real-time attack, but an attacker could generate likely phrases in advance and then play them back as needed, he said.
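For readers wondering what such a tool actually involves, this kind of voice cloning generally breaks into three stages: a speaker encoder that condenses a short voice sample into a numerical "fingerprint," a synthesizer that turns typed text into a spectrogram in that voice, and a vocoder that renders the spectrogram as audio. The Python sketch below is a simplified outline of that structure only; the class and method names, and the dummy return values, are hypothetical placeholders rather than the API of the GitHub tool Mirsky used.

```python
# A simplified structural sketch of a three-stage voice-cloning pipeline
# (speaker encoder -> text-to-speech synthesizer -> vocoder). All names and
# return values are illustrative placeholders, not the API of any real tool.

from typing import List


class SpeakerEncoder:
    def embed(self, reference_audio: bytes) -> List[float]:
        # A real encoder is a pretrained deep-learning model that maps a few
        # seconds of speech to a fixed-size "voice fingerprint" embedding.
        return [0.0] * 256  # dummy embedding for illustration


class Synthesizer:
    def text_to_spectrogram(self, text: str, voice: List[float]) -> List[List[float]]:
        # A real synthesizer generates a spectrogram of `text` spoken
        # in the voice described by the embedding.
        return [[0.0] * 80 for _ in text.split()]  # dummy spectrogram frames


class Vocoder:
    def to_waveform(self, spectrogram: List[List[float]]) -> bytes:
        # A real vocoder converts spectrogram frames back into audio samples;
        # this conversion step is the latency bottleneck discussed below.
        return bytes(len(spectrogram))  # dummy audio buffer


def prerender_phrases(reference_audio: bytes, phrases: List[str]) -> List[bytes]:
    """Clone a voice once, then pre-render likely phrases for later playback."""
    encoder, synthesizer, vocoder = SpeakerEncoder(), Synthesizer(), Vocoder()
    voice = encoder.embed(reference_audio)  # one-time step, from seconds of audio
    return [vocoder.to_waveform(synthesizer.text_to_spectrogram(p, voice)) for p in phrases]


if __name__ == "__main__":
    clips = prerender_phrases(b"<a few seconds of recorded speech>",
                              ["Hello, can you hear me?", "Please call me back."])
    print(f"Pre-rendered {len(clips)} audio clips")
```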
Thanks to advances in voice deepfake generation, the problem for attackers is now less whether they can clone a voice than how to use the cloned voice in real time. The ideal scenario for an attacker would be to speak, rather than type, and have their speech converted into the cloned voice.
But progress appears to be happening on that front as well. Mirsky pointed to a vocoder that reportedly can perform audio signal conversion, a key part of the process and its biggest bottleneck, with just a 10-millisecond delay.
In other words, a real-time voice deepfake may be achievable in the near future, if it isn’t already.
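To see why a 10-millisecond conversion delay matters, consider the budget of a live call: each captured chunk of speech has to be converted faster than the next chunk arrives, or the lag compounds until the conversation falls apart. The snippet below is a back-of-the-envelope illustration of that constraint; the frame size and overhead figures are assumptions for illustration, not measurements from Mirsky's lab.

```python
# Back-of-the-envelope check of a real-time conversion budget. Only the
# 10 ms conversion figure comes from the reporting above; the other
# numbers are assumed values for illustration.

FRAME_MS = 20.0          # assumed size of each captured audio chunk
CONVERSION_MS = 10.0     # per-chunk signal-conversion delay cited above
OTHER_OVERHEAD_MS = 5.0  # assumed capture, playback and transport overhead


def keeps_up_in_real_time(frame_ms: float, processing_ms: float) -> bool:
    """Streaming stays real time only if each chunk is fully processed
    before the next chunk has finished arriving."""
    return processing_ms <= frame_ms


total = CONVERSION_MS + OTHER_OVERHEAD_MS
print(keeps_up_in_real_time(FRAME_MS, total))  # True: 15 ms fits within a 20 ms frame
```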
Hi-fi social engineering
Without a doubt, attacks such as deepfakes that target the "human attack surface" will take some time for people to adjust to, said Lisa O'Connor, managing director for Accenture Security.
When you hear a familiar voice on the phone, for instance, most people "haven't built the muscle memory to really think to challenge that," O'Connor said.
But judging by how quickly voice-cloning technology is advancing, we ought to start.
All in all, Mirsky said he sees audio deepfakes as the "much bigger threat" compared to video deepfakes. Video, he noted, only works in limited contexts, while fabricated audio can be used to call anybody.
And while “it might not sound perfectly like the individual, the urgent pretext is going to be enough to get [the target] to fall for it" in many cases, Mirsky said. "It's a very powerful social engineering tool. And that's a very big concern."
In response, Mirsky said his lab at Ben-Gurion University is now focusing on the deepfake audio threat, with the goal of developing a way to detect real-time cloned voice attacks.
Training and changes to business processes will also be crucial for defending against this type of threat, according to McElroy. In the case of wire transfers, for instance, companies may want to add another step to the process, such as a challenge phrase, he said.
That becomes trickier in the case of a deepfake left as a voicemail, though, particularly in a seemingly high-pressure situation, he acknowledged.
Giacopuzzi, the StoneTurn cyberattack investigator, said the "sense of urgency” that is a cornerstone of social engineering attacks has carried over to deepfake audio attacks. "It's still pushing all the same buttons," he said.
And that's what is most troubling of all, Giacopuzzi said: "It's playing on our psychology.” And as a result, “there are successes. So I think it's just going to get worse."