Does your boss sound a little funny? It might be an audio deepfake

Voice deepfake attacks against enterprises, often aimed at tricking corporate employees into transferring money to the attackers, are on the rise. And at least in some cases, they’re succeeding.

Audio deepfakes are a new spin on the impersonation tactics that have long been used in social engineering and phishing attacks, but most people aren’t trained to disbelieve their ears.

Illustration: Christopher T. Fong/Protocol

As a cyberattack investigator, Nick Giacopuzzi’s work now includes responding to growing attacks against businesses that involve deepfaked voices — and has ultimately left him convinced that in today's world, "we need to question everything."

In particular, Giacopuzzi has investigated multiple incidents where an attacker deployed fabricated audio, created with the help of AI, that purported to be an executive or a manager at a company. You can guess how it went: The fake boss asked an employee to urgently transfer funds. And in some cases, it’s worked, he said.

"It's your boss's voice. It sounds like the person you talk to every day," said Giacopuzzi, who is senior consultant for cyber investigations, intel and response at StoneTurn. "Sometimes it can be very successful."

It’s a new spin on the impersonation tactics that have long been used in social engineering and phishing attacks, but most people aren’t trained to disbelieve their ears.

And researchers say there's an even larger threat coming: attackers calling you up and speaking through a cloned voice in real time.

While many businesses may think that cyberattacks involving deepfakes of multiple varieties are still a future threat, a growing number are learning the hard way that it's already here, experts told Protocol.

Among cybersecurity professionals who focus on responding to cyberattacks, two-thirds of those recently surveyed by VMware reported that deepfakes — including both audio and video fabrications — were a component in attacks they’d investigated over the past year. That was up 13% from the previous year's study.

"It's your boss's voice. It sounds like the person you talk to every day."

The survey of 125 cybersecurity professionals didn't tally what portion of the deepfake attacks ended up succeeding, and VMware didn't disclose details on specific incidents. But Rick McElroy, principal cybersecurity strategist at VMware, said he's spoken with two corporate security chiefs whose companies have fallen prey to deepfake audio attacks in recent months.

In both cases, the attacks prompted the transfer of six-figure sums, McElroy said. Other publicly reported cases include an incident in which a Hong Kong bank manager was reportedly duped by deepfaked audio into transferring $35 million to attackers in 2020.

As of right now, responding to deepfakes is not a part of most security awareness training, McElroy noted.

"Generally speaking, [deepfakes] are probably being treated as something 'funny' — unless you've actually been attacked," he said.

The VMware study doesn't claim to give a sense of the overall pervasiveness of audio deepfake attacks against businesses. But it does offer evidence that the attacks are at least a growing problem for large enterprises, which are the biggest targets and typically the only companies that would employ or call in an incident response team, McElroy said.

Voice cloning

With a short audio sample of a person speaking and a publicly available tool on GitHub, a human voice can be cloned today without any AI expertise. And it may not be long before faking someone else's voice in real time becomes possible, too.

"Real-time deepfakes are the biggest threat on the horizon" in this arena, said Yisroel Mirsky, head of the Offensive AI Research Lab at Ben-Gurion University.

Mirsky — who previously led a study into the potential for deepfaked medical imaging to lead to misdiagnosis — told Protocol that his attention has shifted lately to the threat of voice deepfakes.

The aforementioned GitHub tool, which has been available since 2019, uses deep learning to clone a voice from just a few seconds of audio. The tool then enables the cloned voice to "speak" typed phrases using text-to-speech technology.

"Real-time deepfakes are the biggest threat on the horizon."

Mirsky provided an audio deepfake to Protocol that he created with the tool, using three seconds of a person's voice. The tool is too slow to use in a real-time attack, but an attacker could create probable phrases in advance and then play them as needed, he said.
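As a rough illustration of that clone-and-playback workflow, the sketch below uses stub functions in place of the actual models (the article does not name the tool's internals, so the function names, the placeholder embedding, and the canned phrases are all hypothetical):

```python
# Illustrative sketch of the clone-then-playback attack described above.
# embed_voice and synthesize are stand-ins for real deep-learning models;
# this only shows the flow, not a working cloning system.

def embed_voice(sample_seconds: float) -> list:
    """Stand-in for a speaker encoder: a few seconds of audio -> voice embedding."""
    assert sample_seconds >= 3.0, "cloning tools need at least a few seconds of audio"
    return [0.1, 0.2, 0.3]  # placeholder embedding

def synthesize(embedding: list, text: str) -> bytes:
    """Stand-in for text-to-speech conditioned on the cloned voice."""
    return f"<audio spoken in cloned voice: {text!r}>".encode()

# Because the tool is too slow for live conversation, the attacker
# pre-generates "probable phrases" offline and plays them back on the call.
voice = embed_voice(3.0)
canned = {phrase: synthesize(voice, phrase) for phrase in [
    "Hi, it's me. I need you to wire the funds today.",
    "I'm heading into a meeting, just get it done.",
]}
```

The point of the sketch is the shape of the attack: the expensive cloning step happens ahead of time, so only playback has to happen live.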

Thanks to advances in voice-deepfake generation, the problem for attackers is now less whether they can clone a voice than how to use the cloned voice in real time. The ideal scenario for an attacker would be to speak, rather than type, and have their speech converted into the cloned voice on the fly.

But progress appears to be under way on that front as well. Mirsky pointed to a vocoder that reportedly performs audio signal conversion, a key part of the process and the biggest bottleneck, with just a 10-millisecond delay.
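To see why per-frame conversion delay is the bottleneck, consider a toy streaming model: audio arrives in fixed-size frames, and the converter must finish each frame before the next one arrives. The numbers below are illustrative and are not taken from the device Mirsky cited:

```python
# Toy model of streaming voice conversion. Real-time operation requires
# converting each audio frame faster than frames arrive; otherwise the
# conversion falls progressively behind the live conversation.

SAMPLE_RATE = 16_000   # samples per second (a common speech sample rate)
FRAME_MS = 10          # frame duration in milliseconds
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per 10 ms frame

def convert_frame(frame: list) -> list:
    """Stand-in for the vocoder's signal conversion on one frame."""
    return frame  # a real converter would map the source voice to the target voice

def is_realtime(conversion_delay_ms: float) -> bool:
    """A converter keeps up only if its per-frame delay fits in the frame budget."""
    return conversion_delay_ms <= FRAME_MS

stream = [[0] * FRAME_SAMPLES for _ in range(5)]  # 50 ms of silent audio
converted = [convert_frame(f) for f in stream]
```

Under this framing, a 10-millisecond conversion delay is exactly the budget for 10-millisecond frames, which is why that figure makes live conversion plausible.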

In other words, a real-time voice deepfake may be achievable in the near future, if it isn’t already.

Hi-fi social engineering

Without a doubt, attacks such as deepfakes that target the "human attack surface" will take some time for people to adjust to, said Lisa O'Connor, managing director for Accenture Security.

When you hear a familiar voice on the phone, for instance, most people "haven't built the muscle memory to really think to challenge that," O'Connor said.

But judging by how quickly the technology for cloning voices is advancing, we ought to start.

All in all, Mirsky said he sees audio deepfakes as a "much bigger threat" than video deepfakes. Video, he noted, works only in limited contexts, while fabricated audio can be used to call anybody.

And while “it might not sound perfectly like the individual, the urgent pretext is going to be enough to get [the target] to fall for it" in many cases, Mirsky said. "It's a very powerful social engineering tool. And that's a very big concern."

In response, Mirsky said his lab at Ben-Gurion University is currently focusing on the deepfake audio threat, with a goal of developing a way to detect real-time cloned voice attacks.

Training and changes to business processes will also be crucial for defending against this type of threat, according to McElroy. In the cases of wire transfers, for instance, companies may want to add another step in the process, such as a challenge phrase, he said.
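One possible shape of such an extra step is a challenge-response built on a secret agreed in person, so that a cloned voice alone cannot authorize a transfer. The scheme below is our illustration of the idea, not a process McElroy described:

```python
# Illustrative challenge-response check for high-risk requests such as wire
# transfers. The shared secret is agreed out of band (in person), never over
# the phone or email, so an attacker with only a cloned voice cannot answer.

import hashlib
import hmac
import secrets

SHARED_SECRET = b"agreed-in-person-not-over-the-phone"  # hypothetical secret

def issue_challenge() -> str:
    """The employee reads a fresh random challenge to the caller."""
    return secrets.token_hex(8)

def respond(challenge: str, secret: bytes = SHARED_SECRET) -> str:
    """The real executive derives the response from the shared secret."""
    return hmac.new(secret, challenge.encode(), hashlib.sha256).hexdigest()[:8]

def verify(challenge: str, response: str) -> bool:
    """compare_digest avoids leaking information through timing differences."""
    return hmac.compare_digest(respond(challenge), response)
```

A fixed challenge phrase would be simpler to use over the phone, but a fresh challenge per request also defeats an attacker who has recorded a previous call.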

That becomes trickier in the case of a deepfake left as a voicemail, though, particularly in a seemingly high-pressure situation, he acknowledged.

Giacopuzzi, the StoneTurn cyberattack investigator, said the "sense of urgency" that is a cornerstone of social engineering attacks has carried over to deepfake audio attacks. "It's still pushing all the same buttons," he said.

And that's what is most troubling of all, Giacopuzzi said: "It's playing on our psychology." And as a result, "there are successes. So I think it's just going to get worse."
