Does your boss sound a little funny? It might be an audio deepfake

Voice deepfake attacks against enterprises, often aimed at tricking corporate employees into transferring money to the attackers, are on the rise. And at least in some cases, they’re succeeding.

Audio deepfakes are a new spin on the impersonation tactics that have long been used in social engineering and phishing attacks, but most people aren’t trained to disbelieve their ears.

Illustration: Christopher T. Fong/Protocol

As a cyberattack investigator, Nick Giacopuzzi now responds to a growing number of attacks against businesses that involve deepfaked voices, work that has left him convinced that in today's world, "we need to question everything."

In particular, Giacopuzzi has investigated multiple incidents where an attacker deployed fabricated audio, created with the help of AI, that purported to be an executive or a manager at a company. You can guess how it went: The fake boss asked an employee to urgently transfer funds. And in some cases, it’s worked, he said.

"It's your boss's voice. It sounds like the person you talk to every day," said Giacopuzzi, who is senior consultant for cyber investigations, intel and response at StoneTurn. "Sometimes it can be very successful."

It’s a new spin on the impersonation tactics that have long been used in social engineering and phishing attacks, but most people aren’t trained to disbelieve their ears.

And researchers say there's an even larger threat coming: attackers calling you up and speaking through a cloned voice in real time.

While many businesses may think that cyberattacks involving deepfakes of various kinds are still a future threat, a growing number are learning the hard way that the threat is already here, experts told Protocol.

Among cybersecurity professionals who focus on responding to cyberattacks, two-thirds of those recently surveyed by VMware reported that deepfakes — including both audio and video fabrications — were a component in attacks they’d investigated over the past year. That was up 13% from the previous year's study.

The survey of 125 cybersecurity professionals didn't tally what portion of the deepfake attacks ended up succeeding, and VMware didn't disclose details on specific incidents. But Rick McElroy, principal cybersecurity strategist at VMware, said he's spoken with two corporate security chiefs whose companies have fallen prey to deepfake audio attacks in recent months.

In both cases, the attacks prompted the transfer of six-figure sums, McElroy said. Other publicly reported cases include an incident in which a Hong Kong bank manager was reportedly duped by deepfaked audio into transferring $35 million to attackers in 2020.

As of right now, responding to deepfakes is not a part of most security awareness training, McElroy noted.

"Generally speaking, [deepfakes] are probably being treated as something 'funny' — unless you've actually been attacked," he said.

The VMware study doesn't claim to give a sense of the overall pervasiveness of audio deepfake attacks against businesses. But it does offer evidence that the attacks are at least a growing problem for large enterprises, which are the biggest targets and typically the only companies that would employ or call in an incident response (IR) team, McElroy said.

Voice cloning

With a short audio sample of a person speaking and a publicly available tool on GitHub, a human voice can be cloned today without the need for AI expertise. And it may not be long before faking someone else's voice becomes possible in real time.

"Real-time deepfakes are the biggest threat on the horizon" in this arena, said Yisroel Mirsky, head of the Offensive AI Research Lab at Ben-Gurion University.

Mirsky — who previously led a study into the potential for deepfaked medical imaging to lead to misdiagnosis — told Protocol that his attention has shifted lately to the threat of voice deepfakes.

The aforementioned GitHub tool, which has been available since 2019, uses deep learning to clone a voice from just a few seconds of audio. The tool then enables the cloned voice to "speak" typed phrases using text-to-speech technology.

Mirsky provided an audio deepfake to Protocol that he created with the tool, using three seconds of a person's voice. The tool is too slow to use in a real-time attack, but an attacker could create probable phrases in advance and then play them as needed, he said.

Thanks to advances in the generation of voice deepfakes, the problem for attackers is now less about whether they can clone a voice than about how to use the cloned voice in real time. The ideal scenario for an attacker would be to speak, rather than type, and have their speech converted into the cloned voice.

But it appears that progress is being made on that technology as well. Mirsky pointed to a vocoder device that purports to be able to perform audio signal conversion, a key part of the process and the biggest bottleneck, with just a 10-millisecond delay.

In other words, a real-time voice deepfake may be achievable in the near future, if it isn’t already.

Hi-fi social engineering

Without a doubt, attacks such as deepfakes that target the "human attack surface" will take some time for people to adjust to, said Lisa O'Connor, managing director for Accenture Security.

When they hear a familiar voice on the phone, for instance, most people "haven't built the muscle memory to really think to challenge that," O'Connor said.

But judging by the advancement of the technology for cloning voices, it would appear we ought to start.

All in all, Mirsky said he sees audio deepfakes as the "much bigger threat" compared to video deepfakes. Video, he noted, only works in limited contexts, but fabricated audio can be used to call anybody.

And while "it might not sound perfectly like the individual, the urgent pretext is going to be enough to get [the target] to fall for it" in many cases, Mirsky said. "It's a very powerful social engineering tool. And that's a very big concern."

In response, Mirsky said his lab at Ben-Gurion University is currently focusing on the deepfake audio threat, with a goal of developing a way to detect real-time cloned voice attacks.

Training and changes to business processes will also be crucial for defending against this type of threat, according to McElroy. In the case of wire transfers, for instance, companies may want to add another step in the process, such as a challenge phrase, he said.

That becomes trickier in the case of a deepfake left as a voicemail, though, particularly in a seemingly high-pressure situation, he acknowledged.

Giacopuzzi, the StoneTurn cyberattack investigator, said the "sense of urgency" that is a cornerstone of social engineering attacks has carried over to deepfake audio attacks. "It's still pushing all the same buttons," he said.

And that's what is most troubling of all, Giacopuzzi said: "It's playing on our psychology." And as a result, "there are successes. So I think it's just going to get worse."

