People

Microsoft wants to take AI voices everywhere

By adding safeguards, Microsoft wants to ensure deepfake voices aren't being abused.


Duolingo gave its Bea character its own voice, with a little help from some neural networks.

Image: Duolingo

Get ready for every brand and app to have its own voice: On Wednesday, Microsoft started making its custom neural voice product more widely available to commercial partners, allowing companies to generate their own voices for chatbots and other interactive applications. Custom neural voices are built on Microsoft's Azure AI platform and use neural networks to create voices that, unlike old-school text-to-speech technology, don't sound robotic.

The company spotlighted some early high-profile customers:

  • AT&T is using custom neural voice tech to bring Bugs Bunny to life in its Dallas experience store. Customers are greeted by name and can chat with the Looney Tunes character while exploring the store.
  • Progressive created a voice chatbot for Flo, the omnipresent face of the insurance brand.
  • Duolingo is using custom neural voice to create multilingual voices for a set of characters, meant to bring personality to its language-learning app. Soon, you'll be able to choose whether you'd rather get help with your Japanese lessons from an emo teenager, a video game-loving kiddo who eats too much candy or a speed-talker who thinks she is always right.

To create these voices, Microsoft asks companies to supply speech samples; for AT&T's Bugs Bunny, a voice actor recorded 2,000 phrases and lines. Azure AI then uses two neural networks to turn text into speech that pronounces words correctly and gets the tone and duration of each phoneme right.

Microsoft isn't the first company to use AI for custom voices. Google and Amazon have both generated celebrity voices for their respective assistants in the past, and Amazon recently announced that it would white-label Alexa, complete with custom voices. In October, Toronto-based Resemble AI launched Localize, a service that clones voices to produce translated audio recordings in a number of different languages.

With AI getting better and better at creating voices that are indistinguishable from real recordings, we'll likely also see a whole new wave of deepfake audio. Microsoft, for its part, went out of its way to stress that it is aware of the potential for abuse:

  • The company will limit access to its custom neural voice product to pre-approved partners, who have to contractually agree to a code of conduct.
  • Customers also have to agree to add disclaimers to their applications if consumers could mistake an AI voice for a real person.
  • The company is exploring the use of watermarks to make sure that AI recordings aren't used out of context.
  • Microsoft is also asking voice actors to acknowledge within their recordings that they are knowingly participating in an AI voice project — a safeguard against voice hijacking.

"As creators of this technology, we have an obligation to make sure it's used responsibly," said Azure AI platform VP Eric Boyd. "We're careful with the partners we work with in making sure they follow the guidelines."

A version of this story will appear in this week's Next Up newsletter.
