Microsoft wants to take AI voices everywhere
By adding safeguards, Microsoft wants to ensure deepfake voices aren't being abused.
Get ready for every brand and app to have its own voice: Microsoft started to make its custom neural voice product more widely available to commercial partners Wednesday, allowing companies to generate their own voices for chatbots and other interactive applications. Custom neural voices are based on Microsoft's Azure AI platform, and use neural networks to create voices that don't sound robotic the way old-school text-to-speech technology does.
The company spotlighted some early high-profile customers:
To create these voices, Microsoft is asking companies to supply it with speech samples; for AT&T's Bugs Bunny, a voice actor recorded 2,000 phrases and lines. Azure AI then uses two neural networks to turn text into speech that pronounces words correctly and also gets the tone and duration of every phoneme right.
Microsoft isn't the first company to use AI for custom voices. Google and Amazon have both generated celebrity voices for their respective assistants in the past, and Amazon recently announced that it would white-label Alexa, complete with custom voices. In October, Toronto-based Resemble AI launched Localize, a service that clones voices to produce translated audio recordings in a number of different languages.
With AI getting better and better at creating voices that are indistinguishable from real recordings, we'll likely also see a whole new wave of deepfake audio. Microsoft, for its part, went out of its way to stress that it is aware of the potential for abuse:
"As creators of this technology, we have an obligation to make sure it's used responsibly," said Azure AI platform VP Eric Boyd. "We're careful with the partners we work with in making sure they follow the guidelines."
A version of this story will appear in this week's Next Up newsletter.
Janko Roettgers (@jank0) is a senior reporter at Protocol, reporting on the shifting power dynamics between tech, media, and entertainment, including the impact of new technologies. Previously, Janko was Variety's first-ever technology writer in San Francisco, where he covered big tech and emerging technologies. He has reported for Gigaom, Frankfurter Rundschau, Berliner Zeitung, and ORF, among others. He has written three books on consumer cord-cutting and online music and co-edited an anthology on internet subcultures. He lives with his family in Oakland.