Get ready for every brand and app to have its own voice: Microsoft started to make its custom neural voice product more widely available to commercial partners Wednesday, allowing companies to generate their own voices for chatbots and other interactive applications. Custom neural voices are built on Microsoft's Azure AI platform and use neural networks to create voices that avoid the robotic sound of old-school text-to-speech technology.
The company spotlighted some early high-profile customers:
- AT&T is using custom neural voice tech to bring Bugs Bunny to life in its Dallas experience store. Customers are greeted by name and can chat with the Looney Tunes character while exploring the store.
- Progressive created a voice chatbot for Flo, the omnipresent face of the insurance brand.
- Duolingo is using custom neural voice to create multilingual voices for a set of characters meant to bring personality to its language-learning app. Soon, you'll be able to choose whether you'd rather get help with your Japanese lessons from an emo teenager, a video game-loving kiddo who eats too much candy or a speed-talker who thinks she is always right.
To create these voices, Microsoft asks companies to supply it with speech samples; for AT&T's Bugs Bunny, a voice actor recorded 2,000 phrases and lines. Azure AI then uses two neural networks to turn text into speech that pronounces words correctly and gets the tone and duration of every phoneme right.
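For developers, a deployed custom neural voice is addressed much like any stock Azure voice: a synthesis request wraps the text in SSML and names the voice. A minimal sketch of that payload, assuming a hypothetical deployed voice called "MyBrandNeuralVoice" (real custom voices get their own names at deployment, and actual synthesis additionally requires an Azure Speech resource and an SDK or REST call, omitted here):

```python
# Sketch: building the SSML payload that Azure's text-to-speech service
# accepts. The voice name "MyBrandNeuralVoice" is hypothetical -- a company's
# deployed custom neural voice would be assigned its own name.

def build_ssml(voice_name: str, text: str) -> str:
    """Wrap plain text in SSML so the service synthesizes it with a
    specific (here, custom neural) voice."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{voice_name}">{text}</voice>'
        '</speak>'
    )

print(build_ssml("MyBrandNeuralVoice", "Welcome to the store!"))
```

The same SSML document could carry extra markup (pauses, emphasis, speaking style) without the application needing to know how the underlying voice was trained.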
Microsoft isn't the first company to use AI for custom voices. Google and Amazon have both generated celebrity voices for their respective assistants in the past, and Amazon recently announced that it would white-label Alexa, complete with custom voices. In October, Toronto-based Resemble AI launched Localize, a service that clones voices to produce translated audio recordings in a number of different languages.
With AI getting better and better at creating voices that are indistinguishable from real recordings, we'll likely also see a whole new wave of deepfake audio. Microsoft, for its part, went out of its way to stress that it is aware of the potential for abuse:
- The company will limit access to its custom neural voice product to pre-approved partners, who have to contractually agree to a code of conduct.
- Customers also have to agree to add disclaimers to their applications if consumers could mistake an AI voice for a real person.
- The company is exploring the use of watermarks to make sure that AI recordings aren't used out of context.
- Microsoft is also asking voice actors to acknowledge within their recordings that they are knowingly participating in an AI voice project — a safeguard against voice hijacking.
"As creators of this technology, we have an obligation to make sure it's used responsibly," said Azure AI platform VP Eric Boyd. "We're careful with the partners we work with in making sure they follow the guidelines."
Janko Roettgers (@jank0) is a senior reporter at Protocol, reporting on the shifting power dynamics between tech, media, and entertainment, including the impact of new technologies. Previously, Janko was Variety's first-ever technology writer in San Francisco, where he covered big tech and emerging technologies. He has reported for Gigaom, Frankfurter Rundschau, Berliner Zeitung, and ORF, among others. He has written three books on consumer cord-cutting and online music and co-edited an anthology on internet subcultures. He lives with his family in Oakland.