Whose voice is it anyway?
Good morning, and welcome to Protocol Next Up, a weekly newsletter about the future of technology and entertainment. This week, we're taking a closer look at Apple's new HomePod Mini as well as ways to clone your voice with AI.
Also, please RSVP for TV's Tipping Point, Protocol's first online event about the future of tech and entertainment, on Oct. 28.
The Big Story
With the HomePod Mini, Apple is playing catch-up in the smart speaker space
Apple used its iPhone event Tuesday to introduce the HomePod Mini, a new $99 spherical smart speaker that is meant to close the gap with competing products from Amazon and Google. However, with limited support for third-party music services at launch, and no way for developers to bring their services to the device, Apple is still playing catch-up in the smart speaker space.
Apple made some compromises to get the price down to just $99. Most notably: Unlike the original HomePod, the new device isn't adjusting the sound to its environment. This allowed the company to use just three mics, instead of the six that it put into the original $300 HomePod.
Instead, the HomePod Mini monitors the music it plays in real time and tweaks playback on the fly. That's similar to the approach Google took with its new Nest Audio speaker, and it makes sense: With a smaller speaker, there is less bass and distortion to deal with.
Speaking of Google: People who have Nest speakers, or Amazon's Echo devices for that matter, will have recognized some of the other software features Apple announced on Tuesday as well. For instance, the competition has long had the type of intercom functionality offered by the HomePod. As IoT podcaster Stacey Higginbotham put it: "Apple has just copied everything Google Assistant does on its home devices and stole Amazon's form factor."
Still, that in itself is a notable shift for Apple. When the company first announced the HomePod in 2017, it put a big emphasis on sound quality and positioned Siri primarily as a way to control music. On Tuesday, execs actually called the HomePod Mini a smart speaker — a description that has largely been absent from the original HomePod marketing material.
Even with a smarter Siri, the HomePod Mini is still lacking some of the features offered by the competition:
- No Spotify, at least for now. At launch, the Mini will only support Apple Music. Support for third-party music services is promised to follow in the coming months, starting with Pandora and Amazon Music. The omission of Spotify during Tuesday's presentation raised some eyebrows, though, and will be worth keeping an eye on. It does sound like the HomePod Mini and its larger sibling will eventually be open to all third-party music services. But given the public spat between the two companies over App Store fees, we'll have to wait and see if and when Spotify actually makes it to the device.
- No third-party skills. Apple has put a lot of work into improving the Siri experience on the HomePod, but the assistant still doesn't have the kind of third-party developer platform available for Alexa and the Google Assistant. This means that consumers will be largely confined to the functionality offered by Apple, and that developers will instead embrace competing devices for voice-first services.
- Just a single driver. The original HomePod wasn't much of a commercial success, but people who bought it often praised its sound quality. For the HomePod Mini, Apple reduced the audio hardware down to a single driver, combined with passive radiators for sound propagation. The Nest Audio, on the other hand, combines a woofer and a tweeter, and the new Amazon Echo even has a woofer and two tweeters. The big question here is: Will the $99 HomePod Mini sound as good as other smart speakers in its price category, or will its sound be closer to that of the $50 Nest Mini or Echo Dot?
The HomePod Mini will start shipping mid-November, and we probably won't know much about its commercial success until Apple reports financials for its holiday quarter sometime in early 2021. However, investors in smart speaker maker Sonos apparently didn't think the device was much of a match for the competition: After briefly cratering during the Apple event, Sonos' stock price was up 5% following the HomePod Mini announcement.
"It is not clear to us that anything significant is going to change in the short term." Lightshed Partners analysts chime in on Disney's newly announced restructuring, which is supposed to put streaming front and center.
"We're still believers in the bundle." NBCUniversal's direct-to-consumer chairman Matt Strauss, explaining why the company didn't add feeds for any of its existing cable networks to its Peacock streaming service, at Variety's Entertainment & Technology Summit.
This startup wants to clone your voice
Still haven't gotten around to learning a new language during the pandemic? No worries, there's now a startup that can help you speak French, Spanish or Italian in just a few minutes. Toronto-based Resemble AI is launching a new service dubbed Localize this week that clones your voice to then produce translated audio recordings in a number of different languages.
Resemble has already cloned over 42,000 voices, CEO Zohaib Ahmed told me, and it counts major telcos, banks and call center operators as its customers. The company's work was inspired by the growth of voice apps for smart speakers, which often use the same standard voices, confusing users who don't understand whether they're talking to Google, Alexa or the voice app of a media brand. "Everyone is making these voice applications, and they all sound exactly the same," he said.
Resemble's solution? Start with a custom voice.
- The company has built a self-serve system, allowing anyone to capture and clone their voice on its website. Resemble's demo requires users to record as few as 50 sentences. Ahmed said that the company typically records some 15 minutes of audio for a voice bot, while more expressive performances can require additional source material.
- Resemble doesn't need customers to record rolling R's to translate their voice to Spanish. Instead, it taps into hundreds of hours of existing modeling data per language.
- For now, Resemble's Localize service is limited to Indo-European languages like French, German, Dutch, Italian and Spanish, in part because Asian languages often have very different intonation. Ahmed told me that the company is looking to tackle additional languages and markets next.
Voice cloning isn't new. Google teamed up with John Legend, Issa Rae and other celebrities to personalize its smart speaker assistant, using AI to have them read your local weather forecast. Ahmed said that his company had built a lot of custom voice tech, including a dedicated vocoder, to beat Google's algorithms. He admitted that the company's tech may not be ready to dub entire movies just yet, but said that Resemble has had conversations with video game companies about auto-translating voices of minor characters for multiple markets.
The democratization of this and other forms of deepfake technology is fascinating, and a bit scary. If political campaigns already use out-of-context quotes in their commercials, imagine what they could do with cloned voices.
Ahmed admitted that there is potential for abuse, and said that his company had been open-sourcing a lot of its tech to make it easier for researchers to develop audio fingerprinting and other authentication antidotes. He also contended that consumers already regard a lot of advertising as fake; for instance, they recognize that a glistening Whopper in a Burger King spot is not quite the real thing. Ultimately, we all have to become more critical about assessing the credibility of any recording, be it video or audio, Ahmed said. "A lot of it is going to be a mental shift."
The Roku Channel app went live on Fire TV devices. Roku is taking another step to take its ad-supported video service beyond its own device footprint.
On Protocol: How Netflix reinvented its social media marketing. The streaming service has launched a number of social media accounts targeting LGBTQ+, Black and Latinx viewers.
Amazon's new app turns its packaging into AR experiences. Amazon boxes now come with QR codes that will unlock a number of AR experiences.
The lack of stimulus checks is bad news for the pay TV industry. The continuing economic crisis, and the lack of federal help, will lead to more cord cutting.
Quest 2 is getting high marks. Facebook's new VR headset began shipping Tuesday, and reviewers are liking it: "Feels too good to be true." (CNET) "The quintessential VR experience." (Esquire) "The New King of VR." (UploadVR)
Apple's new iPhone 12 Pro has a lidar scanner. The depth sensor is being used by Snap and others to build better AR experiences.
AMC's financial troubles are deepening. The theater chain warned this week that it may go broke before the end of the year.
On Protocol: Zoom plans to dominate the virtual events industry with a new events platform called OnZoom. Will we ever be off Zoom again?
Speaking of smart speakers: I've got a few of them at my house, and appreciate them for the imperfect pieces of technology they are. Mostly useful, occasionally buggy, sometimes hilariously wrong. Like the time my daughter asked her speaker to spell elephant in Spanish, and Google answered: "E L E P H A N T I N S P A N I S H."
However, that's nothing compared to what's been happening to Reddit user Nesede, who had this to report this week: "Every time I say 'hey Google, play some music,' it plays the Spotify playlist called 'Relaxing Piano Music FOR CATS.'" Nesede added that he had never owned a cat in his life and was understandably confused.
Leave it to the Reddit community to make him feel better: "Maybe you're saying it like 'meow-sic,'" one commenter suggested. "Try saying Moo-sic to get Relaxing Piano music for Cows," another replied. "Clearly cats have hacked your Google account," was another good guess. Still, nothing comes close to this piece of advice: "I mean, maybe you should listen to it. It's 2020, worse recommendations have been made."
Thanks for reading — see you next week!