Audio is the white whale of social media. A TikTok- or Twitter-like platform for audio recordings sounds like a solid bet on paper. Audio is intimate and imaginative. The stakes are lower, and the costs more accessible, compared to recording video content. Best of all, social audio appears to be new and exciting — like it’s never been done before.
“Every couple of years, a new audio social media platform emerges, excites us through its novel approach, and briefly captures our collective attention,” said Michael Mignano, co-founder of podcast platform Anchor and a partner at VC firm Lightspeed. While the podcasting industry has grown substantially, with more than one-third of Americans listening to podcasts regularly, there so far hasn’t been a similar market for short-form audio.
No one has definitively cracked the code on how to entice people, on a large scale, to engage with audio-only content like they do videos or text, and any success has been short-lived. After capturing imaginations during the pandemic, live chat platform Clubhouse receded from the mainstream. Twitter pulled resources from its Clubhouse clone, Spaces, in June.
Meta shut down its short-form audio feature Soundbites and its podcast hub in May. Startups like Shuffle, which billed itself as the TikTok for podcasts, have also shut down. Others, like Snipd, an AI-based podcast app that lets users create and scroll through podcast snippets, have just started chasing the audio dream, convinced that their take on social audio might have the right formula to finally take off.
Apple and Spotify, the preeminent podcasting platforms, are perhaps best positioned to experiment with social, shareable audio; Spotify has arguably come the closest with its year-in-review Spotify Wrapped slideshows. Both declined to speak on the record but pointed Protocol to blog posts about how they segment podcast episodes. Spotify acquired a company called Podz in 2021 that generates audio clips, hinting that the company may invest more in the social discovery aspect of audio. But its plans are unclear so far.
Will it ever be audio’s time to shine? It’s at a disadvantage in our short-attention-span economy filled with shiny images and the never-ending scroll, founders told Protocol. Listening is an inherently passive experience, making it more difficult to sell to investors and the average user.
“We’re doing something else when we’re listening to audio, right?” said Cliff Lampe, a University of Michigan professor specializing in digital communication. “That’s not true with either reading or video because it grabs more of our attention.”
Audio has a virality problem
“Is a hit machine for audio even possible? And is it something anyone even wants?” asks journalist Stan Alcorn at the top of his viral 2014 article, “Why Audio Never Goes Viral.”
Eight years later, the question remains. Audio is still more difficult to share on the internet, despite countless startups and platforms offering solutions to the problem. It’s not because companies haven’t tried hard enough — users just aren’t that interested.
“It competes poorly with video when it comes to user-generated enthusiasm, basically,” said Brian Lamb, co-founder of education tech company Swivl. “If you’re trying to get people to author original content, they’re more likely to want to just do video.” Swivl used to offer a user-generated audio platform called Synth that has since shut down.
Audio faces an uphill battle. For one, you can’t skim it easily. You have to listen to audio linearly, making it an inefficient mode of consumption. It also doesn’t require all your attention; many people listen to podcasts while running errands, working out, or cooking. “The reality is you can consume content so much more quickly and efficiently through your eyes than you can through your ears,” Mignano said.
Because of this inherent barrier to listening, it’s harder to convince creators to invest in stellar content. Recording professional audio is hard enough at its core. Lampe also pointed out that it’s a rare skill to speak well, “especially using the cadence and emotion, reducing your number of ‘ums’ and hesitations.”
Chris Messina, social tech expert and inventor of the Twitter hashtag, said the cognitive cost for listeners creates a very high bar for audio content. “If I want someone to listen to what I have to say, I better be fucking, like, interesting,” Messina said.
Both Mignano of Anchor and Lamb of Synth started with the idea of an “audio Twitter” in which users could record original, short-form voice content. It’s not a bad idea; Twitter itself has audio origins with defunct podcast platform Odeo. But in Anchor’s case, Mignano quickly realized that while there was demand to create audio content, it hadn’t reached a critical mass. Plus, the content quality was poor without creative tools. Even a grainy, poorly recorded video can go viral. But choppy audio is far less engaging.
Once Anchor released creator tools, the social audio began to resemble podcasts. But who wants to visit Anchor, the new kid on the block, for podcast-like audio when you could get essentially every podcast from Apple or Spotify?
“We transitioned into being a really easy to use podcasting platform,” Mignano said. “With the tap of a button, instead of just publishing to your social graph inside of the Anchor app, we published it to Spotify and to Apple podcasts and all these places.” The model worked, and the switch ultimately led to a Spotify acquisition in 2019.
Lamb ran into the same problem with Synth. While the tech was sound, it didn’t have the content users wanted. Lamb pivoted from offering user-generated content to offering both manual and automatic podcast-snipping tools. Still, it didn’t resonate with consumers. Lamb shut down the consumer side of Synth a year and a half ago and officially shut the tool down for educators a month ago.
Kevin Smith, CEO of AI startup Snipd, said he’s not building a social audio app at the moment. Snipd lets users manually clip podcasts, automatically segments podcasts into highlights and chapters, and has a “for you” page full of clips with transcripts as the visual. Audio information can easily get lost in the big bad podcast universe. Smith wants to help listeners harness it.
“Our goal is to build an app that unlocks the knowledge in podcasts,” Smith said. “If the social aspect helps with that, then we don’t have a problem with it.”
He’s interested in tackling the barrier to discovering podcasts and making them easier to consume. Turning podcasts into bite-sized bits helps with both, and it makes them more social as well: It’s easier to share audio nuggets than the whole meal. Smith says Snipd appeals to a wide range of users, and he’s optimistic it will continue to grow. He credits the latest advancements in natural-language processing for making Snipd’s mission possible. Transcribing audio has become easier, as has segmenting it into core parts.
Snipd automatically segments podcasts into highlights and chapters, and lets users manually clip podcasts. Image: Snipd
But it’s especially struck a chord with the productivity community. For obsessive notetakers, being able to embed audio snippets into your Notion- or Obsidian-based second brain (productivity speak for note-taking system) is a win. “We didn’t expect that a certain percentage of our users would be so enthusiastic about plugging this into your second brain,” Smith said.
Finding a niche is powerful, and it can surround a product with passionate, dedicated users. But serving only a small subset of people can also hold products back from attaining mainstream popularity, the kind of success that VCs and media generally expect from ambitious startups. Synth, for instance, had a built-in user base of educators because of its parent company. It allowed teachers to incorporate student voices in class projects.
“In education, it’s a little bit different because someone’s requiring you to do it,” Lamb said. “There are very different motivators involved in capturing and creating that content.”
But Synth didn’t succeed when it came to the general consumer market. Shuffle faced a similar issue. Ada Yeo, co-founder and CEO of the now-defunct company, said the tool had early power users from tech Twitter. But the product was split between users who wanted the tool for note-taking purposes and general users who might have used it to share podcast clips on social media.
“We just couldn’t find a way to crack other communities,” Yeo said, referencing the worlds of comedy and sports podcasts, “that would lead to a more mainstream product.”
Podcasting itself is a niche form of entertainment compared to TV or movies. Anybody can create a podcast now, but in Messina’s words, this means a lot of the audio out there is “shit and probably should never have been recorded.” Excellent audio certainly exists, but blockbusters are rare. With a crowded audio market, there are fewer listeners to go around, making it harder for audio platforms to scale to the level of a social network. Maybe committing to the niche is the answer, Messina suggested.
“We evaluate [social audio] and define its success based on the size of success we’ve seen before,” he said, “as opposed to saying, ‘Actually, this can be a very healthy ecosystem, but it’s a small suburb of social media land.’”
Audio’s saving grace: AI or Spotify
If any type of audio platform were to grow into a proper social network, experts agree it would have to focus on short-form clips. This American Life rolled out a product called Shortcut in 2016 that was meant to “make podcasts as shareable as GIFs” (RIP GIFs, by the way). But it doesn’t appear to have caught on, and six years later, Shortcut is still in beta.
Smith says Snipd’s AI features may make the process of creating clips less time-intensive, while also making it more likely users see audio they like. Snipd’s AI discovery algorithm is far from perfect, but Smith said the team is working on improving it.
Messina said we have to be less precious about the way we consume podcasts. Allowing AI to chop podcasts into shareable bits makes audio easier to consume, and a social audio platform more viable. “In the future, it may become easier to remix, reshape, snip, and share those audio moments,” Messina said.
Apple is generating transcripts to help with search results, but they’re not publicly available to listeners. Spotify has transcripts only for its original shows so far. Messina thinks Spotify might be moving toward enabling social audio, allowing users to share response clips to podcasts.
Mignano, who left his role as Spotify’s head of talk in June, declined to speak about specific plans but emphasized the company’s ambitions to foster more podcast creators. “Spotify has been pretty public around its ambitions to be a platform for tens, if not hundreds of millions of creators,” Mignano said. “The company continues to do a lot of work to make it easier for nonprofessionals to make podcasts.”
While small startups may conceive of better recommendation engines, the popularization of social, short-form audio depends largely on major platforms and whether they decide to invest in supporting it. For companies like Apple or Spotify, experimenting with short-form audio makes sense. For the major social media platforms with visuals or text as the standard, audio feels more like just one feature among many.
“It certainly feels like something that’s untapped and ripe for experimentation, but social audio has come at the wrong time at the moment,” said social media expert Matt Navarra. “Twitter’s in disarray, Meta’s placing its bets on a very small number of things it thinks is going to work out.” In other words, audio is not on many companies’ list of priorities.
Mignano recently wrote about podcasts increasingly becoming more visual, with podcast creators releasing video segments on TikTok. Though Mignano believes in the beauty of audio, he doesn’t think it’s ready for social media prime time.
“Audio is just unique in the world of social,” Mignano said. “It’s incredibly rich, it’s intimate, it’s immersive, but it has this disadvantage.”