Enterprise

AI needs massive data sets to work. Meta is testing a way to do more with less.

Despite the constant deluge of content flowing into Facebook and Instagram, Meta has struggled to get enough data to train AI to spot harmful content, so it’s banking on an emerging approach.

A visualization of the few-shot learning AI process

Image: Meta

After a terrorist attack on a mosque in Christchurch, New Zealand, was livestreamed on Facebook in 2019, Facebook’s parent company, now called Meta, outfitted London police officers with body cams while they conducted terrorism training. At the time, Meta said there wasn’t enough video data to train its artificial intelligence systems to detect and remove violent content, so it hoped the body cam project would produce more of that scarce AI training data.

A year prior to that horrific incident, the company acknowledged that it failed to keep up with inflammatory posts from extremist groups in Myanmar. Again, it said the problem was a lack of data — there wasn’t enough content in Burmese to train algorithmic moderation systems to spot more of it.

The company wasn’t wrong: Despite the constant deluge of content flowing into Facebook and Instagram, traditional AI approaches used by Meta and other companies need enough examples of the bad stuff to recognize it when it shows up again. A dearth of training data can plague AI systems that need large amounts of human-labeled information in order to learn.

Enter few-shot learning, a concept that researchers across the globe have experimented with in recent years. Few-shot learning models can be trained from generic data supplemented with just a “few” pieces of labeled content.
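To make the idea concrete, here is a minimal sketch of one common few-shot recipe, prototypical-network-style classification: embed a handful of labeled examples with a large pretrained encoder, average them into per-class "prototypes" and label new posts by nearest prototype. This is an illustration of the general technique, not Meta's system; the `encode` stub, the labels and the example posts are all placeholders.

```python
# A minimal sketch of prototypical-network-style few-shot classification.
# The encoder is a stand-in; in practice it would be a large pretrained
# multilingual text model, and the labels/examples below are placeholders.

import numpy as np

def encode(texts):
    """Placeholder for a pretrained text encoder (returns fake embeddings)."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.normal(size=(len(texts), 768))

def build_prototypes(support_set):
    """support_set: {label: [a few example posts]} -> {label: mean embedding}."""
    return {label: encode(examples).mean(axis=0)
            for label, examples in support_set.items()}

def classify(post, prototypes):
    """Assign the label whose prototype is closest in cosine similarity."""
    v = encode([post])[0]
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(prototypes, key=lambda label: cosine(v, prototypes[label]))

# A "few" labeled examples per policy area are enough to define the prototypes.
support = {
    "violating": ["example post that breaks the policy"],
    "benign": ["example post that does not"],
}
print(classify("a new, unseen post", build_prototypes(support)))
```

Swapping the stub for a real pretrained multilingual encoder is what lets a handful of labels go a long way.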

Now, Meta plans to announce Wednesday that few-shot learning shows promise in its constant battle to weed out disinformation or other content that violates its policies on Facebook and Instagram, particularly when there isn’t enough AI training data, such as in the case of emerging subject areas or breaking news events.

Following early tests on Facebook and Instagram, the company told Protocol that the technique has helped reduce the prevalence of content such as hate speech. So far, it has only used the approach to tackle a few content areas such as “misleading or sensationalized information that likely discourages COVID-19 vaccinations, and hostile speech like bullying and harassment and violence and incitement,” said a Meta AI spokesperson. For instance, the company tested few-shot learning to identify content that promoted the debunked notion that COVID-19 vaccines change people’s DNA.

Meta said the few-shot process shortens the time it takes to train an AI system from several months to a few weeks. “Since it scales quickly, the time from policy framing to enforcement would shorten by orders of magnitude,” wrote Meta in a blog post published Wednesday. In addition to text and image content, the company said the technique also works for video by consuming audio transcripts, on-screen text and video embeddings.
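As a rough illustration of what consuming audio transcripts, on-screen text and video embeddings can mean in practice, the sketch below fuses per-modality embeddings into one feature vector that a few-shot classifier could score. The three encoders are placeholders, and this is a generic pattern rather than Meta's actual architecture.

```python
# A generic sketch of fusing video signals for a few-shot classifier: embed the
# audio transcript, any text detected on screen and the video frames, then
# concatenate them into one feature vector. The encoders are placeholders,
# not Meta's models.

import numpy as np

def embed_transcript(transcript: str) -> np.ndarray:
    return np.zeros(256)   # placeholder text encoder

def embed_onscreen_text(ocr_text: str) -> np.ndarray:
    return np.zeros(256)   # placeholder OCR-text encoder

def embed_frames(frames: list) -> np.ndarray:
    return np.zeros(512)   # placeholder video encoder

def video_features(transcript: str, ocr_text: str, frames: list) -> np.ndarray:
    """One vector per video that a prototype matcher like the sketch above could score."""
    return np.concatenate([
        embed_transcript(transcript),
        embed_onscreen_text(ocr_text),
        embed_frames(frames),
    ])

print(video_features("spoken words", "text overlaid on the video", frames=[]).shape)  # (1024,)
```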

The company aims to burnish its image amid endless scrutiny from lawmakers and everyday people over its handling of abusive and false content on Facebook and Instagram. Later on Wednesday, Adam Mosseri, the head of Instagram, will answer questions from members of the Senate Commerce Committee's consumer protection subcommittee about how Instagram's algorithmic systems fuel content that has negative effects on kids.

Google, Baidu and others research few-shot approaches

Historically, artificial intelligence and machine learning algorithms have needed vast amounts of data to train them. Feed an algorithm lots of images of bananas or AK-47s labeled as such, and it will learn to recognize them — or at least that’s the goal.
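That traditional recipe looks something like the toy sketch below: gather many labeled examples and fit a classifier to them. The data here is synthetic and the features stand in for image embeddings; the point is only that this approach leans on thousands of labeled rows per category.

```python
# The traditional, data-hungry recipe: many labeled examples, one classifier.
# Synthetic data stands in for labeled image embeddings ("banana" vs. not).

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 128))      # thousands of labeled feature vectors
y = (X[:, 0] > 0).astype(int)           # stand-in labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))                  # works only because labels were plentiful
```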

Researchers from OpenAI, Google, Baidu and academic institutions across the globe have studied few-shot learning in recent years to circumvent the need for massive datasets, and not just for removing harmful social media content. They have suggested few-shot learning can help discover molecular properties for drug development when data is restricted by privacy rules, or uncover tweets related to natural disasters in the hopes of disseminating important safety information.

“Because large, labeled datasets are often unavailable for tasks of interest, solving this problem would enable, for example, quick customization of models to individual user’s needs, democratizing the use of machine learning,” wrote Google AI researchers in 2020 in a company blog post about few-shot learning.

Meta has been working on this AI problem for some time. It revealed some detail four years ago about how its AI tried to detect harmful content associated with terrorism, for example.

“When someone tries to upload a terrorist photo or video, our systems look for whether the image matches a known terrorism photo or video,” said the company at the time. To automatically remove text-based content, the company said, “we’re currently experimenting with analyzing text that we’ve already removed for praising or supporting terrorist organizations such as ISIS and Al Qaeda so we can develop text-based signals that such content may be terrorist propaganda. That analysis goes into an algorithm that is in the early stages of learning how to detect similar posts. The machine learning algorithms work on a feedback loop and get better over time.”
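The "matches a known terrorism photo or video" step is typically a hash-and-match pattern: hash previously removed material, hash each new upload and compare. The toy average-hash below only illustrates the shape of that approach; production systems rely on far more robust perceptual hashes (Meta later open-sourced one called PDQ) plus video and audio equivalents.

```python
# A toy version of the hash-and-match pattern: hash previously removed media,
# hash each new upload and flag near-duplicates by Hamming distance. Real
# systems use robust perceptual hashes (e.g., PDQ) and video/audio equivalents.

import numpy as np

def average_hash(gray: np.ndarray, hash_size: int = 8) -> int:
    """Downscale a grayscale image to hash_size x hash_size block means, threshold at the mean."""
    h, w = gray.shape
    gray = gray[: h - h % hash_size, : w - w % hash_size]
    bh, bw = gray.shape[0] // hash_size, gray.shape[1] // hash_size
    blocks = gray.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    bits = (blocks > blocks.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def matches_known(upload_hash: int, known_hashes: set, max_distance: int = 5) -> bool:
    """True if the upload is within a few bits of any previously removed item."""
    return any(hamming(upload_hash, known) <= max_distance for known in known_hashes)

rng = np.random.default_rng(1)
removed_image = rng.integers(0, 256, size=(64, 64)).astype(float)
known_hashes = {average_hash(removed_image)}
print(matches_known(average_hash(removed_image), known_hashes))  # True: re-upload of known content
```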

A track record of language failures

That was then. In November, Meta pointed to a series of technical milestones that led researchers to what it called “breakthrough” exploration of applying few-shot learning to content moderation. In a blog post last month, the company showed a timeline of advancements, including a technique called XLM-R that trains a model on content in one language and then applies it to content in other languages without the need for additional training data.
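The cross-lingual transfer pattern behind XLM-R can be sketched with the openly available xlm-roberta-base checkpoint: one multilingual encoder scores text in roughly 100 languages, so a classification head fine-tuned only on, say, English labels can be applied to other languages without new labeled data. The snippet below assumes that fine-tuning has already happened elsewhere; as written, the freshly initialized head produces meaningless scores, and the label meanings are illustrative.

```python
# A sketch of cross-lingual transfer with the open xlm-roberta-base checkpoint:
# one multilingual encoder, one classification head. Fine-tuning the head on
# English labels is assumed to have happened elsewhere; until then the scores
# are meaningless because the head starts out randomly initialized.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # illustrative labels: 0 = benign, 1 = violating
)
model.eval()

def score(texts):
    """Return P(violating) for each text, whatever language it is in."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.softmax(dim=-1)[:, 1]

# The same weights accept English and non-English input alike.
print(score(["an English post", "ein deutscher Beitrag"]))
```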

The company seems to be confident that emerging AI techniques like few-shot learning and XLM-R will help to improve how it patrols content in languages where it’s faltered before, such as “low-resource languages” like Burmese.

Yet the recently leaked Facebook Papers revealed Meta’s struggles to remove harmful content in places where it hasn’t hired enough human moderators or built well-trained moderation algorithms. Meta itself has admitted publicly that its automated moderation technologies have not worked well to weed out unwanted Burmese content in Myanmar, for example. But the exposed documents also showed the company did not develop algorithms to detect hate speech in Hindi and Bengali, both among the top-ten most-spoken languages in the world.

When asked by Protocol why it believes few-shot learning works in so many languages despite past failures, the Meta spokesperson said the system was trained on more than 100 languages and incorporates techniques like XLM-R. “The nuance and semantics of language is one of the reasons why we built this technology — to be able to more quickly address content in multiple languages,” said the spokesperson. “As these underlying language and text encoders improve, Meta AI FSL will also bring the improvements to additional languages too.”

Still, a lot of testing will be required to know if these emerging approaches can work at scale.

“We are early in the use of this technology,” said the Meta spokesperson. “As we continue to mature the tech and test it across various enforcement mechanisms and problems the goal is to further increase its use and continued accuracy.”

Entertainment

The (gaming) clones never stopped attacking

Clones keep getting through app review despite App Store rules about copying. It's a sign of the weaknesses in mobile app stores — and the weakness in Big Tech’s after-the-fact moderation approach.

Clones aren't always illegal, but they are widely despised.

Image: Disney

Two of the most fundamental tenets of the mobile gaming market:

  1. Free always wins.
  2. No good gaming idea is safe from copycats.

In combination, these two rules help produce what the industry calls a clone. Most often, clones are low-effort, ripped-off versions of popular games that monetize in not-so-savory fashion while drawing in players with a price tag of zero.

Nick Statt
Nick Statt is Protocol's video game reporter. Prior to joining Protocol, he was news editor at The Verge covering the gaming industry, mobile apps and antitrust out of San Francisco, in addition to managing coverage of Silicon Valley tech giants and startups. He now resides in Rochester, New York, home of the garbage plate and, completely coincidentally, the World Video Game Hall of Fame. He can be reached at nstatt@protocol.com.
Sponsored Content

A CCO’s viewpoint on top enterprise priorities in 2022

The 2022 non-predictions guide to what your enterprise is working on starting this week

As Honeywell’s global chief commercial officer, I am privileged to have the vantage point of seeing the demands, challenges and dynamics that customers across the many sectors we cater to are experiencing and sharing.

This past year has brought unparalleled change and challenge to all businesses and enterprises. That was the case at Honeywell, for example, a company with a legacy of innovation and technology spanning more than a century. When I joined the company just months before the pandemic hit, we were already in the midst of an intense transformation under the leadership of CEO Darius Adamczyk. This transformation spanned our portfolio and business units. We were already actively working on products and solutions, in advanced phases of rollout, for which the world had shown need and demand before the pandemic. Those included solutions in edge intelligence, remote operations, quantum computing, warehouse automation, building technologies, safety and health monitoring and, of course, ESG and climate tech, which built on our exceptional success over the previous decade.

Jeff Kimbell
Jeff Kimbell is Senior Vice President and Chief Commercial Officer at Honeywell. In this role, he has broad responsibilities to drive organic growth by enhancing global sales and marketing capabilities. Jeff has nearly three decades of leadership experience. Prior to joining Honeywell in 2019, Jeff served as a Partner in the Transformation Practice at McKinsey & Company, where he worked with companies facing operational and financial challenges and undergoing “good to great” transformations. Before that, he was an Operating Partner at Silver Lake Partners, a global leader in technology and held a similar position at Cerberus Capital LP. Jeff started his career as a Manufacturing Team Manager and Engineering Project Manager at Procter & Gamble before becoming a strategy consultant at Bain & Company and holding executive roles at Dell EMC and Transamerica Corporation. Jeff earned a B.S. in electrical engineering at Kansas State University and an M.B.A. at Dartmouth College.
Entertainment

Beat Saber, Bored Apes and more: What to do this weekend

Don't know what to do this weekend? We've got you covered.

Images: Ross Belot/Flickr; IGBD; BAYC

This week we’re listening to “Harvest Moon” on repeat; burning some calories playing Beat Saber; and learning all about the artist behind the goofy ape pics that everyone (including Gwyneth Paltrow?) is talking about.

Neil Young: Off Spotify? No problem.

Neil Young removed his music from Spotify this week, but countless recordings are still available on YouTube, including this 1971 video of him performing “Heart of Gold” in front of a live studio audience, complete with some charming impromptu banter. And while you’re there, scroll down and read a few of the top-rated comments. I promise you won’t be disappointed.

'Archive 81': Not based on a book, but on a podcast!

Netflix’s latest hit show is a supernatural mystery horror mini-series, and I have to admit that I was on the fence about it many times, in part because the plot often just didn’t add up. But then the main character, Dan the film buff and archivist, would put on his gloves, get in the zone, and meticulously restore a severely damaged, decades-old video tape, and proceed to look for some meaning beyond the images. That ritual, and the sentiment that we produce, consume and collect media for something more than meets the eye, ultimately saved the show, despite some shortcomings.

'Secrets of Sulphur Springs': Season 2 is out now

If you’re looking for a mystery that's a little more family-friendly, give this show about a haunted hotel, time travel, and kids growing up in a world that their parents don’t fully understand a try. Season 2 dropped on Disney+ this month, and it not only includes a lot more time travel mysteries, but even uses the show’s time machine to tackle subjects as serious as reparations.

The artist behind those Bored Apes

Remember how NFTs are supposed to generate royalties with every resale, and thus support artists better than any of their existing revenue streams? Seneca, the artist who was instrumental in creating those iconic apes for the Bored Ape Yacht Club, wasn’t able to share details about her compensation in this Rolling Stone profile, but it sure sounds like she is not getting her fair share.

Beat Saber: Update incoming

Years later, Beat Saber remains my favorite VR game, which is why I was very excited to see a teaser video for cascading blocks, which could be arriving any day now. Time to bust out the Quest for some practice time this weekend!

Correction: Story has been updated to correct the spelling of Gwyneth Paltrow's name. This story was updated Jan. 28, 2022.


Janko Roettgers

Janko Roettgers (@jank0) is a senior reporter at Protocol, reporting on the shifting power dynamics between tech, media, and entertainment, including the impact of new technologies. Previously, Janko was Variety's first-ever technology writer in San Francisco, where he covered big tech and emerging technologies. He has reported for Gigaom, Frankfurter Rundschau, Berliner Zeitung, and ORF, among others. He has written three books on consumer cord-cutting and online music and co-edited an anthology on internet subcultures. He lives with his family in Oakland.


Can Matt Mullenweg save the internet?

He's turning Automattic into a different kind of tech giant. But can he take on the trillion-dollar walled gardens and give the internet back to the people?

Matt Mullenweg, CEO of Automattic and founder of WordPress, poses for Protocol at his home in Houston, Texas.
Photo: Arturo Olmos for Protocol

In the early days of the pandemic, Matt Mullenweg didn't move to a compound in Hawaii, bug out to a bunker in New Zealand or head to Miami and start shilling for crypto. No, in the early days of the pandemic, Mullenweg bought an RV. He drove it all over the country, bouncing between Houston and San Francisco and Jackson Hole with plenty of stops in national parks. In between, he started doing some tinkering.

The tinkering is a part-time gig: Most of Mullenweg’s time is spent as CEO of Automattic, one of the web’s largest platforms. It’s best known as the company that runs WordPress.com, the hosted version of the blogging platform that powers about 43% of the websites on the internet. Since WordPress is open-source software, no company technically owns it, but Automattic provides tools and services and oversees most of the WordPress-powered internet. It’s also the owner of the booming ecommerce platform WooCommerce, the journaling app Day One, the analytics tool Parse.ly and the podcast app Pocket Casts. Oh, and Tumblr. And Simplenote. And many others. That makes Mullenweg one of the most powerful CEOs in tech, and one of the most important voices in the debate over the future of the internet.

David Pierce

David Pierce (@pierce) is Protocol's editorial director. Prior to joining Protocol, he was a columnist at The Wall Street Journal, a senior writer with Wired, and deputy editor at The Verge. He owns all the phones.

Workplace

Mental health at work is still taboo. Here's how to make it easier.

Tech leaders, HR experts and organizational psychologists share tips for how to destigmatize mental health at work.

How to de-stigmatize mental health at work, according to experts.

Illustration: Christopher T. Fong/Protocol

When the pandemic started, HR software startup Phenom knew that its employees were going to need mental health support. So it started offering a meditation program, as well as a counselor available for therapy sessions.

To Chief People Officer Brad Goldoor’s surprise, utilization of these benefits was very low, starting at about a 10% take rate and eventually tapering off. His diagnosis: People still aren’t fully comfortable opening up about mental health, and they’re especially not comfortable engaging with their employer on the topic.

Michelle Ma

Michelle Ma (@himichellema) is a reporter at Protocol, where she writes about management, leadership and workplace issues in tech. Previously, she was a news editor of live journalism and special coverage for The Wall Street Journal. Prior to that, she worked as a staff writer at Wirecutter. She can be reached at mma@protocol.com.

Fintech

Robinhood's regulatory troubles are just the tip of the iceberg

It’s easiest to blame Robinhood’s troubles on regulatory fallout, but those troubles have obscured the larger issue: The company lacks an enduring competitive edge.

A crypto comeback might go a long way to help Robinhood’s revenue

Image: Olena Panasovska / Alex Muravev / Protocol

It’s been a full year since Robinhood weathered the memestock storm, and the company is now in much worse shape than many of us would have guessed back in January 2021. After the company announced its Q4 earnings last night, Robinhood’s stock plunged into the single digits — just below $10 — down from a recent high of $70 in August 2021. That means Robinhood’s valuation dropped more than 84% in less than six months.

Investor confidence won’t be bolstered much by yesterday’s earnings results. Total net revenues dropped to $363 million from $365 million in the preceding quarter. In the quarter before that, Robinhood reported a much better $565 million in net revenue. Net losses were bad but not quite as bad as before: Robinhood reported a $423 million net loss in Q4, an improvement from the $1.3 billion net loss in Q3 2021. One of the most shocking data points: Average revenue per user dropped to $64, down from a recent high of $137 in Q1 2021. At the same time, Robinhood actually reported a decrease in monthly active users, from 18.9 million in Q3 2021 to 17.3 million in Q4 2021.
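For readers who want to check the math on those figures, a quick back-of-the-envelope pass over the numbers cited above (the $70 peak and sub-$10 price are approximate, and pct_change is just a helper defined inline):

```python
# Back-of-the-envelope check of the figures above (share prices approximate).
def pct_change(old, new):
    return (new - old) / old * 100

print(round(pct_change(70, 10)))       # ~ -86%: a drop of "more than 84%"
print(round(pct_change(365, 363), 1))  # -0.5% quarterly revenue, Q3 to Q4
print(round(pct_change(137, 64)))      # ~ -53% average revenue per user since Q1 2021
print(round(pct_change(18.9, 17.3)))   # ~ -8% monthly active users, Q3 to Q4
```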

Hirsh Chitkara

Hirsh Chitkara (@HirshChitkara) is a reporter at Protocol focused on the intersection of politics, technology and society. Before joining Protocol, he helped write a daily newsletter at Insider that covered all things Big Tech. He's based in New York and can be reached at hchitkara@protocol.com.
