Stopping fake accounts is a cat-and-mouse game. Can Facebook win with AI?

The social network is in a constant power struggle with bad actors. Increasingly complex machine learning gives it an edge — for now.

Even humans can have a hard time telling fake Facebook accounts from real ones. The company's newly detailed AI draws on a mountain of data to help it tell the difference.

Photo illustration: Rafael Henrique/SOPA Images/LightRocket via Getty Images

Bochra Gharbaoui pointed to four Facebook profiles on a screen in the social network's London headquarters and asked a seemingly simple, but fundamentally important, question: Which is fake?

I didn't know.

Something looked a little off about each of them. One used an overtly raunchy profile picture. A second appeared to be the account of a cat. A third described a toy dog. And another was oddly detailed but didn't have an image. But all those kinds of profiles exist among my Facebook acquaintances. Who was I to call them fake?

"You basically just told us what all of our research suggests," said Gharbaoui, a data science manager on Facebook's community integrity team. "Which is that when people say fake, they often mean suspicious."

Now, I don't like to brag, but … I am a human. And humans tend to be quite a lot better than computers at complex reasoning and dealing with ambiguities like the ones raised by these possibly fake accounts. So if I struggled to answer Gharbaoui's question, it's easy to see why the algorithms Facebook has pointed at the problem might struggle, too. (By the way, all of the accounts were fictional, but each could have set off alarms for different reasons. See? I told you it was difficult.)

Against that backdrop of uncertainty, the company has spent the past few years developing a new machine learning system, called Deep Entity Classification, for detecting convincing fake accounts that make it onto the platform. The algorithm studies 20,000 features of the users, groups and pages linked to each account it considers in order to establish whether the account is genuine. That's something no human could ever do, and the system has already been used to take down hundreds of millions of accounts that violate the company's terms of service.

The question: Is it enough?

There's fake accounts. Then there's fake accounts.

The threat of fake accounts on social media platforms is real. They "can be used for so much bad or evil," said Max Heinemeyer, director of threat hunting at Darktrace, which specializes in machine learning approaches to cybersecurity. That could mean generating spam, running scams, inciting violence, organizing terrorism, or other deeply problematic behavior.

But for a company like Facebook, every decision it makes to disable an account is high-stakes. Getting it wrong "essentially means that we are denying people access to this platform," Gharbaoui said, so it has invested in several layers of analysis to root out problem accounts.

In the first instance, it blocks millions of attempted account creations every day, Gharbaoui said, using a machine learning model built to process a high volume of information and make rapid decisions. Facebook won't describe the precise features that can lead to a blocked signup, arguing that doing so would give bad actors too much information, but signals like the IP address of the request and the volume of signups coming from that location are likely among those considered.
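
Facebook won't say what its signup-time model actually looks at, but a minimal sketch of that kind of fast, high-volume filter — using an invented rule on signup volume per IP address, one of the signals likely considered — might look like this:

```python
from collections import defaultdict
import time

# Hypothetical sliding-window limit on signup attempts per IP address.
# Facebook hasn't disclosed its real features or thresholds; this only
# illustrates the shape of a fast filter that runs at account creation.
WINDOW_SECONDS = 3600
MAX_SIGNUPS_PER_IP = 5

signup_log = defaultdict(list)  # ip -> timestamps of recent signup attempts

def allow_signup(ip: str, now: float | None = None) -> bool:
    """Return False when an IP has attempted too many recent signups."""
    now = time.time() if now is None else now
    # Keep only attempts inside the sliding window.
    signup_log[ip] = [t for t in signup_log[ip] if now - t < WINDOW_SECONDS]
    if len(signup_log[ip]) >= MAX_SIGNUPS_PER_IP:
        return False  # volume from this address looks automated
    signup_log[ip].append(now)
    return True
```

A production system would feed many such signals into a learned model rather than a single hand-set threshold, but the constraint is the same: each decision has to be cheap enough to make millions of times a day.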

Meanwhile, many more accounts — what Gharbaoui describes as the "vast majority" of the 1.7 billion disabled in the third quarter of last year — are caught by fast, high-volume machine learning algorithms before they gain broad access to the platform. Again, Facebook won't describe what triggers such disablement, but it could be, say, a pattern of initial behavior repeated by many thousands of other accounts in the past — a telltale sign that an account is controlled by a bot.
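
Again hypothetically — Facebook doesn't confirm the signals — the repeated-pattern idea can be sketched with invented data: if many new accounts perform an identical sequence of first actions, that sequence itself becomes a bot signature.

```python
from collections import Counter

# Invented example data: each new account's first actions on the platform.
# A real system would use far richer behavioral signals than this.
first_actions = {
    "acct_1": ("add_friend", "join_group", "post_link"),
    "acct_2": ("add_friend", "join_group", "post_link"),
    "acct_3": ("upload_photo", "message_friend"),
}

DUPLICATE_THRESHOLD = 2  # invented; a real threshold would be far higher

pattern_counts = Counter(first_actions.values())

flagged = [
    acct for acct, pattern in first_actions.items()
    if pattern_counts[pattern] >= DUPLICATE_THRESHOLD
]
print(flagged)  # ['acct_1', 'acct_2'] share a telltale identical sequence
```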

Even with those protections in place, though, many accounts still sneak through. And perhaps I shouldn't feel too bad about my own ineptitude, because the quality of fake accounts on social media platforms has improved dramatically in recent years.

Today's more advanced approaches to fake account creation use machine learning to generate increasingly realistic profiles, said Shuman Ghosemajumder, global head of artificial intelligence systems at F5 Networks. They can create convincing-sounding names and biographies, and even entirely synthetic images that are almost impossible to distinguish from genuine photographs of real humans.

This situation is born of necessity on the part of bad actors, according to Heinemeyer: If a bad actor's business model depends on creating fake accounts to, for example, scam people, they're damn sure going to learn how to beat the systems that block those accounts by creating increasingly realistic spoofs. That makes the problem steadily harder to deal with.

"Where Facebook has a great advantage is knowing what organic activity looks like in its social graph," Ghosemajumder said.

20,000 features under the hood

The social network has tapped that knowledge to build Deep Entity Classification, the machine learning model it credits with a big advance in how many of those convincing fake accounts it can root out.

Instead of studying direct properties of an account, like its name or how many friends it has — attributes the user directly controls — DEC studies what Facebook calls "deep features." These are properties of the users, groups, pages and other entities the account is linked to, which are much harder, if not impossible, for the user to directly control. And it looks not just at those entities, but also at the ones one more hop out along the social graph — it stops there to limit the computational overhead of the model. Even so, that yields a bewildering number of candidate features; currently, 20,000 feed into DEC's decision-making.
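
Facebook hasn't published the feature set, but the core idea — aggregate properties over the entities within two hops of an account, rather than properties of the account itself — can be sketched on a toy graph. All entities and features below are invented for illustration:

```python
import statistics

# Toy social graph: entity id -> direct properties plus links to other
# entities. DEC's real 20,000 features are not public; these are made up.
graph = {
    "user_a": {"age_days": 12, "links": ["group_x", "user_b"]},
    "user_b": {"age_days": 900, "links": ["group_x"]},
    "group_x": {"age_days": 2000, "links": ["user_b"]},
}

def neighborhood(entity_id, hops=2):
    """Collect entities within `hops` links; stopping at two bounds the cost."""
    seen, frontier = {entity_id}, {entity_id}
    for _ in range(hops):
        frontier = {n for e in frontier for n in graph[e]["links"]} - seen
        seen |= frontier
    return seen - {entity_id}

def deep_features(entity_id):
    """Aggregate properties of linked entities into 'deep' features."""
    ages = [graph[e]["age_days"] for e in neighborhood(entity_id)]
    return {
        "neighbor_count": len(ages),
        "mean_neighbor_age": statistics.mean(ages) if ages else 0.0,
        "min_neighbor_age": min(ages, default=0),
    }

print(deep_features("user_a"))
```

Because these aggregates depend on the behavior of other, mostly genuine entities, a fake account can't simply set them the way it sets its own name or photo.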

The system is then trained on a data set of accounts that have been labeled fake or real in the past. Unlike most machine learning systems, though, it draws on two pools of labels: high-precision labels assigned by human security experts, along with much larger amounts of lower-precision labels generated automatically by the company's other algorithms. Facebook says the model is first roughly trained on millions of examples carrying the lower-precision labels, then fine-tuned on hundreds of thousands of examples carrying the higher-precision ones.
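
Facebook describes this two-stage scheme only at a high level. A rough sketch of the idea — pretrain on a large pool of noisy automated labels, then fine-tune on a small expert-labeled pool — might look like this, with scikit-learn's SGDClassifier standing in for the real model and random arrays standing in for real accounts:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Stand-ins for the two label pools: a large set of noisy, automatically
# generated labels and a small set of high-precision expert labels.
# The data is random; only the two-stage training flow is the point.
X_auto = rng.normal(size=(100_000, 20))
y_auto = rng.integers(0, 2, size=100_000)
X_expert = rng.normal(size=(5_000, 20))
y_expert = rng.integers(0, 2, size=5_000)

model = SGDClassifier(loss="log_loss")

# Stage 1: rough training on the plentiful low-precision labels.
model.partial_fit(X_auto, y_auto, classes=[0, 1])

# Stage 2: fine-tune with several passes over the scarcer,
# higher-precision human labels.
for _ in range(5):
    model.partial_fit(X_expert, y_expert)
```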

The model is also frequently retrained on data gathered from across the social network, allowing a new version to "ship many times a day," said Daniel Bernhardt, an engineering manager on the company's community integrity team.

How's it working out? So far, DEC has been responsible for the identification and deactivation of over 900 million fake accounts over the past two years, according to Facebook.

A cat-and-mouse game

The levels of nuance and complexity provided by complex machine learning models like this "significantly raise the bar" that bad actors must pass to continue using fake accounts, Ghosemajumder said. But the bar is not raised to impossible heights — and bad actors can always learn to jump higher.

"It will always be a cat-and-mouse game," said Zubair Shafiq, an assistant professor in the department of computer science at the University of Iowa. That's because "you have an active attacker, who changes its behavior."

It's not that bad actors are necessarily able to reverse-engineer a system like the one Facebook has developed. Instead, it's a process of trial and error. "They will tweak their approach on intuition," Shafiq said. "And then after five or 10 tries, something might work."

Facebook's Bernhardt likens this to the way a biological virus mutates. "All the virus needs is like one or two mutations in order to make it past an existing defense system," he said. So it's Facebook's job to put enough defenses in place that even those extra mutations don't allow bad actors to fool its systems.

Security experts disagree on whether those defenses can keep improving faster than bad actors can adapt.

"You find yourself in a war of algorithms," Heinemeyer said. As machine learning becomes more ubiquitous, he argued, it will be harder for companies to rely on their in-house expertise to keep ahead.

But Ghosemajumder likens the situation of fake accounts on social media platforms to that of spam email. It will never be a solved problem, but it could be solved enough to live with. "Most people don't feel the effect of spam now in the same way they did 15 years ago," he said. "I think we have the technology to be able to get ahead of this problem," he added. "It's really just about making the right investments and performing the right R&D."

For its part, Facebook knows this isn't a problem that it's going to solve and move on from. "We will see quite fast and quite, you know, robust reactions" from bad actors every time its fake account defenses are upgraded, said Facebook's Bernhardt. "That's what the team basically comes into work on every day."
