Twitter in May said it would begin prompting users who are about to tweet something nasty to either revise or delete the message before sending. The decision, the company said at the time, was based on a successful test of the messages in the run-up to the 2020 election.
Now, a new study — this one from researchers at New York University — adds to the evidence that giving users warnings about hate speech can actually cut down their use of hate speech by 10-20%. And those warnings can change users' behavior even when users aren't in the heat of the moment and about to tweet something regrettable.
The researchers at NYU's Center for Social Media and Politics developed their experiment last summer, in response to what was beginning to look like a mass migration of Twitter users to more extreme platforms like Parler. "We wanted to find a way that would basically prevent them from migrating to these platforms, but at the same time, that would result in the reduction of hate speech," said Mustafa Mikdat Yildirim, a Ph.D. candidate in NYU's department of politics and the lead researcher on the report.
So, last July, as racial justice protests were swelling, anti-Asian sentiment was filling social media and conservatives like Sen. Ted Cruz were threatening to abandon Twitter, the NYU researchers began monitoring a subset of 600,000 tweets and scanning for users who they thought might soon be suspended for hate speech. Eventually, the researchers whittled their list down to users who did get suspended and also met certain other criteria, including having more than 50 followers and having at least seven followers who had also used hateful language in their tweets.
Then the researchers trained their attention on the people who followed those suspended accounts. They wanted to know whether warning these people that someone they followed had been suspended for hate speech — and that they could be next — would change the way those people behaved.
The researchers ended up with a list of 27 suspended users with 4,327 followers among them, and divided the followers up into six experimental groups and one control. The researchers then set up their own Twitter accounts with names like @hate_suspension and @expert_on_hate and began publicly tweeting directly at the users in all six groups with one of six different warning messages. They wanted to see which approach, if any, was most effective.
Two of the groups got messages designed to remind people of what they could lose if they used hate speech. Another two received tweets that emphasized "legitimacy," which more or less meant respectfulness. The last two groups got messages that framed the sender as an expert to lend credibility to the message. The messages came in two different flavors — high intensity and low intensity. The control group, meanwhile, received no warning at all.
The researchers found that just one warning reduced the use of hateful language by 10% a week after the experiment. For the most effective message — which was also the most politely worded — the change was more like 15-20% a week later.
The fact that all of the messages had similar degrees of impact suggested to the researchers that simply receiving a warning may have had more of an impact than what the particular warning said. "Knowing that someone else sees their hate speech [...] may make people think once more about the language that they used," Yildirim said.
The NYU researchers' findings build on Twitter's own results from last year's experiment. The company found that when users were prompted to revise or delete a harmful tweet before sending it, a whopping 34% of them actually did. And afterward, Twitter said, those users sent 11% fewer offensive replies than they had before.
"Our teams are reviewing the report and its findings," a Twitter spokesperson said of the NYU research. "Broadly, over the past year, we've taken an iterative approach to our work, from encouraging people to more thoughtfully consider sharing content to taking several measures to slow down the spread of misinformation. We'll continue that iterative approach and look forward to building on our efforts with a variety of third-party partners on this critical work."
The NYU report suggests that an even more proactive intervention — warning users even when they're not on the cusp of saying something rotten — could have a significant effect too. And yet, the researchers aren't urging Twitter to adopt their method wholesale.
The NYU researchers didn't see evidence that their warnings, which came from accounts with fewer than 100 followers, prompted people to send even more hateful tweets out of spite. But they acknowledge that it might be different if Twitter were sending the message itself. "We don't really know whether people would actually come back at Twitter with some type of backlash," Yildirim said.
It would also be tricky for Twitter to automatically send these warnings to users who follow accounts that have been suspended. Some suspensions, after all, happen by mistake and then get reversed.
Yildirim said it will be important for Twitter to test this type of system itself and be transparent about its findings. The fact is, no one is quite as equipped as Twitter to implement such a widespread intervention on the platform. Civil society groups could take up the task, but they don't have all of Twitter's data or its technical resources at their disposal. Yildirim has been encouraged by Twitter's transparency with its experiments in creating healthier conversations so far.
Twitter is not the only company experimenting with warnings and other kinds of "friction." Facebook has also been ramping up its use of information labels and interstitials, but Facebook, Yildirim said, is far harder to study.
All of these companies are leaning into this strategy as a way to avoid having to take more drastic action, like removing content or suspending entire accounts. The NYU approach offers yet another option, using those suspensions as a cautionary example for the users who are left.