Policy

Lawmakers want humans to check runaway AI. Research shows they’re not up to the job.

Policymakers want people to oversee — and override — biased AI. But there's little evidence that humans are up to the task.


The recent trend toward requiring human oversight of automated decision-making systems runs counter to mounting research about humans' inability to effectively override AI tools.

Photo: Jackal Pan/Getty Images

There was a time, not long ago, when a certain brand of technocrat could argue with a straight face that algorithms are less biased decision-makers than human beings — and not be laughed out of the room. That time has come and gone, as the perils of AI bias have entered mainstream awareness.

Awareness of bias hasn't stopped institutions from deploying algorithms to make life-altering decisions about, say, people's prison sentences or their health care coverage. But the fear of runaway AI has led to a spate of laws and policy guidance requiring or recommending that these systems have some sort of human oversight, so machines aren't making the final call all on their own. The problem is: These laws almost never stop to ask whether human beings are actually up to the job.

"These assumptions about human oversight are playing a really critical role in justifying the use of these tools," said Ben Green, a postdoctoral scholar at the University of Michigan and an assistant professor at the Gerald R. Ford School of Public Policy. "If it doesn't work, then we're failing to get any of the protections that are seen as essential for making the system acceptable to us at all."

In a new paper, Green, who has extensively studied the use of algorithms in parole and sentencing decisions, demonstrates how the recent trend toward requiring human oversight of automated decision-making systems runs counter to mounting research about humans' inability to effectively override AI tools.

"The point is not to say: Let's just allow these algorithms to be used without the human oversight," Green said. "But if we're only comfortable with these algorithms because we have human oversight, we actually shouldn't be comfortable with these algorithms at all, because the human oversight doesn't work."

This interview has been lightly edited and condensed.

What got you thinking about this issue to begin with?

For the last several years, I've been doing experimental technical work, studying how people interact with algorithms when making predictions and decisions. A good chunk of the empirical findings I'm drawing on in the paper come from that research, conducted over the last couple of years.

One of the starting points for me, several years ago, was thinking about this gap between how we evaluate algorithms — often just thinking about if they're accurate, if they're fair — and the actual mechanisms by which algorithms have impact. That is, this process where they're giving advice to a human, and then a human has to actually somehow interpret that information and decide whether and how to use it.

In doing that work, I uncovered a lot of issues with people's ability to identify errors and biases, and with how people respond to algorithms, and I noticed a pretty significant disconnect between the empirical findings and the way a lot of policies talked about this.

[Policies] are essentially just saying, "Hey, well, there's a human in the loop. So it's fine to use these risk assessments when making sentencing decisions." I wanted to really dig into this and see: What do the policies actually call for? And how do they fall short? Does anything actually work?

Before we walk through your findings in this paper, let's talk a little bit about what you have discovered in your more technical research on algorithms' impact.

The first paper really looked at how introducing risk assessments alters the predictions that people make. The primary finding was that people respond to risk assessments in biased ways. People were more likely to follow a recommendation to increase their estimate of risk when evaluating Black defendants, and more likely to lower the risk estimate suggested by the risk assessment when evaluating white defendants. So, even if we were to say, "OK, this algorithm might meet certain standards of fairness," the actual impacts of these algorithms might not satisfy those constraints when you think about how humans are going to respond.
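To make that finding concrete, here is a minimal, hypothetical sketch in Python. It is not drawn from Green's study, and every number in it is an assumption: it simply shows how an algorithm that scores two groups identically can still produce disparate detention rates once human reviewers nudge its scores up for one group and down for the other.

```python
# Illustrative sketch only -- not code or data from Green's experiments.
# Threshold, adjustment size and score distribution are assumptions
# invented for this example.
import random

random.seed(0)
N = 10_000
THRESHOLD = 0.5    # detain when the final risk estimate exceeds this
ADJUSTMENT = 0.05  # hypothetical size of the reviewer's biased adjustment

def detention_rate(adjustment: float) -> float:
    """Detention rate after a human shifts the algorithm's score by `adjustment`."""
    detained = 0
    for _ in range(N):
        algo_score = random.betavariate(2, 2)  # identical score distribution for both groups
        human_estimate = min(1.0, max(0.0, algo_score + adjustment))
        if human_estimate > THRESHOLD:
            detained += 1
    return detained / N

# One group's scores get nudged upward, the other group's downward.
print(f"Detention rate, scores adjusted up:   {detention_rate(+ADJUSTMENT):.1%}")
print(f"Detention rate, scores adjusted down: {detention_rate(-ADJUSTMENT):.1%}")
# The algorithm treated both groups the same; the human-in-the-loop outcome does not.
```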

The second study was an extension of that, looking at whether people are able to evaluate the quality of algorithmic predictions. We found that they weren't. People can't really do that job, which is central to the idea of people being able to determine which recommendations from an algorithm they should work with or not.

The final piece, which was just published, shifted from predictions to the decision-making process, looking at how risk assessments alter the underlying decision-making process that people follow. If judges are shown a risk assessment, does that actually make them weigh risk more heavily when making decisions? They have to balance the desire to reduce risk against other interests, such as the liberty of defendants. Are we improving the accuracy of human prediction? Or are we actually making risk a more salient feature of decision-making?

We ran an experiment to test that and found that we're more in the latter camp. We're not simply altering people's predictions of risk. We're altering how people factor risk into their decisions, and essentially prompting them to weigh risk as a more important factor when making decisions.

In the paper about human oversight of algorithms, you walk through three different ways policies are trying to introduce some level of human oversight to the deployment of AI, and you argue each way is flawed. Walk me through those three ways and their flaws.

They're all somewhat overlapping and related. The first approach is to say: If a decision is based on solely automated processing, then we're going to either prohibit it entirely or require certain rights, like the ability to request human review afterward. The most notable example of this would be the European General Data Protection Regulation, which has an article dedicated to solely automated processing.

By drawing this really strict boundary, we're failing to capture a lot of the influences of algorithms that have actually generated the most significant controversy and demonstrated injustice. Most of the decisions we're most concerned about already aren't made in a solely automated fashion. You could have a human play some relatively superficial role in the decision-making process, such that it's no longer solely automated. And if the decision doesn't count as solely automated, then it isn't subject to any of those regulations.

The second approach operates in some ways as a corollary to the first. It's saying: It's OK to use algorithms, as long as there's human discretion, and the human gets to make the final decisions. This is what we see, in particular, for a lot of the risk assessment tools used in the U.S.

But when you actually give people discretion to determine how they should use an algorithm, they don't do what you might want them to do with it. A lot of the research looks at how people override algorithms: How do people diverge from algorithmic predictions? And typically, they do that in sub-optimal ways. People are diverging from algorithms in ways that are actually making their predictions less accurate.

If the risk assessment says to detain someone, they'll generally follow that. If it says to release someone, they will override that in favor of detention much more frequently. Police who are supposed to be overseeing facial recognition predictions also do a really bad job of that. So all of the documentation we have about human oversight and human overrides suggests that they either defer to the tool when they shouldn't, or override the tools in typically detrimental ways.
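A back-of-the-envelope illustration of that asymmetry (the rates below are invented for the example, not taken from any study): if judges follow "detain" recommendations but frequently override "release" ones, the overall detention rate climbs well above what the tool alone would produce.

```python
# Invented rates, purely to illustrate the one-directional override pattern
# described above; they are not estimates from any study.
recommend_detain = 0.30        # share of cases where the tool recommends detention
follow_detain = 0.95           # how often judges follow a "detain" recommendation
detain_despite_release = 0.40  # how often judges detain despite a "release" recommendation

tool_alone = recommend_detain
with_overrides = (recommend_detain * follow_detain
                  + (1 - recommend_detain) * detain_despite_release)

print(f"Detention rate if the tool decided alone:       {tool_alone:.0%}")
print(f"Detention rate with asymmetric human overrides: {with_overrides:.0%}")
# Roughly 30% vs. 56%: overriding in only one direction shifts outcomes toward detention.
```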

The third category says: People might not understand the algorithm. So we really, really need to be sure that [the oversight] is meaningful. People should be able to understand how the algorithm works in some form that can help them determine when they should follow it or how to interpret it. The emphasis there is on explanations or algorithmic transparency.

The issue here really just builds on the issues of the second group. Yes, you can give people the ability to override the algorithm. But that doesn't necessarily help. Typically, people don't override algorithms in beneficial ways. Unfortunately, even explanations and transparency don't seem to improve things — and can actually make it worse. The explanations can make people trust the algorithm more, even if the algorithm shouldn't be trusted.

What are the alternatives, if human beings are not a sufficient safeguard?

It's not simply, "Oh, we can just turn from human oversight to something else." Human oversight plays a really fundamental role in justifying and legitimizing these tools. So we actually need to, given these failures, start from farther upstream and think about how we're even making decisions about when algorithms should be used at all.

We should be putting much more scrutiny on whether it's actually appropriate to use an algorithm in a given situation. Often, courts and policymakers will justify the use of low-quality algorithms by assuming that human review can account for their flaws, but I think we should be much more critical. And I think in many of these cases, we should be ready to say: This actually just isn't an algorithm that we trust. This isn't a decision where an algorithm is particularly well-suited to enhancing decision-making.

We should put much more of a burden on agencies to justify why it's appropriate to use an algorithm in a given situation. They should have to describe more proactively why this algorithm is going to improve decision-making or why it's appropriate to have an algorithm make this decision. And what is the quality of this algorithm? Is it actually one that we would trust with altering potentially high-stakes decisions? We just need to do much more proactive research of the actual human oversight or human-algorithm collaboration process.

Already, we're seeing policies that are calling for various types of evaluations of algorithms themselves, saying, "before you deploy the system, you have to run a test to show that the algorithm is accurate, and to show that it's fair." And I think that we should have similar types of tests that are required for the actual decision-making process. So if you're going to incorporate a pre-trial risk assessment into judicial decision making, there should be some sort of proactive assessment, not just of the pre-trial risk assessment itself, but also of how people or judges use the algorithm to make decisions.
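To give a flavor of what such a proactive assessment might track, here is a bare-bones sketch. The pilot-case structure, field names and "accuracy" proxy are assumptions made for illustration, not a validated evaluation protocol.

```python
# Bare-bones sketch of a pre-deployment check on the combined human+algorithm
# process, not just the algorithm. All field names and data are hypothetical.
from dataclasses import dataclass

@dataclass
class PilotCase:
    algo_recommends_detain: bool  # what the risk assessment recommended
    judge_detained: bool          # what the judge decided during the pilot
    reoffended: bool              # outcome observed over the follow-up period

def evaluate(cases: list[PilotCase]) -> dict[str, float]:
    """Summarize override patterns and a crude accuracy proxy for the pilot."""
    n = len(cases)
    release_overridden = sum(1 for c in cases if not c.algo_recommends_detain and c.judge_detained)
    detain_overridden = sum(1 for c in cases if c.algo_recommends_detain and not c.judge_detained)
    matched_outcome = sum(1 for c in cases if c.judge_detained == c.reoffended)
    return {
        "override_rate_of_release": release_overridden / n,
        "override_rate_of_detain": detain_overridden / n,
        "decision_outcome_agreement": matched_outcome / n,
    }

# Made-up pilot data, just to show the shape of the report.
pilot = [
    PilotCase(False, True, False),
    PilotCase(True, True, True),
    PilotCase(False, False, False),
    PilotCase(True, False, True),
]
print(evaluate(pilot))
```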

Right now, we'll do evaluations after the fact. Two years down the line, we'll see that judges have been using this algorithm in all sorts of unexpected ways. And that's because we didn't properly do the homework up front.

