Why AI fairness tools might actually cause more problems

Important nuances were lost in translation when a rule commonly used to measure disparate impacts on protected groups in hiring was codified for easy-to-use tools promising AI fairness and bias removal.

Photo: 10'000 Hours/DigitalVision

Salesforce uses it. So do H2O.ai and other AI tool makers. But instead of detecting the discriminatory impact of AI used for employment and recruitment, the “80% rule,” also known as the 4/5 rule, could be introducing new problems.

In fact, AI ethics researchers say harms that disparately affect some groups could be exacerbated as the rule is baked into tools used by machine-learning developers hoping to reduce discriminatory effects of the models they build.

“The field has amplified the potential for harm in codifying the 4/5 rule into popular AI fairness software toolkits,” wrote researchers Jiahao Chen, Michael McKenna and Elizabeth Anne Watkins in an academic paper published earlier this year. “The harmful erasure of legal nuances is a wake-up call for computer scientists to self-critically re-evaluate the abstractions they create and use, particularly in the interdisciplinary field of AI ethics.”

The rule has been used by federal agencies, including the Departments of Justice and Labor, the Equal Employment Opportunity Commission and others, as a way to compare the hiring rate of protected groups and white people and determine whether hiring practices have led to discriminatory impacts.

The goal of the rule is to encourage companies to hire protected groups at a rate that is at least 80% that of white men. For example, if the hiring rate for white men is 60% but only 45% for Black people, the ratio of the two hiring rates would be 45:60, or 75%, which does not meet the rule’s 80% threshold. Federal guidance on using the rule for employment purposes has been updated over the years to incorporate other factors.
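
The arithmetic in that example can be sketched in a few lines of Python. The function names and the `threshold` parameter are illustrative assumptions, not taken from any toolkit or from federal guidance, and the sketch covers only the ratio check, not the other factors that later guidance incorporated.

```python
def adverse_impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Return a group's hiring rate divided by the reference group's hiring rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate


def meets_four_fifths(group_rate: float, reference_rate: float,
                      threshold: float = 0.8) -> bool:
    """True when the ratio of hiring rates is at least the threshold (80% by default)."""
    return adverse_impact_ratio(group_rate, reference_rate) >= threshold


# The figures from the example above: 45% versus 60%.
ratio = adverse_impact_ratio(0.45, 0.60)
print(f"{ratio:.0%}")                 # 75%
print(meets_four_fifths(0.45, 0.60))  # False
```

A single comparison like this is where the traditional process starts, not where it ends.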

The rule found its way into fairness tools when computer engineers sought to abstract into numbers and code a technique that social scientists use as a foundational approach to measuring disparate impact, said Watkins, a social scientist and postdoctoral research associate at Princeton University’s Center for Information Technology Policy and the Human-Computer Interaction Group.

“In computer science, there’s a way to abstract everything. Everything can be boiled down to numbers,” Watkins told Protocol. But important nuances got lost in translation when the rule was digitized and codified for easy bias-removal tools.

In real-life scenarios, the rule is typically applied as a first step in a longer process intended to understand why disparate impact has occurred and how to fix it. Engineers, however, often use fairness tools at the end of a development process, as a last box to check before a product or machine-learning model is shipped.

“It’s actually become the reverse, where it’s at the end of a process,” said Watkins, who studies how computer scientists and engineers do their AI work. “It’s being completely inverted from what it was actually supposed to do … The human element of the decision-making gets lost.”

The simplistic application of the rule also misses other important factors weighed in traditional assessments. For instance, researchers usually want to inspect which subsections of applicant groups should be measured using the rule.

Other researchers also have inspected AI ethics toolkits to examine how they relate to actual ethics work.

The rule used on its own is a blunt instrument and not sophisticated enough to meet today’s standards, said Danny Shayman, AI and machine-learning product manager at InRule, a company that sells automated intelligence software to employment, insurance and financial services customers.

“To have 19% disparate impact and say that’s legally safe when you can confidently measure disparate impact at 1% or 2% is deeply unethical,” said Shayman, who added that AI-based systems can confidently measure impact in a far more nuanced way.

Model drifting into another lane

But the rule is making its way into tools AI developers use in the hopes of removing disparate impacts against vulnerable groups and detecting bias.

“The 80% threshold is the widely used standard for detecting disparate impact,” notes Salesforce in its description of its bias detection methodology, which incorporates the rule to flag data for possible bias problems. “Einstein Discovery raises this data alert when, for a sensitive variable, the selection data for one group is less than 80% of the group with the highest selection rate.”
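
As a rough illustration of the kind of check that description implies, the sketch below flags any group whose selection rate falls below 80% of the highest group's rate. It is not Salesforce's implementation: the function name, the group labels and the rates are all hypothetical.

```python
from typing import Dict, List


def flag_groups(selection_rates: Dict[str, float],
                threshold: float = 0.8) -> List[str]:
    """Return the groups whose rate is below threshold times the highest rate."""
    if not selection_rates:
        return []
    highest = max(selection_rates.values())
    if highest <= 0:
        return []  # nobody was selected, so there is no ratio to compare
    return sorted(
        group for group, rate in selection_rates.items()
        if rate / highest < threshold
    )


# Hypothetical selection rates, invented for illustration only.
rates = {"group_a": 0.60, "group_b": 0.45, "group_c": 0.55}
print(flag_groups(rates))  # ['group_b']
```

Under this sketch, a group selected at 81% of the highest rate raises no alert at all, which is the kind of blunt cutoff Shayman objects to.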

H2O.ai also refers to the rule in documentation about how disparate impact analysis and mitigation works in its software.

Neither Salesforce nor H2O.ai responded to requests to comment for this story.

The researchers also argued that translating a rule used in federal employment law into AI fairness tools could divert it into terrain outside the normal context of hiring decisions, such as banking and housing. They said this amounts to epistemic trespassing, or the practice of making judgements in arenas outside an area of expertise.

“In reality, no evidence exists for its adoption into other domains,” they wrote regarding the rule. “In contrast, many toolkits [encourage] this epistemic trespassing, creating a self-fulfilling prophecy of relevance spillover, not just into other U.S. regulatory contexts, but even into non-U.S. jurisdictions!”

Watkins’ research collaborators work for Parity, an algorithmic audit company that may benefit from deterring use of off-the-shelf fairness tools. Chen, chief technology officer of Parity, and McKenna, the company’s data science director, are currently involved in a legal dispute with Parity’s CEO.

Although application of the rule in AI fairness tools can create unintended problems, Watkins said she did not want to demonize computer engineers for using it.

“The reason this metric is being implemented is developers want to do better,” she said. “They are not incentivized in [software] development cycles to do that slow, deeper work. They need to collaborate with people trained to abstract and trained to understand those spaces that are being abstracted.”


Kate Kaye

Kate Kaye is an award-winning multimedia reporter digging deep and telling print, digital and audio stories. She covers AI and data for Protocol. Her reporting on AI and tech ethics issues has been published in OneZero, Fast Company, MIT Technology Review, CityLab, Ad Age and Digiday and heard on NPR. Kate is the creator of RedTailMedia.org and is the author of "Campaign '08: A Turning Point for Digital Media," a book about how the 2008 presidential campaigns used digital media and data.
