Salesforce uses it. So do H2O.ai and other AI tool makers. But instead of detecting the discriminatory impact of AI used for employment and recruitment, the “80% rule” — also known as the 4/5 rule — could be introducing new problems.
In fact, AI ethics researchers say harms that disparately affect some groups could be exacerbated as the rule is baked into tools used by machine-learning developers hoping to reduce discriminatory effects of the models they build.
“The field has amplified the potential for harm in codifying the 4/5 rule into popular AI fairness software toolkits,” wrote researchers Jiahao Chen, Michael McKenna and Elizabeth Anne Watkins in an academic paper published earlier this year. “The harmful erasure of legal nuances is a wake-up call for computer scientists to self-critically re-evaluate the abstractions they create and use, particularly in the interdisciplinary field of AI ethics.”
The rule has been used by federal agencies, including the Departments of Justice and Labor, the Equal Employment Opportunity Commission and others, as a way to compare the hiring rates of protected groups with those of white people and determine whether hiring practices have led to discriminatory impacts.
The goal of the rule is to encourage companies to hire protected groups at a rate that is at least 80% that of white men. For example, if the hiring rate for white men is 60% but only 45% for Black people, the ratio of the two hiring rates would be 45:60 — or 75% — which does not meet the rule’s 80% threshold. Federal guidance on using the rule for employment purposes has been updated over the years to incorporate other factors.
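In code, the arithmetic is trivial. The sketch below mirrors the hypothetical numbers above; the function name and threshold check are illustrative and not taken from any vendor's toolkit.

```python
# Illustrative sketch of the 4/5 (80%) rule arithmetic; the numbers mirror the
# hypothetical example above and are not real hiring data.

def four_fifths_ratio(protected_rate: float, reference_rate: float) -> float:
    """Ratio of the protected group's hiring rate to the reference group's rate."""
    return protected_rate / reference_rate

ratio = four_fifths_ratio(0.45, 0.60)   # 45% vs. 60% hiring rate -> 0.75
meets_threshold = ratio >= 0.80         # False: 75% falls short of the 80% bar
print(f"ratio = {ratio:.0%}, meets 80% threshold: {meets_threshold}")
```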
The rule made its way into fairness tools when computer engineers sought to translate into numbers and code a technique social scientists use as a foundational approach to measuring disparate impact, said Watkins, a social scientist and postdoctoral research associate at Princeton University’s Center for Information Technology Policy and the Human-Computer Interaction Group.
“In computer science, there’s a way to abstract everything. Everything can be boiled down to numbers,” Watkins told Protocol. But important nuances were lost in translation when the rule was digitized and codified into easy-to-use bias-removal tools.
When applied in real-life scenarios, the rule typically serves as a first step in a longer process intended to understand why disparate impact has occurred and how to fix it. However, engineers often use fairness tools at the end of the development process, as a last box to check before a product or machine-learning model is shipped.
“It’s actually become the reverse, where it’s at the end of a process,” said Watkins, who studies how computer scientists and engineers do their AI work. “It’s being completely inverted from what it was actually supposed to do … The human element of the decision-making gets lost.”
Applying the rule simplistically also misses other important factors weighed in traditional assessments. For instance, researchers usually want to examine which subsections of applicant groups should be measured using the rule.
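A short, hypothetical example shows why that choice matters: with made-up hiring numbers, the aggregate ratio can clear the 80% bar even while one department's ratio collapses.

```python
# Hypothetical numbers illustrating why the choice of subgroup matters.
# Measured over all applicants, the ratio clears 80%; within dept_b it does not.

applicants = {
    # department: (reference_hired, reference_applied, protected_hired, protected_applied)
    "dept_a": (45, 50, 90, 100),
    "dept_b": (10, 100, 0, 10),
}

def ratio(ref_hired, ref_applied, prot_hired, prot_applied):
    return (prot_hired / prot_applied) / (ref_hired / ref_applied)

for dept, counts in applicants.items():
    print(dept, f"{ratio(*counts):.0%}")        # dept_a: 100%, dept_b: 0%

totals = [sum(col) for col in zip(*applicants.values())]
print("aggregate", f"{ratio(*totals):.0%}")     # well above 80% despite dept_b
```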
Other researchers have also scrutinized AI ethics toolkits to understand how they relate to actual ethics work.
The rule used on its own is a blunt instrument and not sophisticated enough to meet today’s standards, said Danny Shayman, AI and machine-learning product manager at InRule, a company that sells automated intelligence software to employment, insurance and financial services customers.
“To have 19% disparate impact and say that’s legally safe when you can confidently measure disparate impact at 1% or 2% is deeply unethical,” said Shayman, who added that AI-based systems can confidently measure impact in a far more nuanced way.
Model drifting into another lane
But the rule is making its way into tools AI developers use in the hopes of removing disparate impacts against vulnerable groups and detecting bias.
“The 80% threshold is the widely used standard for detecting disparate impact,” notes Salesforce in its description of its bias detection methodology, which incorporates the rule to flag data for possible bias problems. “Einstein Discovery raises this data alert when, for a sensitive variable, the selection data for one group is less than 80% of the group with the highest selection rate.”
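The check the documentation describes amounts to comparing each group's selection rate against that of the highest-selecting group. The sketch below is a rough approximation of that kind of threshold test, not Salesforce's code, and the selection rates are made-up examples.

```python
# Hypothetical sketch of the threshold check described in the documentation above;
# this is not Salesforce's implementation, and the rates are invented examples.

selection_rates = {"group_a": 0.62, "group_b": 0.58, "group_c": 0.45}

highest = max(selection_rates.values())
flagged = {
    group: rate
    for group, rate in selection_rates.items()
    if rate < 0.80 * highest  # selection rate below 80% of the top group's rate
}
print(f"groups flagged for possible bias: {flagged}")  # {'group_c': 0.45}
```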
H2O.ai also refers to the rule in documentation about how disparate impact analysis and mitigation works in its software.
Neither Salesforce nor H2O.ai responded to requests for comment for this story.
The researchers also argued that translating a rule used in federal employment law into AI fairness tools could divert it into terrain outside the normal context of hiring decisions, such as banking and housing. They said this amounts to epistemic trespassing, the practice of making judgments in arenas outside one’s area of expertise.
“In reality, no evidence exists for its adoption into other domains,” they wrote regarding the rule. “In contrast, many toolkits [encourage] this epistemic trespassing, creating a self-fulfilling prophecy of relevance spillover, not just into other U.S. regulatory contexts, but even into non-U.S. jurisdictions!”
Watkins’ research collaborators work for Parity, an algorithmic audit company that may benefit from deterring use of off-the-shelf fairness tools. Chen, chief technology officer of Parity, and McKenna, the company’s data science director, are currently involved in a legal dispute with Parity’s CEO.
Although application of the rule in AI fairness tools can create unintended problems, Watkins said she did not want to demonize computer engineers for using it.
“The reason this metric is being implemented is developers want to do better,” she said. “They are not incentivized in [software] development cycles to do that slow, deeper work. They need to collaborate with people trained to abstract and trained to understand those spaces that are being abstracted.”