How Twitter’s ethical AI team works — and what it's trying to teach the industry

Rumman Chowdhury and Jutta Williams on their new bias bounty program, why responsible ML is like race-car brakes and building thoughtfulness into AI.


Twitter's META team is working to build ethics into AI, one model at a time.

Photo: Joshua Hoehne/Unsplash

Twitter recently released one of its algorithms into the world — the one that controls how images are cropped in the Twitter app — and said it would pay people to find all the ways it was broken. Rumman Chowdhury and Jutta Williams, two executives on Twitter's META team, called it an "algorithmic bias bounty challenge," and said they hoped it would set a precedent for "proactive and collective identification of algorithmic harms."

The META team's job is to help Twitter (and the rest of the industry) make sure its artificial intelligence and machine-learning products are used as ethically and responsibly as possible. What does that mean or look like in practice? Well, Twitter (and the rest of the industry) is still figuring that out. And this work, at Google and elsewhere, has led to huge internal turmoil as companies have begun to reckon more honestly with the ramifications of their own work.

Chowdhury and Williams joined the Source Code podcast to talk about how the META team works, what they hope the bias bounty challenge will accomplish and the challenges of doing qualitative research in a quantitative industry. That, and what "Chitty Chitty Bang Bang" can teach us about AI.

You can hear our full conversation on the latest episode of the Source Code podcast. Below are excerpts from our conversation, edited for length and clarity.

David Pierce: At the risk of starting with a question you could spend the whole hour answering, can you explain, at a very high level, what the problem is you're trying to solve at Twitter? And why it has proven so hard, both for you and for everyone, to figure out how to solve?

Rumman Chowdhury: Oh, that would actually take an hour, or more. Entire dissertations are being written on that topic! So, META stands for machine learning ethics, transparency and accountability. And in a nutshell, that's what we look at. There's an already well-documented history of algorithmic bias, unfairness and unintentional and intentional ethical harms. When I say intentional, there are some adversarial cases where bad things are happening. But most of the time, we are working in the space of unintended consequences.

Why this is such a big undertaking, and why a team like META will never go away at any company, is that we are considering deeply ingrained social and ethical biases that have existed for quite some time. Machine learning does not create new biases; it simply amplifies problems that already exist. In that sense, it can seem like a very, very daunting task. We are not here to solve all of society's problems, but we will do our best, within the small slice of the universe that we can help and manage, to make sure they don't get reflected where we are.

Jutta Williams: Rumman and I come from different requirements and roles. And my role is to really take all this amazing learning, all these new, consequential ways of thinking, and apply them. Make them operational, put them in our products, and make sure that there's action. And so I think that for us here in META, I would say that one of the hardest things to do is to take this very nascent new learning, turn it into action and then make a visible change so that people are having a better experience.

DP: It's this very big, society-sized problem, and everybody is dealing with different versions of the same thing. And I could see a world in which you try to create a new office at the White House to do this kind of work, or you do it academically so that it's easy to share. I'm curious why you picked Twitter, but also why it felt right to do this inside of a company in the first place.

RC: It is an ecosystem of players that helps this kind of work move forward. You know, I think it is a noble goal that we are trying to algorithmically or technologically not re-create the problems and issues of the past. Given that this is such a broad undertaking, it does require an entire slew of people.

I do think that the White House should have an office, or at least a group of people who are working on this problem. I also do think academics should be well-funded to do this research. I also think that civil society should flourish. And you know, these types of organizations should be funded. And I also do think that [the] industry needs to have people!

Anna Kramer: So what are the things at a tech company, specifically, that they're empowered to do — maybe it's specific to Twitter, maybe it's industrywide — that is different from these other actors that you're talking about?

JW: I'd say that it's not either/or. I'm an "and" person. So, I used to chair standards for AI for ISSIP for the U.S. And I write to my representatives. And I am happy to speak with regulators. And I get to work inside of a tech company. So it's not that I'm doing this in opposition to or in lieu of all those other things. I think every citizen who is concerned about their own experiences and how algorithms make decisions about their experiences online should be involved in every way possible that's accessible to them. I happen to have access inside one of the tech companies, in addition to being a citizen that's affected.

Internally, we have the ability to talk to developers, to educate and to grow understanding and to see the data and understand how the data is being used very specifically. And that's a perspective you only get if you work inside of a company. And I think you can effect a lot more change when you have the ability to sit down with the people who are making the decisions and want to know how to do this better, faster and with more care.

RC: And specifically, sitting in a company, you do get access to data, to models, to the individuals building these models. You know, as META, we can only accomplish so much. Our team does not own every single model at Twitter. But we work with all those teams. And often, especially in a company like Twitter, we find that in good faith, people are trying to figure out how to solve these problems.

It's worth noting that the field of machine learning has not advanced enough where the average data scientist, the average ML engineer, really understands how to address these problems. We have only arrived at a place where it is a common enough conversation that people now are open enough to say "We should look at these problems," and "Here are the problems." And the next step is giving people the right kinds of tools and access to information and access to experts who can help them fix these problems.

It's still a contentious issue in the machine learning world. I mean, if you look at NeurIPS, they recently instituted an impact statement. It is just a statement. And even that has led to a firestorm of controversy. That is sometimes disheartening, to see that some people don't even want to take a minute to reflect on the work they are doing. No papers are being turned down because of impact statements! All that is being asked is that people give some consideration.

DP: Let's dig into that, actually. One of the strange things about listening to you explain this at the top was that it doesn't seem like the basic idea of what you're trying to do is terribly controversial. How do we make sure that machines don't make human mistakes? I can't imagine you could present that to anyone, and they would throw that back in your face. And yet, this has been a really controversial thing. What is it about this that comes across as so controversial to people?

RC: You know, I don't know. I agree with you. I think we can all agree that really what we're trying to do is help companies be more thoughtful about what they're building, and how they build it, and how it impacts the world. There are plenty of other services and things that companies do, like data protection and security, that actually have a really similar remit.

I think we can all agree that really what we're trying to do is help companies be more thoughtful about what they're building, and how they build it, and how it impacts the world.

JW: There's always this tension of first-to-market. And so speed is always something that we compete with. And there's this misperception that if you add thoughtfulness and you add control that you'll be slow. I think that that's wildly incorrect. I think that when people don't have to guess, and when they're not worried, they actually go a lot faster.

I used to say, why are there brakes on a car? So that you can drive fast! When you look at the evolution of braking systems, the fastest cars have the most sophisticated braking systems. And it's so that you can take corners quickly. Especially if you're a rally car driver, and you're driving on an unknown course — which is what's happening with a lot of innovation — then you don't have to worry about driving off the rails and hurting yourself or others. So I don't think that there's tension when it's done well. Bolt-on practices and reactive fixes aren't always done well.

AK: I think this is a good moment to bring up the bug bounty program. One of the ways I'm interpreting what you're doing there is kind of addressing head-on a lot of the people who are skeptical, because you're creating a more public forum for people to talk about this, and to understand what it is that you're doing. What was the thinking behind launching this bug bounty? Where does it fit into this broader goal of changing the bigger tech community conversation around the work that you're doing?

RC: Absolutely. This is like my favorite thing to talk about at the moment. I'm very excited.

So, our algorithmic bias bounty is modeled after your traditional InfoSec vulnerability bounties and bug bounties. What we're doing is opening up a model, we've provided a rubric, and we're very clear about how submissions are going to be graded. Folks have a week to identify all the harms they can, essentially, and share their findings with us. What we are asking people to share is their code, a brief self-grading rubric, as well as a brief description of why they took the approach they did.

We have very intentionally made this program global. We really do want global perspectives. One of the critiques of tech in general, but also even the responsible ML community, is we generally have a particular type of person, we tend to live in a particular place — i.e., the Bay Area — a lot of us work in tech or tech-adjacent. And also it's a very Western-focused field. So to hear people starting to ask questions about caste-based discrimination, for example, or how might an image-cropping algorithm mis-crop somebody who's wearing a head covering, these are usually not questions that come up in a very Western setting.

So our bias bounty is open until the end of the week. We have cash prizes for folks, not just for the people who score the highest, but also for the most innovative approach and the most generalizable approach.

JW: ML is often considered a thing, but it's really 25 different things. And figuring out how to apply a control or how to do something better in every one of those parts of developing and delivering an ML algorithm takes specialists. And when bug bounties were first introduced to security, it was enormous. I remember an operating system launch from when I worked for the government, and there were over 100,000 bugs that were active in that operating system when it went live. And the company that shipped that product shut down product development for a period of time just to close bugs.

ML is often considered a thing, but it's really 25 different things.

It's so big, and it's so complex, that it's very hard for any one entity to solve all the problems. We have that problem with AI in general. And it's not just a matter of perspectives, it's also the complexity of the systems. So asking for help should be rewarded. It was incredibly beneficial to the security world when we opened it up to not just adversarial thinking, but even cooperative thinking, gave people the method by which to communicate with us effectively, and then we could reward them for that work. I think that our world today is more closely aligned with the security world than people appreciate or realize, and I don't see any challenge with this being just as beneficial to the ML space as bug bounties were to security engineering work.

AK: How do you create user demand for this? If it's something you're going to sell as an asset, your users need to be wanting it or requesting it and knowing what they're talking about. How much of your work is around that part of the question? And then how do you go about doing that?

JW: It's such a big part of the product management role and responsibility. We're supposed to be the advocate and the ombudsman, if you will, for consumers and for people. And I don't think that users of the platform are the only people who are affected by our products. So when I say people who are impacted, it could be society at large.

So we leverage consumer experience researchers, people who do qualitative investigation. They talk to people, not just people who use our product, but also people affected by our product, and the conversation that's enabled by our product and platforms.

We have a project ongoing right now around algorithmic choice. And there's a lot of rhetoric and conversation in the industry about giving people more choices, about how algorithms make decisions that affect their experiences on platforms like ours. But we don't really know what choice means to every person. And we don't know what algorithmic choice specifically means to people.

DP: I would imagine one of the challenges of doing this kind of work in a tech setting is that it just wants to be so qualitative, but eventually you have to find ways to try to make this stuff as quantitative as possible in order to actually start to build it into products. Is that as hard as I think it would be?

JW: It's extremely hard, especially in something as esoteric as this space. What I learned about privacy is that every person thinks of that word differently. So with algorithms, you're building — I won't say a one-size-fits-all, it's a personalization algorithm — but it's built off of one construct. And so when you see somebody implement a setting or a button that gives somebody choice, but it's a choice that is applied to everybody in exactly the same quantitative way, it's not necessarily the choice that I'm talking about when I say user choice.

Most of the time, these are settings that filter something bad out, right? As opposed to adding something delightful to your experience. And I don't know that more flags and filters really add choice or enable something better for people. So it's figuring out what is the right thing to do. But then turning it into code, turning it into a technical design that implements that on a very personal basis, works for people from different walks of life and still provides a safe experience? It is very complicated.

We've talked about, say, a profanity filter. You can turn on or off profanity, but what does profanity mean to you? And how do you qualify something as profane? What if you speak a different language? What if you don't consider one word to be profane? What if that's a very common word in the way that you speak English? These are all things we have to consider before we start making decisions about how algorithms apply and affect your world.

DP: Is it possible to draw that kind of baseline you're talking about in one way that works for, if not everybody, then almost everybody?

RC: I think we can reach a way of approaching it that could feasibly be generalizable. I do not think there is one rote methodology to follow. And that's the struggle.

The first question you get asked from any engineer is, "What's the checklist?" And to be fair, that's how a lot of engineering folks work. It's like, I have this checklist of things to follow. I do these steps. And it's really hard for folks to internalize that, you know, sometimes you don't pass, because the thing you might fundamentally be building is unethical or wrong or just will be an absolute disaster. That is one thing to internalize. And the second is that no, we actually do require people to be thoughtful. And then if they don't know how to answer it, raise their hand and ask the right people who can help them. That is quite difficult.

I've learned a lot working with folks in risk and compliance. It has reshaped how I understand algorithmic bias and the harms, by thinking about how people who look at risk — especially things that are less tangible, like reputational harm — think of that when doing risk calculations. A lot of that world sits in very legally wrapped language, and qualitative language that does not translate well to machine learning folks who want very clear, standard ways of saying things and doing things.

A lot of folks start with a list of questions that model owners should be asked. What I've found is that model owners want to give very precise answers. When you ask them open-ended questions, they spiral. And it's really difficult to answer! So what we have done in our assessment rubric is to state things as a statement, and ask them to assess the likelihood of this event happening, and the impact if it were to happen.

Rather than saying, "Is there bias in your model?" we make it a statement: "There is harmful bias in this model." That actually gives model owners a better place to start from. That completely reframed how we built our internal assessment tool.

AK: It also seems to me like part of your job that we're not talking about much is, when is it your job to just say, "No, this shouldn't be automated at all?" Or "No, artificial intelligence isn't useful for everything." Or is it just inevitable that eventually everything has some kind of model informing it, and fighting it is a futile effort?

RC: Our image-cropping assessment was a perfect example of us coming to a conclusion that a model wasn't the best way to do something. What we could have done is looked at our model in a bubble and said, here's where we see biases, and we're gonna go make them nice and then everything will work. But to take a step back, what we felt once we introduced the concept of representational harm — which by the way, our bias bounty is specifically focusing on representational harm — we realized that the best way to enable pure representation is just to not introduce an algorithmic layer and allow people to share their photos as they are.

I was alluding earlier to our internal risk assessment, and one of the questions that I've added there actually asked the model owner, "Is this model better than what would exist otherwise?" And that is often not a question that is asked of model owners, because as long as someone can show that it's faster, or it's cool, then they're often not questioned. But we're specifically asking, what has the world been without this model that you've built? And is it actually adding a net improvement to someone's experience?

What has the world been without this model that you've built? And is it actually adding a net improvement to someone's experience?

JW: Andrew Ng very famously said that ML or AI is like electricity: It's going to be everywhere and do everything. But we don't use electricity for everything, even though it's pervasively available, right? We still make a soufflé without using an electric mixer. And so my point is simply that unless you need to do something fast, unless you need to do something at scale and unless it's the right tool for the job, just because ML is available doesn't always make it the right tool.

One of my favorite movies is "Chitty Chitty Bang Bang." I don't know if it's necessary to create a big technical machine in order to fry an egg. So sometimes it's just a question of, is this the appropriate use of something that is supposed to be working at 25 horsepower, or do we actually just need to take a stroll?
