Computer vision forms the foundation for many AI-based technology products, offering the promise of helping spot disease symptoms or ensuring that autonomous vehicles accurately recognize objects on the road. But the same techniques also underpin tech with immense potential for personal harm and societal damage, from discriminatory facial recognition-fueled surveillance and disinformation-spreading deepfakes to controversial tech used to detect people’s emotional states.
The possible negative impacts of AI that computer vision researchers help bring to life are getting more attention, prompting AI businesses to emphasize the importance of ethical considerations guiding how their products are built. Yet over the last several years, the computer vision community has been reluctant to recognize connections between the research advancements and cool math problem-solving achievements celebrated at one of its most prestigious annual conferences, and the possible uses for that tech once it is baked into apps and software products.
This year, that began to change, albeit slowly.
For the first time, the Computer Vision and Pattern Recognition Conference — a global event that attracted companies including Amazon, Google, Microsoft and Tesla to recruit new AI talent this year — “strongly encouraged” researchers whose papers were accepted to the conference to include a discussion about potential negative societal impacts of their research in their submission forms.
“Because of the much more real impact that computer vision is playing in people's lives, we instituted this process for the authors to discuss both the limitations of their papers [and] also potential social limits,” said Dimitris Samaras, a program chair of this year’s CVPR conference, and a professor and director of the Computer Vision Lab at Stony Brook University.
“It’s mostly so that people – authors – are forced to think and frame their work in a way that impacts are recognized as early as possible, and if necessary, [mitigated],” Samaras told Protocol.
Not my job
The policy shift ruffled some feathers. Academics are “super aware” of the potential impact of their research on the real world, said one conference attendee who asked not to be named. But because researchers cherish their academic freedom, he said, asking them to predict future applications for research that could be at a very early stage and years away from viability in products restricts that independence.
“They are not good at telling you what the applications of their research are. It’s not their job,” he said.
“That is exactly what pisses me off,” said Timnit Gebru, founder and executive director of the Distributed Artificial Intelligence Research Institute, and a researcher with a Ph.D. in computer vision. “[Computer vision researchers] have convinced themselves that it’s not their job.”
While presenting a workshop on fairness, accountability, transparency and ethics in computer vision at CVPR in 2020, Gebru said she experienced what she considered a general disregard for ethical considerations and the human rights impacts of computer vision-based technologies used for border surveillance, autonomous and drone warfare and law enforcement.
Gebru told Protocol she is now “done” with CVPR and has soured on the computer vision field because of the “inability for them to be introspective.”
“We personally believe it is the researcher’s job,” Samaras said, regarding consideration of computer vision’s ethical implications.
This isn’t just a research problem, though. Some AI practitioners say the ethics disconnect continues past the research phase, as people like the young computer scientists vying for tech jobs at CVPR make their way into the ranks of corporate AI. There, dismissive attitudes toward ethical considerations can hinder business goals to operationalize the ethics principles promised in splashy mission statements and press releases.
“I think that was one of my frustration points in my tech career,” said Navrina Singh, a computer engineer and founder and CEO of Credo AI, which sells software for keeping track of data governance and audit reviews in the machine learning development process.
“As technologists, we were incentivized to build the highest-performing systems and put them out on the market quickly to get business outcomes,” said Singh. “And anytime we would talk about compliance and governance, the technologists were like, ‘Oh, this is not my problem. That's not my space. That's not my incentive structure.’”
Avoiding radical change
CVPR attendance has doubled over the past five years; this year’s show attracted around 10,000 attendees, over half of whom participated in person, according to conference organizers.
The 2022 CVPR conference was held at the convention center in New Orleans, where a growing number of surveillance cameras installed throughout the city are plugged into a real-time law enforcement crime center. The city is currently considering lifting a ban on facial recognition and other surveillance tech that was established just two years ago.
In its new ethics guidelines, CVPR organizers listed some examples of negative impacts of computer vision. “Could it be used to collect or analyze bulk surveillance data to predict immigration status or other protected categories, or be used in any kind of criminal profiling?” they asked. “Could it be used to impersonate public figures to influence political processes, or as a tool of hate speech or abuse?”
Left: Computer vision researchers attend a workshop held at CVPR in New Orleans. Right: Amazon Science recruited interns at the CVPR conference, where the company held workshops on its Amazon Go computer vision tech. Photos: Kate Kaye/Protocol
Some researchers who presented their work at the conference acknowledged the possible downsides. In a paper about high-resolution face-swapping via latent semantics, researchers wrote, “Although not the purpose of this work, realistic face swapping can potentially be misused for deepfakes-related applications.” To limit the deepfake potential of their research, the authors proposed restricting how the model is released for use and developing deepfake-detection techniques.
However, because CVPR merely encouraged researchers to include an impact assessment and did not require that the discussion appear in the published papers available for viewing outside the conference review process, many papers make no mention of the ethical implications of the work. For example, another publicly available research paper accepted at this year’s conference, which details region-aware face swapping, a technique that can be used to enable deepfakes, includes no social impact statement.
In fact, researchers were only asked to tell reviewers whether or not their work might have a social impact. “You could say that it's a pure math paper [so] there isn't social impact. If reviewers agree with you, there's nothing to say,” Samaras said.
Some researchers bristle at the increased concern around ethics, in part because they are producing incremental work that could have many future applications, just like any tool might.
“It’s not the techniques that are bad; it’s the way you use it. Fire could be bad or good depending on what you are doing with it,” said François Brémond, a cognitive and computer vision researcher and research director at Inria, the French national research institute for digital science and technology, in an interview at the CVPR conference.
Brémond suggested there is too much focus on potentially negative uses of some computer vision research, particularly when it is designed to help people. His current work involves the use of computer vision to detect key points on faces to gauge subtle changes in expressions of autistic individuals or people with Alzheimer’s. The early-stage research could help decipher signs of internal changes or symptoms and help health care workers better understand their patients, he said.
Controversy over facial expression detection and analysis software led Microsoft to pull it from general use, but retain it in an app used to help people with vision impairment.
Brémond said he saw no reason to include a social impact section in a paper he presented at CVPR because it addressed generalized video action-detection research rather than something directly related to a specific use. The research had no “direct, obvious link to a negative social impact,” Brémond wrote in an email last week. He explained that he is already required to provide information to Inria’s administration regarding the ethical issues associated with his research.
It’s no wonder CVPR program chairs — including Samaras and Stefan Roth, a computer science professor in the Visual Inference Lab at Germany’s Technical University of Darmstadt — aren’t pushing too hard.
“Our decision to make that gradual was a conscious decision,” said Roth. “The community as a whole is not at this point yet. If we make a very radical change, then the reviewers will not really know how to basically take that into account in the review process,” he said, referencing those who review papers submitted to the conference.
“We were trying to break a little bit of ground in that direction. And it's certainly not going to be the last version of that for CVPR,” Roth said.
Changing hearts and minds may come, but slowly, said Olga Russakovsky, an assistant professor in Princeton University’s department of computer science, during an interview at the conference where she gave a presentation on fairness in visual recognition.
“Most folks here are trained as computer scientists, and computer science training does not have an ethics component,” she said. “It evokes this visceral reaction of, ‘Oh, I don't know ethics. And I don't know what that means.’”
A tale of two conferences
The vast majority of tutorials, workshops and research papers presented at CVPR made little or no mention of ethical considerations. Instead, trending subjects included neural rendering and the use of multimodal data (data that comes in a variety of modes, such as text, images and video) to train large machine learning models.
One particularly hot topic this year: CLIP, or contrastive language-image pre-training, a neural network from OpenAI that learns visual concepts from natural language supervision.
“It's getting much more on the radar of a lot of people,” said Samaras, noting that he counted 20 papers presented at CVPR that incorporated CLIP.
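To give a sense of why CLIP shows up in so many papers, here is a minimal sketch of zero-shot image classification with OpenAI’s publicly released model, using the Hugging Face transformers library. The checkpoint name, image path and label prompts are illustrative placeholders, not drawn from any of the CVPR papers mentioned above.

```python
# Minimal sketch: zero-shot image classification with CLIP via Hugging Face
# transformers. The checkpoint, image file and labels below are assumptions
# for illustration only.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("street_scene.jpg")  # any local image
labels = ["a photo of a pedestrian", "a photo of a cyclist", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# CLIP scores each text prompt against the image; softmax turns those scores
# into probabilities over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Because the label prompts are free-form English text, the model’s behavior depends directly on the language and the data it was trained on, which is exactly the concern researchers raised at another conference that same week.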
CLIP happened to be a topic of conversation at that other AI conference, held in Seoul during the same week in late June as CVPR. But in this case, CLIP was not celebrated.
“CLIP is an English-language model trained on internet content gathered based on data from an American website (Wikipedia), and our results indicate that CLIP reflects the biases of the language and society which produced the data on which it was trained,” researchers wrote in a paper they presented at FAccT. The growing global conference is dedicated to research focused on fairness, accountability and transparency in sociotechnical systems such as AI.
While FAccT surely reached its endemic audience of AI ethics researchers, more than 2,000 people from the computer vision community who may have learned from that ethics-focused conference — including 460 from South Korea — were thousands of miles away in New Orleans at CVPR, advancing their craft with relatively minimal concern for the societal implications of their work. If anything, the physical separation of the simultaneous events symbolized the disconnect between the computer scientists pushing computer vision ahead and the researchers hoping to infuse it with ethical considerations.
But FAccT organizers hope to spread their message beyond the ethics choir, said Alice Xiang, a general co-chair of this year’s FAccT conference and head of Sony Group’s AI ethics office. “One of the goals we had as organizers of that is to try to make it as much of a big tent as possible. And that is something that we do sometimes worry about: whether practitioners who actually develop AI technologies might feel that this is just a conversation for ethicists.”
But cross-pollination could be a long time coming, Xiang said.
“We're still at a point in AI ethics where it's very hard for us to properly assess and mitigate ethics issues without the partnership of folks who are intimately involved in developing this technology,” she said. “We still have a lot of work to do in that intersection in terms of bringing folks along and making them realize some of these issues.”