Microsoft said last month it would no longer provide general use of an AI-based cloud software feature used to infer people’s emotions. However, despite its own admission that emotion recognition technology creates “risks,” it turns out the company will retain its emotion recognition capability in an app used by people with vision loss.
In fact, amid growing concerns over development and use of controversial emotion recognition in everyday software, both Microsoft and Google continue to incorporate the AI-based features in their products.
“The Seeing AI person channel enables you to recognize people and to get a description of them, including an estimate of their age and also their emotion,” said Saqib Shaikh, a software engineering manager and project lead for Seeing AI at Microsoft who helped build the app, in a tutorial about the product in a 2017 Microsoft video.
After he snapped a photo of his friend using Seeing AI, the app’s automated voice announced that he is a “36-year-old male wearing glasses, looking happy.” Shaikh added, “That’s really cool because at that moment in time you can find out what someone’s facial expression was.”
Microsoft said on June 21 that it will “retire” its facial analysis capabilities that attempt to detect people’s emotional states, gender, age and other attributes. The company pointed to privacy concerns, “the lack of consensus on a definition of ‘emotions’” and the “inability to generalize the linkage between facial expression and emotional state across use cases, regions, and demographics.”
But accessibility goals overrode those problems when it came to Seeing AI. “We worked alongside people from the blind and low vision community who provided key feedback that the emotion recognition feature is important to them, in order to close the equity gap between them and [the] experience of sighted individuals,” said Microsoft in a statement sent to Protocol. The company declined a request for an interview.
“I really do appreciate Microsoft’s nuance here,” said Margaret Mitchell, chief ethics scientist and researcher at Hugging Face, who holds a Ph.D. in computer science and helped develop Seeing AI in 2014 while working at Microsoft. She left the company in 2016. “When you talk to people who are blind you will see there is absolutely an appreciation for description of visual scenes,” Mitchell said.
Saqib Shaikh, a Microsoft software engineering manager and project lead for Seeing AI (left) with Microsoft CEO Satya Nadella
Photo: Justin Sullivan/Getty Images
Emotion recognition is a well-established field of computer vision research; however, AI-based technologies used in an attempt to assess people’s emotional states have moved beyond the research phase. They have been integrated into everyday tech products like virtual meeting platforms, online classroom platforms and software in vehicles used to detect driver distraction or road rage.
Off-the-shelf emotion detection from Google
Google has also grappled with decisions about incorporating computer vision-based AI that attempts to gauge the likelihood that a person is expressing certain emotions or facial characteristics.
The company’s Cloud Vision API includes “pre-trained Vision API models to detect emotion, understand text, and more,” according to a company description. The system rates the likelihood that a face in an image is expressing anger, joy, sorrow and surprise on a scale from “unknown” or “very unlikely” to “very likely.”
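Those likelihood buckets form an ordinal scale rather than a numeric probability. A minimal Python sketch below shows how a client might rank the four emotions Cloud Vision reports for a detected face; the response fragment is illustrative (a live call requires Google Cloud credentials), though the `faceAnnotations` field names follow the API's documented REST shape.

```python
# Ordinal likelihood scale used by Cloud Vision's face annotations,
# from "unknown"/"very unlikely" up to "very likely."
LIKELIHOOD_SCALE = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
    "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

# Illustrative response fragment in the documented faceAnnotations shape
# (not an actual API response).
sample_response = {
    "faceAnnotations": [
        {
            "joyLikelihood": "VERY_LIKELY",
            "sorrowLikelihood": "VERY_UNLIKELY",
            "angerLikelihood": "VERY_UNLIKELY",
            "surpriseLikelihood": "UNLIKELY",
        }
    ]
}

def summarize_face(face: dict) -> dict:
    """Map each per-emotion likelihood string to its rank on the scale."""
    emotions = ("joy", "sorrow", "anger", "surprise")
    return {e: LIKELIHOOD_SCALE.index(face[f"{e}Likelihood"]) for e in emotions}

for face in sample_response["faceAnnotations"]:
    print(summarize_face(face))
```

Because the API never returns a probability for these fields, any downstream use (like Seeing AI announcing "looking happy") has to collapse this coarse ordinal signal into a single label, which is part of what critics describe as reductive.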
Google also includes a feature in its ML Kit tool for mobile apps that detects facial “landmarks” and classifies facial characteristics, such as whether a person’s eyes are open or whether they are smiling.
Even though its own documentation sometimes claims its software “detects emotion,” a Google spokesperson played down that idea, noting that its Vision API does not detect expressed emotions; rather it predicts the perception of facially expressed emotions.
The validity of emotion AI has been heavily scrutinized and often raises ethical concerns. Advocacy groups including the AI Now Institute and Brookings Institution have called for bans on the technology for certain use cases.
After Protocol reported that virtual meeting platform Zoom was interested in using emotion recognition, more than 25 human and digital rights organizations including the American Civil Liberties Union, Electronic Privacy Information Center and Fight for the Future demanded that the company end any plans to use it.
Google declined an interview for this story, instead pointing to a 2021 Reuters report that explained that, following an internal ethics review, the company decided against including new capabilities in its Cloud Vision API tool to detect the likelihood of additional emotions other than anger, joy, sorrow and surprise. According to the story, the group “determined that inferring emotions could be insensitive because facial cues are associated differently with feelings across cultures, among other reasons.”
Mitchell told Protocol that during her time working for Google, she was part of the group that helped convince the company not to expand Cloud Vision API’s features to infer additional emotional states other than the original four.
Mitchell, who co-led Google’s ethical AI team, was fired from the company in February 2021 following a company investigation into violations of security policies for moving company files. Her departure followed another high-profile firing of her AI ethics team co-lead Timnit Gebru. Gebru was fired in part as a result of conflict over a research paper questioning the environmental, financial and societal costs of large language models.
It is unclear whether Google’s decision to limit the tool to four emotions avoids inaccuracies. For instance, when one researcher tested Google’s Cloud Vision API in 2019, he used the tool to assess the facial expressions of a group of children in a photo. At the moment the snapshot was taken, everyone but one boy was smiling. The system appeared to default to a label that may have been incorrect, determining that the boy’s face was expressing “sorrow with a confidence of 78%.”
Reductive accessibility
Researchers are pushing to advance emotion AI. At the international Computer Vision and Pattern Recognition conference held in New Orleans in June, some accepted research papers involved work related to facial expression recognition and facial landmark detection, for example.
“We’re just breaking the surface, and everywhere I turn there’s more and more [emotion AI] developing,” said Nirit Pisano, chief psychology officer at Cognovi Labs, which provides emotion AI technology to advertisers and pharmaceutical-makers that use it to determine responses to marketing messages and to understand how people feel about certain drugs.
“I definitely see its uses, and I also envision many of its misuses. I think the mission of a company is really critical,” Pisano said.
Video: Seeing AI app – Scene Channel (YouTube)
Microsoft said its decision to continue use of emotion recognition in Seeing AI will help advance its accessibility mission. “Microsoft remains committed to supporting technology for people with disabilities and will continue to use these capabilities in support of this goal by integrating them into applications such as Seeing AI,” wrote Sarah Bird, principal group product manager at Microsoft’s Azure AI, in a company blog post last month.
Gebru, who has a Ph.D. in computer vision, is critical of emotion recognition technology, which uses computer vision to detect facial data. She told Protocol that although “there are many times where access is used as a reason” for emotion recognition — such as to improve accessibility for people with vision impairment — whether it can be beneficial “all depends on what people in that community have said.”
Seeing AI is available only on Apple devices, even though people have asked Microsoft to create an Android version of the app. “Currently on the Android side, there are alternatives such as Speak, [Supersense], Envision AI, Kibo etc, but that’s no excuse for not having Seeing AI,” wrote a Reddit user in the r/Blind subreddit two years ago.
The Seeing AI app gets positive reviews online; however, rather than using it to detect emotion, some people seem more interested in using it to accomplish tasks such as deciphering the denomination of paper money, helping to read mail and determining whether food in the fridge has expired.
Cristian Sainz uses Seeing AI at home to scan the bar code of a jar of peaches from his fridge.
Photo: Microsoft
Still, tools such as Seeing AI can help someone who cannot see navigate a conversation by picking up on cues they miss when people only nod or make facial expressions rather than audibly communicating.
“Deafblind people face higher levels of depression in part because ableist barriers often exclude us from conversations,” wrote Haben Girma, a deafblind human rights lawyer, in a June tweet noting the benefits of providing details about imagery that can help people “receive the emotional message through words.”
Even if only used to assist people with vision loss, Mitchell said there could be better ways to build emotion recognition AI. Labeling facial expressions to indicate emotions that are then spoken by an app’s computerized voice may not be the most helpful approach, she said, suggesting that things like electronic pulses or tones could be used instead to convey the visible facial expressions in a way that could be more clearly understood.
“It doesn’t actually need to be the case for blind people that they need to have this point of discrete categorization,” Mitchell said. “It seems to be an undesirable bottleneck and a reductive form of signal processing that doesn’t actually need to be there for someone who is blind.”