Amid debate, Microsoft and Google continue to use emotion-detection AI, with limits

Microsoft said accessibility goals overrode problems with emotion recognition, and Google offers off-the-shelf emotion recognition technology, amid growing concern over the controversial AI.

“Seeing AI becomes my assistant and provides me with support that I need so that I can focus on taking care of my daughter,” says Akiko Ishii, with her daughter, Ami, in Tokyo. Photo by Microsoft.


Microsoft said last month it would no longer provide general use of an AI-based cloud software feature used to infer people’s emotions. But despite the company’s own admission that emotion recognition technology creates “risks,” it will retain the capability in an app used by people with vision loss.

In fact, amid growing concerns over development and use of controversial emotion recognition in everyday software, both Microsoft and Google continue to incorporate the AI-based features in their products.

“The Seeing AI person channel enables you to recognize people and to get a description of them, including an estimate of their age and also their emotion,” said Saqib Shaikh, a software engineering manager and project lead for Seeing AI at Microsoft who helped build the app, in a tutorial about the product in a 2017 Microsoft video.

After he snapped a photo of his friend using Seeing AI, the app’s automated voice announced that he is a “36-year-old male wearing glasses, looking happy.” Shaikh added, “That’s really cool because at that moment in time you can find out what someone’s facial expression was.”

Microsoft said on June 21 that it will “retire” its facial analysis capabilities that attempt to detect people’s emotional states, gender, age and other attributes. The company pointed to privacy concerns, “the lack of consensus on a definition of ‘emotions’” and the “inability to generalize the linkage between facial expression and emotional state across use cases, regions, and demographics.”

But accessibility goals overrode those problems when it came to Seeing AI. “We worked alongside people from the blind and low vision community who provided key feedback that the emotion recognition feature is important to them, in order to close the equity gap between them and [the] experience of sighted individuals,” said Microsoft in a statement sent to Protocol. The company declined a request for an interview.

“I really do appreciate Microsoft’s nuance here,” said Margaret Mitchell, chief ethics scientist and researcher at Hugging Face, who holds a Ph.D. in computer science and helped develop Seeing AI in 2014 while working at Microsoft. She left the company in 2016. “When you talk to people who are blind, you will see there is absolutely an appreciation for description of visual scenes,” Mitchell said.

Saqib Shaikh, a Microsoft software engineering manager and project lead for Seeing AI (left), with Microsoft CEO Satya Nadella. Photo: Justin Sullivan/Getty Images

Emotion recognition is a well-established field of computer vision research; however, AI-based technologies used in an attempt to assess people’s emotional states have moved beyond the research phase. They have been integrated into everyday tech products like virtual meeting platforms, online classroom platforms and software in vehicles used to detect driver distraction or road rage.

Off-the-shelf emotion detection from Google

Google has also grappled with decisions about incorporating computer vision-based AI that attempts to gauge the likelihood that a person is expressing certain emotions or facial characteristics.

The company’s Cloud Vision API includes “pre-trained Vision API models to detect emotion, understand text, and more,” according to a company description. The system rates the likelihood that a face in an image is expressing anger, joy, sorrow and surprise on a scale from “unknown” or “very unlikely” to “very likely.”
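As a rough illustration of what those ratings look like in practice, the sketch below reduces a face-detection response to per-emotion likelihood labels. The response shape follows Google's public Face Detection documentation (`faceAnnotations` with fields like `joyLikelihood`), but the sample values and the `summarize_face` helper are invented for this example rather than taken from Google's tooling:

```python
import json

# Likelihood buckets the Vision API uses for each rated emotion, roughly
# ordered from least to most likely (per Google's Face Detection docs).
LIKELIHOODS = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
    "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

# A trimmed, invented example of the JSON an annotate request can return.
sample_response = json.loads("""
{
  "responses": [{
    "faceAnnotations": [{
      "joyLikelihood": "VERY_LIKELY",
      "sorrowLikelihood": "VERY_UNLIKELY",
      "angerLikelihood": "VERY_UNLIKELY",
      "surpriseLikelihood": "UNLIKELY"
    }]
  }]
}
""")

def summarize_face(face: dict) -> dict:
    """Map each of the four rated emotions to its likelihood label."""
    emotions = ("joy", "sorrow", "anger", "surprise")
    return {e: face.get(f"{e}Likelihood", "UNKNOWN") for e in emotions}

for face in sample_response["responses"][0]["faceAnnotations"]:
    print(summarize_face(face))
# → {'joy': 'VERY_LIKELY', 'sorrow': 'VERY_UNLIKELY', 'anger': 'VERY_UNLIKELY', 'surprise': 'UNLIKELY'}
```

Note that the API returns coarse buckets rather than numeric scores for these fields, which is one reason its output reads as a prediction of perceived expression rather than a measurement of inner emotional state.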

Google also includes a feature in its ML Kit tool for mobile apps that detects facial “landmarks” and classifies facial characteristics, such as whether a person’s eyes are open or whether they are smiling.

Even though its own documentation sometimes claims its software “detects emotion,” a Google spokesperson played down that idea, saying the Vision API does not detect expressed emotions; rather, it predicts the perception of facially expressed emotions.

The validity of emotion AI has been heavily scrutinized and often raises ethical concerns. Advocacy groups including the AI Now Institute and Brookings Institution have called for bans on the technology for certain use cases.

After Protocol reported that virtual meeting platform Zoom was interested in using emotion recognition, more than 25 human and digital rights organizations including the American Civil Liberties Union, Electronic Privacy Information Center and Fight for the Future demanded that the company end any plans to use it.

Google declined an interview for this story, instead pointing to a 2021 Reuters report that explained that, following an internal ethics review, the company decided against including new capabilities in its Cloud Vision API tool to detect the likelihood of additional emotions other than anger, joy, sorrow and surprise. According to the story, the group “determined that inferring emotions could be insensitive because facial cues are associated differently with feelings across cultures, among other reasons.”

Mitchell told Protocol that during her time working for Google, she was part of the group that helped convince the company not to expand Cloud Vision API’s features to infer additional emotional states other than the original four.

Mitchell, who co-led Google’s ethical AI team, was fired from the company in February 2021 following a company investigation into violations of security policies for moving company files. Her departure followed another high-profile firing: that of her AI ethics team co-lead Timnit Gebru. Gebru was fired in part as a result of conflict over a research paper questioning the environmental, financial and societal costs of large language models.

It is unclear whether Google’s decision to limit the tool to four emotions avoids inaccuracies. For instance, when one researcher tested Google’s Cloud Vision API in 2019, he used the tool to assess the expressions of a group of children in a photo. At the moment the snapshot was taken, everyone but one boy was smiling. The system appeared to default to a possibly incorrect label, determining that the boy’s face was expressing “sorrow with a confidence of 78%.”

Reductive accessibility

Researchers are pushing to advance emotion AI. At the international Conference on Computer Vision and Pattern Recognition, held in New Orleans in June, some accepted research papers involved work related to facial expression recognition and facial landmark detection, for example.

“We’re just breaking the surface, and everywhere I turn there’s more and more [emotion AI] developing,” said Nirit Pisano, chief psychology officer at Cognovi Labs, which provides emotion AI technology to advertisers and pharmaceutical makers that use it to determine responses to marketing messages and to understand how people feel about certain drugs.

“I definitely see its uses, and I also envision many of its misuses. I think the mission of a company is really critical,” Pisano said.

Video: “Seeing AI app - Scene Channel” (YouTube)

Microsoft said its decision to continue use of emotion recognition in Seeing AI will help advance its accessibility mission. “Microsoft remains committed to supporting technology for people with disabilities and will continue to use these capabilities in support of this goal by integrating them into applications such as Seeing AI,” wrote Sarah Bird, principal group product manager at Microsoft’s Azure AI, in a company blog post last month.

Gebru, who has a Ph.D. in computer vision, is critical of emotion recognition technology, which uses computer vision to detect facial data. She told Protocol that although “there are many times where access is used as a reason” for emotion recognition — such as to improve accessibility for people with vision impairment — whether it can be beneficial “all depends on what people in that community have said.”


Seeing AI is available only on Apple devices, even though people have requested that the company create an Android version of the app. “Currently on the Android side, there are alternatives such as Speak, [Supersense], Envision AI, Kibo etc, but that’s no excuse for not having Seeing AI,” wrote a Reddit user in the r/Blind community two years ago.

The Seeing AI app gets positive reviews online; however, rather than using it to detect emotion, some people seem more interested in using it to accomplish tasks such as deciphering the denomination of paper money, reading mail and determining whether food in the fridge has expired.

Cristian Sainz uses Seeing AI at home to scan the bar code of a jar of peaches from his fridge. Photo: Microsoft

Still, tools such as Seeing AI can help someone who cannot see navigate a conversation by picking up on cues they miss when people only nod or make facial expressions rather than audibly communicating.

“Deafblind people face higher levels of depression in part because ableist barriers often exclude us from conversations,” wrote Haben Girma, a deafblind human rights lawyer, in a June tweet noting the benefits of providing details about imagery that can help people “receive the emotional message through words.”

Even if only used to assist people with vision loss, Mitchell said there could be better ways to build emotion recognition AI. Labeling facial expressions to indicate emotions that are then spoken by an app’s computerized voice may not be the most helpful approach, she said, suggesting that things like electronic pulses or tones could be used instead to convey the visible facial expressions in a way that could be more clearly understood.

“It doesn’t actually need to be the case for blind people that they need to have this point of discrete categorization,” Mitchell said. “It seems to be an undesirable bottleneck and a reductive form of signal processing that doesn’t actually need to be there for someone who is blind.”

