Entertainment

How Netflix tests Netflix: The story behind the service’s new two-thumbs-up feature

Netflix designers thought they had the perfect icon for a new feature. Then the service’s subscribers started chiming in.

What if viewers are really, truly in love with a show?

GIF: Netflix

For nearly five years, Netflix has had simple thumbs-up and thumbs-down icons to express viewing preferences and help its algorithms provide better recommendations. However, in surveys, people frequently expressed that this binary type of voting didn’t really do their taste justice.

What if they were really, truly in love with a show?

Tasked to come up with a better way to express such levels of adoration, the streaming service recently explored the idea of adding a heart icon to the Netflix app. The heart seemed like an obvious choice. It’s a universal sign of love, and widely used in apps like Instagram and Twitter.

But Netflix wouldn’t be Netflix if the company didn’t put features like these through some rigorous testing; in this case, it took nearly a year. During that time, the company discovered that hearts were actually not the best-performing feature after all, and instead settled on a new two-thumbs-up option that is being made available to its subscribers worldwide this week.

Here’s how that change of heart came about.

Finding a universal symbol for love

Netflix rolled out its new two-thumbs-up feature across its mobile and smart TV apps as well as its website Monday. Subscribers are being advised that this type of feedback directly affects future recommendations. A thumbs-down means that a title won’t get suggested again; a thumbs-up will result in Netflix recommending similar content. Two thumbs up means that “we know you’re a true fan,” as the Netflix mobile app puts it.

The company kicked off its work on the feature about a year and a half ago based on feedback it was getting in surveys and research interviews from its subscribers. “We were hearing from members that ‘like’ and ‘dislike’ was not sufficient,” said Christine Doig-Cardet, who leads the company’s personalized UI product innovation team. “There were some shows that they really, really, really enjoyed. Differentiating between what they love and what they like was important.”

Once the decision was made to solve this problem, Netflix kicked off a series of design sprints to come up with visuals for this level of fandom. Some of the early ideas included the heart, an applause icon, shooting stars and others. Designers also consulted with the company’s globalization team to find an icon that was truly universal. “The design team and the globalization team really [homed] in on the symbols that connote love,” said Netflix director of Product Design Ratna Desai. “We wanted it to be very precise, very concise, because we wanted this to be a very quick interaction.”

Image: Netflix
Netflix tested a number of different reactions that could reflect a viewer's interest in a show.

At the same time, Netflix continued to query its subscribers, who had a different suggestion. “We had a lot of interviews and surveys, [and] the heart was not really resonating,” Doig-Cardet said. “The idea that came from members was: Why don’t you just try two thumbs up?”

At that point, two front-runners emerged. The heart seemed like an obvious choice, but two thumbs up also seemed to work well with Netflix’s existing iconography. Plus, as anyone who has ever read a review by the late Roger Ebert knows, it has long signified a vote of confidence for great entertainment.

Going with what its subscribers wanted seemed like a good idea, giving credence to the two thumbs up. But what if those subscribers were wrong?

“Some people can speak loudly,” Doig-Cardet said. “But when you look at the whole picture, talk to a lot of different members and see how they engage with the different features, it doesn't actually always [match] the initial loud voices.”

Proving the loudest voices wrong

Netflix has long tried to figure out how to best collect member-based content ratings, and dealing with those loud voices has been challenging. In its early days, Netflix used to offer a five-star ratings system, similar to the way people rate their Uber drivers.

At the time, Netflix displayed an average of those ratings on its website to convey how well-liked a title was among subscribers. This resulted in some titles having 4.5 stars, or other fractions, leading people to wonder why they couldn’t rate in half-star increments as well.

Thousands of people told the company in surveys that they wanted this level of granularity, but Netflix employees weren’t sure whether those opinions reflected how people actually used the service. To make sure it wasn’t falling for the opinions of a vocal minority, Netflix resorted to something that has become a key part of its product development tool chest over the years: an A/B test.

In the case of the half-star test, the results were obvious: Ratings activity dropped significantly when people were asked to provide feedback with that level of granularity. In other words: A/B testing proved the loudest voices wrong.

Netflix repeated this kind of testing when it completely replaced the five-star ratings with thumbs in 2017. In A/B tests ahead of that change, the company saw ratings activity increase by 200% with thumbs-up and thumbs-down icons. Part of the reason was that these icons were just simpler, but a closer look at the data also revealed that they tended to be more accurate: People would aspirationally give five stars to titles they deemed worthy of that status, including award-winning documentaries that would then linger unwatched in their queues for months. At the same time, they would frequently binge on reality TV shows that they themselves had rated just three stars.

The moment of truth: Hearts or thumbs?

Now, Netflix is ready to again add a bit more complexity to those ratings. That’s in part because media consumption habits and app interfaces have changed across the board. “People are using Netflix in the context of their overall lives,” Desai said. “They are interacting with Instagram, with various social networks, with ride-share apps.” Some of the interaction patterns of those apps and experiences weren’t easily applicable to Netflix, which is primarily used on TVs and has a much bigger focus on leanback entertainment than, for instance, Instagram. “But there are a few levers that our members are now asking for that they didn't in the past,” she said.

Still, there were some unresolved questions, including what would perform better: Hearts or thumbs? And would either actually have a lasting impact beyond addressing those loud voices in surveys and other forms of qualitative research?

“We have been in situations where we may hear very strong points of view in a qualitative setting that go against what we find out in A/B testing,” Desai said. “That's when the fun begins.”

Image: Netflix
Netflix began a series of A/B tests for the new ratings feature last summer.

Netflix began a series of A/B tests for the new ratings feature last summer, trialing both the heart and the two-thumbs-up option. At the same time, the company continued to query subscribers, including those enrolled in the tests, to see whether the new features were actually providing value.

Testing of the feature extended into the fall, as the teams working on it wanted to make sure they got things right. “We don't rush a test,” Doig-Cardet said. “Sometimes, there's this impetus to just launch early and break things and all of that. That's not [our] approach.” One reason for conducting A/B tests over weeks or even months is to let people get used to a feature and see whether engagement stays high, or whether people are attracted to the novelty of a feature, and then get bored with it.
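The reasoning behind running a test for weeks rather than days can be illustrated with a minimal sketch. It compares a feature's engagement rate early in a test with its rate later on, and flags the feature when the later rate has dropped well below the early one. Everything here is an assumption made for illustration: the function names, the 80% threshold and the weekly figures are hypothetical and do not come from Netflix.

```python
from statistics import mean


def weekly_rates(events):
    """Turn (week, users_exposed, users_who_rated) rows into weekly engagement rates."""
    return [rated / exposed for _, exposed, rated in sorted(events)]


def looks_like_novelty(rates, tolerance=0.8):
    """Return True when late-test engagement falls below `tolerance` times early engagement.

    The rates are split into an early half and a late half. The 0.8 default is an
    arbitrary illustrative threshold, not a published value.
    """
    if len(rates) < 2:
        raise ValueError("need at least two weeks of data to compare")
    half = len(rates) // 2
    early, late = rates[:half], rates[half:]
    return mean(late) < tolerance * mean(early)


# Hypothetical weekly data: (week, users exposed to the feature, users who used it).
sustained = [(1, 1000, 120), (2, 1000, 115), (3, 1000, 118), (4, 1000, 117)]
fading = [(1, 1000, 120), (2, 1000, 90), (3, 1000, 60), (4, 1000, 40)]

print(looks_like_novelty(weekly_rates(sustained)))
print(looks_like_novelty(weekly_rates(fading)))
```

A test stopped after the first week would show the same engagement for both hypothetical features; only the later weeks separate steady use from passing curiosity.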

In the end, the numbers were clear: Providing additional feedback worked. “We saw a very big lift in engagement because people had a new way to talk to us,” Desai said. That lift was a lot bigger with the two thumbs up than with the heart, which was a surprise, as people within Netflix had expected the heart to win.

Those kinds of unexpected outcomes are what make A/B testing so valuable, Doig-Cardet said. “If we weren't surprised, we would be doing something wrong,” she said. “We would be validating our own assumptions, rather than letting numbers direct what is a better experience.”

Constant testing, even if it can spoil the big reveal

Netflix’s extensive use of A/B testing has been well-documented over the years, including by its own data science team. The company is constantly testing a number of different features with subsets of its audience. Basically, if you’re a Netflix subscriber, there’s a decent chance that you are enrolled in some kind of test right now.

Some of these tests are for obvious interface tweaks, and some are related to under-the-hood codec or infrastructure changes. In fact, Netflix does so many tests that members can be enrolled in more than one test at the same time, which is why the company developed an entire experimentation platform that helps its data science team avoid testing conflicts and make sense of all the collected data. (Netflix does offer members a chance to opt out of tests through their account settings.)

However, the development of the new two-thumbs-up feature also shows that A/B testing alone isn’t enough. Without also talking directly to subscribers, the company would have prioritized the development of the heart icon and wouldn’t have given two thumbs up a chance to prove itself in A/B tests. “We take this multipronged approach of looking at a lot of different inputs,” Doig-Cardet said. “We're capturing insights from our customer service, from surveys, from interviews that we're doing, and using all of that to inform [what] we should be investing in and testing.”

Both surveys and A/B tests do come with a risk of exposing future features to the public eye. Subscribers frequently post about new things they spotted in the app, and reporters tend to jump on those stories to shine a light on the company’s roadmap. For Netflix, that’s just a cost of doing business. “We're comfortable making that trade-off of providing early visibility because we want to make sure that it's working for our members,” Doig-Cardet said.

“In previous places I worked, there's this amazing unveiling of the feature, with the campaign and all of that,” Desai added. Netflix instead operates a bit more in the open, which includes testing new and unannounced features with tens of thousands of members.

“This is our bread and butter,” Desai said. “It's our secret sauce to how we innovate.”
