The FTC’s 'profoundly vague' plan to force companies to destroy algorithms could get very messy

Companies take algorithms out of production all the time. But wiping an AI model and the data that built it off the face of the earth could be a lot more challenging.


“The premise is simple,” FTC Commissioner Rebecca Slaughter and FTC lawyers wrote last year.

They were talking about a little-used enforcement tool called algorithmic disgorgement, a penalty the agency can wield against companies that used deceptive data practices to build algorithmic systems like AI and machine-learning models. The punishment: They have to destroy ill-gotten data and the models built with it. But while privacy advocates and critics of excessive data collection are praising the concept in theory, in practice it could be anything but simple to implement.

In fact, separating tainted data and algorithmic systems from the unaffected parts of a company’s technology products and intellectual property could be about as easy as teasing out a child’s tangled hair.

“Once you delete the algorithm, you delete the learning. But if it’s entangled with other data, it can get complex pretty fast,” said Rana el Kaliouby, a machine-learning scientist and deputy CEO of driver-monitoring AI firm Smart Eye.

The FTC’s March 3 settlement order against WW, the company formerly known as Weight Watchers, marked the most recent time the agency has demanded a company destroy algorithmic systems. As part of its settlement, the company must also delete data gathered deceptively, provide a written statement confirming the deletion sworn under penalty of perjury, and keep records for 10 years demonstrating compliance.

But the order provides little detail about how the company must comply or how the FTC will know for sure it did.

The order is “profoundly vague,” said Pam Dixon, executive director of World Privacy Forum. “We’re not usually talking about a single algorithm. I would like to have seen more in their materials about what it is that is being disgorged specifically.”

For example, she said it is unclear whether WW used the data the FTC wants it to delete for marketing, for machine-learning models to predict or score kids’ health status or for other purposes.

How to kill an algorithm

Companies decommission algorithmic models all the time, taking them out of production. In some cases, an algorithm is just a simple piece of code: something that tells a software application how to perform a set of actions.

If WW used the data it was ordered to delete to build just one machine-learning model used in one particular feature of its app, for example, deleting the code for that feature could be a relatively straightforward process, el Kaliouby said.

But algorithmic systems using AI and machine or deep learning can involve large models or families of models involving extremely complex logic expressed in code. Algorithmic systems used in social media platforms, for example, might incorporate several different intersecting models and data sets all working together.

In any case, the first step does involve taking the model out of operation, ensuring it no longer processes existing data or ingests new data.
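In practice, that first step can be as simple as flipping a status flag in a model registry so the model stops serving requests. A minimal sketch of the idea (the registry layout and model name here are hypothetical, not any company's actual system):

```python
# Toy model registry: decommissioning flips the model's status so it
# stops serving predictions and ingesting new data.
registry = {"health_score_model": {"status": "serving"}}

def decommission(name: str) -> None:
    """Take a model out of operation without yet touching its data."""
    registry[name]["status"] = "retired"

def predict(name: str, features: dict) -> float:
    """Refuse to serve predictions from a decommissioned model."""
    if registry[name]["status"] != "serving":
        raise RuntimeError(f"{name} is decommissioned and accepts no data")
    # ... real inference would happen here ...
    return 0.0

decommission("health_score_model")
```

Deleting the model's weights and training data then happens separately, once nothing is reading from them.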

But it’s not so easy to decouple data from algorithmic systems, in part because the data used to train and feed them hardly ever sits in one place. Data obtained through deceptive means may end up in a data set that is then sliced and diced into multiple “splits,” each used for a separate purpose (training, testing and validation) throughout the machine-learning model development process, said Anupam Datta, co-founder and chief scientist at TruEra, which provides a platform for explaining and monitoring AI models.
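The splitting Datta describes is why deletion has to reach every derived data set, not just the original one. A minimal sketch of the problem, using a made-up record format:

```python
import random

def make_splits(records, seed=0):
    """Shuffle records and slice into train/validation/test (80/10/10)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    return {
        "train": shuffled[:int(n * 0.8)],
        "validation": shuffled[int(n * 0.8):int(n * 0.9)],
        "test": shuffled[int(n * 0.9):],
    }

def purge(splits, tainted_ids):
    """A tainted record may land in any split, so deletion must
    cover all of them, not just the training set."""
    return {name: [r for r in rows if r["id"] not in tainted_ids]
            for name, rows in splits.items()}

records = [{"id": i} for i in range(100)]
splits = make_splits(records)
clean = purge(splits, tainted_ids={3, 41, 77})
```

In a real pipeline the splits are materialized as separate files or tables, so there is no single place to delete from.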

And once a model has been deployed, it might blend ill-gotten data along with additional information from other sources, such as data ingested through APIs or real-time data streams.


Nowadays, data is often managed in the cloud. Cloud providers like AWS, Azure or Google Cloud offer standardized ways to delete data. A data scientist could use a tool from a cloud platform to mark which data needs to be deleted at varying levels of granularity, Datta said.

When the data storage area for that particular data is marked for removal, the space is freed up, allowing the system to write over or replace that doomed data with new information. However, in that case, the data that was intended to be deleted could still be recovered, Datta said.
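The distinction Datta draws, between marking storage as free and actually destroying the bytes, can be illustrated with a toy block store (a deliberately simplified model, not how any particular cloud provider implements deletion):

```python
class DataStore:
    """Toy block store: 'deleting' only marks space as reusable."""

    def __init__(self):
        self.blocks = {}   # block_id -> bytes
        self.free = set()  # block_ids marked as deleted/reusable

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.free.discard(block_id)

    def mark_deleted(self, block_id):
        # The bytes are NOT erased; the block is only marked free.
        self.free.add(block_id)

    def read_raw(self, block_id):
        # A low-level read can still recover "deleted" data.
        return self.blocks.get(block_id)

store = DataStore()
store.write("b1", b"sensitive record")
store.mark_deleted("b1")
recoverable = store.read_raw("b1")   # still there
store.write("b1", b"replacement")    # only an overwrite destroys it
```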

Cryptographic erasure could be used to delete data more permanently, he said. The process encrypts the data record with a key that is then itself destroyed, like locking the data in a box and throwing away the key.
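A toy illustration of the idea (this uses a hash-based XOR keystream purely for demonstration; real systems use a vetted cipher such as AES):

```python
import hashlib
import secrets

def keystream(key: bytes, n: int) -> bytes:
    """Derive a deterministic pseudorandom keystream from the key."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice recovers the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = secrets.token_bytes(32)
record = b"user=alice,weight_history=..."
ciphertext = xor_cipher(key, record)
recovered = xor_cipher(key, ciphertext)  # possible only while the key exists

# Cryptographic erasure: destroy the key and the ciphertext alone is
# unreadable, so every copy of the record is effectively deleted at once.
key = None
```

The appeal for disgorgement is that destroying one small key renders every replica of the encrypted data unusable, wherever it lives.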

The data replicant problem

In addition to data blending, data copying adds more layers of complexity to the removal process. Data often is replicated and distributed so it can be accessed or used by multiple people or for multiple purposes.

Krishnaram Kenthapadi, chief scientist at machine-learning model monitoring company Fiddler, called this problem — deleting algorithmic models built with ill-gotten information — one of data provenance. It requires an understanding of how data gleaned through deceptive means has moved or been processed within a complex data ecosystem from the time the data was originally collected.

“You want to track all the downstream applications that touched or may have used this data,” he said.
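Tracking “all the downstream applications” amounts to walking a lineage graph from the tainted source to everything derived from it. A minimal sketch, with an entirely hypothetical lineage:

```python
from collections import deque

# Hypothetical lineage graph: each artifact maps to the artifacts
# derived from it (data sets, splits, models, features).
lineage = {
    "signup_form_data": ["raw_dataset_v1"],
    "raw_dataset_v1": ["train_split", "test_split"],
    "train_split": ["health_score_model"],
    "test_split": [],
    "health_score_model": ["app_recommendations"],
    "app_recommendations": [],
}

def downstream(graph, source):
    """Return every artifact reachable from `source`: the full set of
    things that touched or were built from the tainted data."""
    seen, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

affected = downstream(lineage, "signup_form_data")
```

Without a recorded graph like this, reconstructing the same answer means manually auditing every pipeline that might have read the data.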

Inspired in part by Europe’s General Data Protection Regulation, which gives people the right to demand that companies delete their personal data, today’s cloud platforms, data management software and technologies used for building and operationalizing AI and machine-learning models — sold by companies such as AWS, C3.ai, Dataiku, Databricks, Dremio, Google Cloud, Informatica, Matillion and others — provide tools that help companies keep track of data lineage to know where data came from, when it was copied for backup or multiple uses and where those copies moved over time.

Without those sorts of tools in place, though, it could be difficult for a company to know for sure whether every copy has actually been deleted. “You might still have some copies left over that are unaccounted for,” said Datta.

Many companies do not have processes set up to automatically attach lineage information to data they collect and use in building algorithmic systems, said Kevin Campbell, CEO of Syniti, a company that provides data technologies and services for things like data migration and data quality.

“If you don’t have a centralized way of capturing that information, you have to have a whole bunch of people chase it down,” said Campbell. “A whole lot of people are going to write a lot of queries.”

As data use and AI become increasingly complex, monitoring for compliance could be difficult for regulators, said el Kaliouby. “It’s not impossible,” she said, but “it’s just hard to enforce some of these things, because you have to be a domain expert.”

