The FTC’s ‘profoundly vague’ plan to force companies to destroy algorithms could get very messy

Companies take algorithms out of production all the time. But wiping an AI model and the data that built it off the face of the earth could be a lot more challenging.

Algorithmic systems using AI and machine or deep learning can involve large models or families of models involving extremely complex logic expressed in code. Illustration: CreepyCube/iStock/Getty Images Plus; Protocol

“The premise is simple,” FTC Commissioner Rebecca Slaughter and FTC lawyers wrote last year.

They were talking about a little-used enforcement tool called algorithmic disgorgement, a penalty the agency can wield against companies that used deceptive data practices to build algorithmic systems such as AI and machine-learning models. The punishment: They have to destroy the ill-gotten data and the models built with it. But while privacy advocates and critics of excessive data collection praise the concept in theory, implementing it in practice could be anything but simple.

In fact, separating tainted data and algorithmic systems from the unaffected parts of a company’s technology products and intellectual property could be about as easy as teasing out a child’s tangled hair.

“Once you delete the algorithm, you delete the learning. But if it’s entangled with other data, it can get complex pretty fast,” said Rana el Kaliouby, a machine-learning scientist and deputy CEO of driver-monitoring AI firm Smart Eye.

The FTC’s March 3 settlement order against WW, the company formerly known as Weight Watchers, marked the most recent time the agency has demanded that a company destroy algorithmic systems. As part of the settlement, the company must also delete the deceptively gathered data, provide a written statement, sworn under penalty of perjury, confirming the deletion, and keep records for 10 years demonstrating compliance.

But the order provides little detail about how the company must comply or how the FTC will know for sure it did.

The order is “profoundly vague,” said Pam Dixon, executive director of World Privacy Forum. “We’re not usually talking about a single algorithm. I would like to have seen more in their materials about what it is that is being disgorged specifically.”

For example, she said it is unclear whether WW used the data the FTC wants it to delete for marketing, for machine-learning models to predict or score kids’ health status or for other purposes.

How to kill an algorithm

Companies decommission algorithmic models all the time by taking them out of production. In some cases, an algorithm is just a simple piece of code: something that tells a software application how to perform a set of actions.

If WW used the data it was ordered to delete to build just one machine-learning model used in one particular feature of its app, for example, deleting the code for that feature could be a relatively straightforward process, el Kaliouby said.

But algorithmic systems using AI and machine or deep learning can involve large models or families of models involving extremely complex logic expressed in code. Algorithmic systems used in social media platforms, for example, might incorporate several different intersecting models and data sets all working together.

In any case, the first step involves taking the model out of operation, ensuring that it no longer processes or ingests new data.

But it’s not so easy to decouple data from algorithmic systems, in part because the data used to train and feed them hardly ever sits in one place. Data obtained through deceptive means may end up in a data set that is then sliced and diced into multiple “splits,” each used for a separate purpose, such as model training, testing or validation, throughout the machine-learning development process, said Anupam Datta, co-founder and chief scientist at TruEra, which provides a platform for explaining and monitoring AI models.
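To see why a single tainted collection fans out, here is a minimal sketch of how one data set gets partitioned into the train/test/validation splits Datta describes (the record IDs and split ratios are hypothetical):

```python
import random

def make_splits(records, train=0.7, test=0.15, seed=0):
    """Shuffle records and partition them into train/test/validation splits."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        "validation": shuffled[n_train + n_test:],
    }

# 100 hypothetical record IDs; suppose record 42 was deceptively collected.
splits = make_splits(list(range(100)))

# The tainted record lands in one split, but every model trained,
# tested or validated against that split is touched by it.
touched = [name for name, split in splits.items() if 42 in split]
```

Any model built on a split containing a tainted record inherits its influence, which is why disgorgement can reach models far removed from the original collection.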

And once a model has been deployed, it might blend ill-gotten data along with additional information from other sources, such as data ingested through APIs or real-time data streams.

The first step in killing an algorithm involves taking the model out of operation. Illustration: CreepyCube/iStock/Getty Images Plus; Protocol

Nowadays, data is often managed in the cloud. Cloud providers like AWS, Azure or Google Cloud offer standardized ways to delete data. A data scientist could use a tool from a cloud platform to mark which data needs to be deleted at varying levels of granularity, Datta said.

When the storage area for that data is marked for removal, the space is freed up, allowing the system to overwrite the doomed data with new information. Until that happens, however, the data could still be recovered, Datta said.
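The gap between “marked for deletion” and “gone” can be sketched with a toy blob store (a hypothetical stand-in, not any real cloud API):

```python
class BlobStore:
    """Hypothetical storage backend illustrating 'soft' deletion."""

    def __init__(self):
        self._blocks = {}   # block ID -> bytes actually on disk
        self._free = set()  # blocks marked deleted but not yet overwritten

    def put(self, block_id, data):
        self._blocks[block_id] = data
        self._free.discard(block_id)

    def delete(self, block_id):
        # Deletion only marks the space as free; the bytes remain in place.
        self._free.add(block_id)

    def raw_read(self, block_id):
        # A forensic read of the underlying storage still sees the bytes.
        return self._blocks.get(block_id)

store = BlobStore()
store.put("b1", b"deceptively collected records")
store.delete("b1")
recoverable = store.raw_read("b1")  # the "deleted" bytes are still there
```

Until new writes actually claim the freed space, a determined party could recover the supposedly deleted data.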

Cryptographic erasure could be used to delete data more permanently, he said. The process encrypts the data record with an encryption key that is itself then deleted, like locking the data in a box and throwing away the key.
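The idea can be illustrated with a toy one-time pad standing in for a real cipher (illustrative only; production cryptographic erasure uses vetted encryption, and “throwing away the key” means securely destroying every copy of it):

```python
import secrets

def encrypt(data: bytes):
    """One-time-pad encryption: XOR each byte with a random key byte."""
    key = secrets.token_bytes(len(data))  # random key, same length as data
    ciphertext = bytes(d ^ k for d, k in zip(data, key))
    return ciphertext, key

record = b"user weight history"
ciphertext, key = encrypt(record)

# Decryption works only while the key exists.
plaintext = bytes(c ^ k for c, k in zip(ciphertext, key))
assert plaintext == record

del key  # "throwing away the key": the ciphertext becomes unrecoverable noise
```

Once the key is gone, the ciphertext can stay on disk indefinitely: without the key it carries no recoverable information.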

The data replicant problem

In addition to data blending, data copying adds more layers of complexity to the removal process. Data often is replicated and distributed so it can be accessed or used by multiple people or for multiple purposes.

Krishnaram Kenthapadi, chief scientist at machine-learning model monitoring company Fiddler, called this problem — deleting algorithmic models built with ill-gotten information — one of data provenance. It requires an understanding of how data gleaned through deceptive means has moved or been processed within a complex data ecosystem from the time the data was originally collected.

“You want to track all the downstream applications that touched or may have used this data,” he said.

Inspired in part by Europe’s General Data Protection Regulation, which gives people the right to demand that companies delete their personal data, today’s cloud platforms, data management software and technologies for building and operationalizing AI and machine-learning models, sold by companies such as AWS, C3.ai, Dataiku, Databricks, Dremio, Google Cloud, Informatica, Matillion and others, provide tools that help companies track data lineage: where data came from, when it was copied for backup or multiple uses, and where those copies moved over time.
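As a rough sketch of what such lineage tracking enables, a simple graph of “derived from” records (all names hypothetical) lets a company walk every downstream copy and derivative of a tainted source:

```python
# Hypothetical lineage records: each data set or model lists its parents.
lineage = {
    "raw_intake":  {"derived_from": []},
    "train_split": {"derived_from": ["raw_intake"]},
    "backup_2021": {"derived_from": ["raw_intake"]},
    "model_v3":    {"derived_from": ["train_split"]},
}

def downstream(source, graph):
    """Return everything that transitively derives from `source`."""
    hits = set()
    changed = True
    while changed:  # keep sweeping until no new derivatives are found
        changed = False
        for name, meta in graph.items():
            parents = set(meta["derived_from"])
            if name not in hits and (source in parents or hits & parents):
                hits.add(name)
                changed = True
    return hits

# Everything touched by the tainted source must be deleted or rebuilt.
to_purge = downstream("raw_intake", lineage)
```

Without records like these, answering “which models touched this data?” means manually querying every downstream system, which is exactly the chase-it-down problem described below.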

Without those sorts of tools in place, though, it could be difficult for a company to know for sure whether every copy has actually been deleted. “You might still have some copies left over that are unaccounted for,” said Datta.

Many companies do not have processes set up to automatically attach lineage information to data they collect and use in building algorithmic systems, said Kevin Campbell, CEO of Syniti, a company that provides data technologies and services for things like data migration and data quality.

“If you don’t have a centralized way of capturing that information, you have to have a whole bunch of people chase it down,” said Campbell. “A whole lot of people are going to write a lot of queries.”

As data use and AI become increasingly complex, monitoring for compliance could be difficult for regulators, said el Kaliouby. “It’s not impossible,” she said, but “it’s just hard to enforce some of these things, because you have to be a domain expert.”

