The FTC’s 'profoundly vague' plan to force companies to destroy algorithms could get very messy

Companies take algorithms out of production all the time. But wiping an AI model and the data that built it off the face of the earth could be a lot more challenging.


Algorithmic systems using AI and machine or deep learning can involve large models or families of models involving extremely complex logic expressed in code.

Illustration: CreepyCube/iStock/Getty Images Plus; Protocol

“The premise is simple,” FTC Commissioner Rebecca Slaughter and FTC lawyers wrote last year.

They were talking about a little-used enforcement tool called algorithmic disgorgement, a penalty the agency can wield against companies that used deceptive data practices to build algorithmic systems like AI and machine-learning models. The punishment: They have to destroy ill-gotten data and the models built with it. But while privacy advocates and critics of excessive data collection are praising the concept in theory, in practice it could be anything but simple to implement.

In fact, separating tainted data and algorithmic systems from the unaffected parts of a company’s technology products and intellectual property could be about as easy as teasing out a child’s tangled hair.

“Once you delete the algorithm, you delete the learning. But if it’s entangled with other data, it can get complex pretty fast,” said Rana el Kaliouby, a machine-learning scientist and deputy CEO of driver-monitoring AI firm Smart Eye.

The FTC’s March 3 settlement order against WW, the company formerly known as Weight Watchers, marked the most recent time the agency has demanded a company destroy algorithmic systems. As part of its settlement, the company also must delete data gathered deceptively, provide a written statement, sworn under penalty of perjury, confirming the deletion, and keep records for 10 years demonstrating compliance.

But the order provides little detail about how the company must comply or how the FTC will know for sure it did.

The order is “profoundly vague,” said Pam Dixon, executive director of World Privacy Forum. “We’re not usually talking about a single algorithm. I would like to have seen more in their materials about what it is that is being disgorged specifically.”

For example, she said it is unclear whether WW used the data the FTC wants it to delete for marketing, for machine-learning models to predict or score kids’ health status or for other purposes.

How to kill an algorithm

Companies decommission algorithmic models all the time by taking them out of production. In some cases, an algorithm is just a simple piece of code: something that tells a software application how to perform a set of actions.

If WW used the data it was ordered to delete to build just one machine-learning model used in one particular feature of its app, for example, deleting the code for that feature could be a relatively straightforward process, el Kaliouby said.

But algorithmic systems using AI and machine or deep learning can involve large models or families of models involving extremely complex logic expressed in code. Algorithmic systems used in social media platforms, for example, might incorporate several different intersecting models and data sets all working together.

In any case, the first step involves taking the model out of operation, ensuring that it will no longer process data or ingest new data.

But it’s not so easy to decouple data from algorithmic systems, in part because the data used to train and feed them hardly ever sits in one place. Data obtained through deceptive means may end up in a data set that is then sliced and diced into multiple “splits,” each used for a separate stage of machine-learning model development: training, testing and validation, said Anupam Datta, co-founder and chief scientist at TruEra, which provides a platform for explaining and monitoring AI models.
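The fan-out Datta describes can be sketched in a few lines. This is a hypothetical illustration, not any company's actual pipeline: a single tainted data set is shuffled and partitioned into train, validation and test splits, and each split carries copies of the original records.

```python
import random

def split_dataset(records, seed=0, train=0.8, val=0.1):
    """Partition one data set into the train/validation/test splits
    typical of model development. If the source data was collected
    deceptively, every split inherits the taint."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }

splits = split_dataset(list(range(100)))
# Deleting the original data set is not enough: each split still
# holds copies of the tainted records.
assert sum(len(s) for s in splits.values()) == 100
```

Deleting the source collection leaves three derived copies behind, which is why disgorgement requires tracing every artifact built from the data.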

And once a model has been deployed, it might blend ill-gotten data along with additional information from other sources, such as data ingested through APIs or real-time data streams.

The first step in killing an algorithm involves taking the model out of operation. Illustration: CreepyCube/iStock/Getty Images Plus; Protocol

Nowadays, data is often managed in the cloud. Cloud providers like AWS, Azure or Google Cloud offer standardized ways to delete data. A data scientist could use a tool from a cloud platform to mark which data needs to be deleted at varying levels of granularity, Datta said.

When the storage area holding that data is marked for removal, the space is freed up, allowing the system to overwrite the doomed data with new information. Until that overwrite happens, however, the data could still be recovered, Datta said.

Cryptographic erasure could be used to delete data more permanently, he said. The process encrypts the data record with a key that is then itself destroyed, like locking the data in a box and throwing away the key.
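A minimal sketch of the idea, using a one-time-pad XOR cipher for illustration only (real systems use vetted ciphers such as AES): if only ciphertext is ever persisted, destroying the key renders every copy and backup unreadable at once.

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse: the same call encrypts and decrypts.
    return bytes(d ^ k for d, k in zip(data, key))

record = b"tainted training record"
key = secrets.token_bytes(len(record))

stored = xor_cipher(record, key)      # only ciphertext is persisted
assert xor_cipher(stored, key) == record  # recoverable while key exists

# Cryptographic erasure: destroy the key, not every copy of the data.
key = None
# Without the key, `stored` is indistinguishable from random bytes,
# no matter how many replicas or backups of it exist.
```

The appeal for disgorgement is that the key is one small, centrally held secret, far easier to destroy verifiably than every distributed copy of the data itself.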

The data replicant problem

In addition to data blending, data copying adds more layers of complexity to the removal process. Data often is replicated and distributed so it can be accessed or used by multiple people or for multiple purposes.

Krishnaram Kenthapadi, chief scientist at machine-learning model monitoring company Fiddler, called this problem — deleting algorithmic models built with ill-gotten information — one of data provenance. It requires an understanding of how data gleaned through deceptive means has moved or been processed within a complex data ecosystem from the time the data was originally collected.

“You want to track all the downstream applications that touched or may have used this data,” he said.
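Kenthapadi's provenance problem can be framed as a graph traversal. The lineage graph below is hypothetical, with invented artifact names; the point is that finding everything to delete means walking from the tainted source to every downstream artifact derived from it.

```python
from collections import deque

# Hypothetical lineage graph: each artifact maps to the artifacts
# derived from it (copies, splits, models, dashboards).
lineage = {
    "raw_user_data": ["cleaned_data", "analytics_copy"],
    "cleaned_data": ["train_split", "test_split"],
    "train_split": ["health_score_model"],
    "test_split": [],
    "analytics_copy": ["marketing_dashboard"],
    "health_score_model": [],
    "marketing_dashboard": [],
}

def downstream(root, graph):
    """Breadth-first walk of the provenance graph, collecting every
    artifact that touched the tainted source and must be deleted."""
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen

# A model two hops away from the deceptive collection is still tainted.
assert "health_score_model" in downstream("raw_user_data", lineage)
```

Without recorded lineage edges, that graph does not exist, and the traversal turns into the manual query-chasing Campbell describes later in the piece.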

Inspired in part by Europe’s General Data Protection Regulation, which gives people the right to demand that companies delete their personal data, today’s cloud platforms, data management software and technologies used for building and operationalizing AI and machine-learning models — sold by companies such as AWS, C3.ai, Dataiku, Databricks, Dremio, Google Cloud, Informatica, Matillion and others — provide tools that help companies keep track of data lineage to know where data came from, when it was copied for backup or multiple uses and where those copies moved over time.

Without those sorts of tools in place, though, it could be difficult for a company to know for sure whether every copy has actually been deleted. “You might still have some copies left over that are unaccounted for,” said Datta.

Many companies do not have processes set up to automatically attach lineage information to data they collect and use in building algorithmic systems, said Kevin Campbell, CEO of Syniti, a company that provides data technologies and services for things like data migration and data quality.

“If you don’t have a centralized way of capturing that information, you have to have a whole bunch of people chase it down,” said Campbell. “A whole lot of people are going to write a lot of queries.”

As data use and AI become increasingly complex, monitoring for compliance could be difficult for regulators, said el Kaliouby. “It’s not impossible,” she said, but “it’s just hard to enforce some of these things, because you have to be a domain expert.”


Kate Kaye

Kate Kaye is an award-winning multimedia reporter digging deep and telling print, digital and audio stories. She covers AI and data for Protocol. Her reporting on AI and tech ethics issues has been published in OneZero, Fast Company, MIT Technology Review, CityLab, Ad Age and Digiday and heard on NPR. Kate is the creator of RedTailMedia.org and is the author of "Campaign '08: A Turning Point for Digital Media," a book about how the 2008 presidential campaigns used digital media and data.
