Privacy by Design laws will kill your data pipelines

The legislation could make old data pipelines more trouble than they’re worth.


A car is totaled when the cost to repair it exceeds its total value. By that logic, Privacy by Design legislation could soon be totaling data pipelines at some of the most powerful tech companies.

Those pipelines were developed well before the advent of more robust user privacy laws, such as the European Union’s GDPR (2018) and the California Consumer Privacy Act (2020). Their foundational architectures were therefore designed without certain privacy-preserving principles in mind, including k-anonymity and differential privacy.

But the problem extends way beyond trying to layer privacy mechanisms on top of existing algorithms. Data pipelines have become so complex and unwieldy that companies might not even know whether they are complying with regulations. As Meta engineers put it in a leaked internal document: “We do not have an adequate level of control and explainability over how our systems use data, and thus we can’t confidently make controlled policy changes or external commitments.”

(When we asked Meta for comment, a spokesperson referred us to the company’s original response to Motherboard about the leaked document, which said, in part: “The document was never intended to capture all of the processes we have in place to comply with privacy regulations around the world or to fully represent how our data practices and controls work.”)

As governments increasingly embrace Privacy by Design (PbD) legislation, tech companies face a choice: either start from scratch or try to fix data pipelines that are old, extraordinarily complex and already non-compliant. Some computer science researchers say a fresh start is the only way to go. But for tech companies, starting over would require engineers to roll out critical data infrastructure changes without disrupting day-to-day operations — a task that’s easier said than done.

‘Open borders’ won’t cut it

Motherboard published the leaked internal document, written by Meta engineers in 2021, at the end of April. In it, an engineering team recommended data architecture changes that would help Meta comply with regulations from a wave of governments embracing the “consent regime,” one of the core principles of PbD. India, Thailand, South Korea, South Africa and Egypt were all preparing “impactful regulations” in this realm, and the paper also anticipated U.S. federal privacy regulation in 2022 and beyond. Such legislation would generally require Meta to obtain user consent before collecting data for advertisements.

The Meta engineers identified “the heart of our challenge” as a lack of “closed form systems.” Closed systems, they said, would let Meta enumerate and control all the incoming data flows. The engineers placed that in contrast with the “open borders” system that had been baked into company culture for over a decade.
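To make the contrast concrete, here is a minimal sketch of what a "closed" ingestion point could look like. It is an illustration only, not Meta's design: the `DataFlow` and `ClosedIngest` names, their fields and the example table names are all hypothetical. The idea is simply that data is accepted only over flows declared in advance, so the full set of incoming flows can always be listed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFlow:
    """One declared route by which data may enter a system (hypothetical schema)."""
    source: str
    destination: str
    purpose: str


@dataclass
class ClosedIngest:
    """Accepts records only over flows that were declared ahead of time."""
    declared: set = field(default_factory=set)
    accepted: list = field(default_factory=list)

    def declare(self, flow: DataFlow) -> None:
        self.declared.add(flow)

    def ingest(self, flow: DataFlow, record: dict) -> None:
        # An "open borders" system would skip this check entirely.
        if flow not in self.declared:
            raise PermissionError(
                f"undeclared flow: {flow.source} -> {flow.destination}"
            )
        self.accepted.append((flow, record))

    def enumerate_flows(self) -> list:
        # Every accepted record arrived over a declared flow, so this list is complete.
        return sorted(
            self.declared, key=lambda f: (f.source, f.destination, f.purpose)
        )


if __name__ == "__main__":
    ingest = ClosedIngest()
    ads = DataFlow("profile_events", "ads_ranking", "advertising")
    ingest.declare(ads)
    ingest.ingest(ads, {"user_id": 1})
    print(ingest.enumerate_flows())
```

Under this assumption, a policy change such as "stop using location data for ads" becomes a matter of removing a declared flow rather than hunting through thousands of tables.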

Meta’s systems had grown increasingly complex and untraceable, the engineers said, citing the example of a single feature (“user_home_city_moved”) drawing from around six thousand data tables.

“These are massive pipelines with massive amounts of data feeding into many different kinds of algorithms,” Nikola Banovic, an assistant professor of computer science and engineering at the University of Michigan, told Protocol. “Because this was never a consideration to begin with, now it’s increasingly difficult to untangle things.”

The leaked document showed the frustration of internal teams tasked with overhauling systems designed in an era when everything was fair game, Banovic said. He noted that advocacy groups are pressuring companies to now design systems around end users.

“It’s not going to be easy,” Banovic said of the shift. He added that, while enhancing user privacy would be possible from a technology perspective, online behavioral advertising is fundamentally in conflict with that objective.

The challenges of tracing data flows at that scale aren’t unique to Meta, according to Hana Habib, a postdoctoral researcher at Carnegie Mellon University. “I’m sure all the major tech companies like Google and Twitter — the big tech giants — are facing this issue just because [of] the scale of their operations,” she told Protocol. Habib noted that most of the largest tech companies have faced GDPR fines.

When to say goodbye

Researchers already have a firm grasp on ways to make existing algorithms more privacy-preserving. K-anonymization, for example, is a user privacy technique that ensures data is sufficiently aggregated: every combination of potentially identifying factors, such as hometown and employment, must be shared by at least k people, so that no individual can be singled out by those factors. Differential privacy, a standard that has been studied for over a decade, guarantees that someone observing the outputs of an algorithm cannot confidently tell whether it included data from a particular individual.
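Both ideas fit in a few lines of code. The sketch below is illustrative rather than production-grade: the function names and the sample columns are assumptions, the k-anonymity check only tests a table and does not generalize or suppress values, and the differential privacy example uses the standard Laplace mechanism for a simple count.

```python
import random
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng):
    """Epsilon-differentially-private count via the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rows = [
        {"hometown": "Ann Arbor", "employer": "University"},
        {"hometown": "Ann Arbor", "employer": "University"},
        {"hometown": "Detroit", "employer": "Automaker"},
    ]
    # The lone Detroit row is unique on (hometown, employer), so k=2 fails.
    print(is_k_anonymous(rows, ["hometown", "employer"], k=2))
    rng = random.Random()
    print(private_count(rows, lambda r: r["hometown"] == "Ann Arbor", 1.0, rng))
```

The noisy count will differ on every run, which is the point: any single answer is almost as likely with or without a given person's row. Real deployments also have to track how much privacy budget repeated queries consume, which this sketch ignores.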

For many years now, Big Tech engineers have studied, applied and occasionally advanced these privacy standards. Google, for instance, achieved differential privacy anonymization in Chrome around 2014 and has since worked to expand it to Google Maps and Assistant. In 2018, Meta assured differential privacy compliance when it gave academics access to user data for assessing the impact of social media on elections. Apple published an in-depth research paper in 2017 about its application of differential privacy for features such as emoji recommendations and lookup hints.


But several sources said the problem is scale and sprawl, not just technique.

“I don’t know how they can be expected to comply with privacy regulation, which stipulates that they provide notice to consumers about these aspects of their data, when they don’t really know themselves,” said Habib.

Companies often don’t have visibility into where their data is being used and stored, according to Balaji Ganesan, the CEO and co-founder of the data governance startup Privacera. Ganesan told Protocol that data scientists often copy data without communicating that to the broader organization. So when a customer then requests that their data be removed, as they are entitled to under a PbD framework, a large tech company might not even know how to do so. “The challenge is really understanding where that subject data is,” Ganesan said.
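The deletion problem Ganesan describes is, at its core, a graph search. As a rough sketch, assume a company keeps a record of which tables were copied or derived from which others (the `copies` mapping and the table names below are hypothetical). Finding everything a deletion request touches then means following those records outward from the original table.

```python
from collections import deque


def tables_to_purge(copies, origin):
    """Return every table reachable from `origin` by following recorded copies.

    `copies` maps a table name to the tables derived from it. Only recorded
    copies are found; an unrecorded copy stays invisible to this search.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        table = queue.popleft()
        for derived in copies.get(table, ()):
            if derived not in seen:  # guards against cycles in the lineage
                seen.add(derived)
                queue.append(derived)
    return seen


if __name__ == "__main__":
    copies = {
        "users": ["users_sample", "ads_features"],
        "users_sample": ["notebook_scratch"],
    }
    print(sorted(tables_to_purge(copies, "users")))
```

The search itself is trivial. The hard part is the input: if a data scientist copies a table without registering it, that copy never appears in `copies` and the deletion request silently misses it.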

To comply with user privacy regulations, companies need to build data pipelines from the ground up, said Jane Im, a Ph.D. candidate in computer science and engineering at the University of Michigan. “If they really want to comply, they should limit the amount of data they’re collecting,” Im told Protocol.

Facebook and others are accustomed to using "massive amounts of data" for their business, Im added. "Would it be feasible for Facebook to retrain models?" she asked, wondering aloud whether users, given the choice, would consent to companies "tracking so much of their users’ behavior, including off-site."

“Since these privacy regulations have come out after these systems are built, it's hard to retrofit existing systems to match these laws, which are pretty comprehensive and seem in line with what people actually want related to digital privacy,” said Habib.

Privacy at what cost?

What’s good for privacy often isn’t good for business, but it doesn't need to be that way. As with so much in this field, the outcome depends on implementation.

“We shouldn’t be surprised that accuracy also depends on the context,” Ben Fish, an assistant professor in computer science and engineering at the University of Michigan, told Protocol. “But it is far from guaranteed that privacy techniques will make a system worse — they can make a system better.”

In the leaked document, Meta engineers said addressing the privacy challenges would “require additional multi-year investment in Ads and our infrastructure teams to gain control over how our systems ingest, process and egest data.” That effort would require roughly 600 years’ worth of engineering time assigned to related projects, the authors estimated.

The Meta document shows just how resource-intensive it can be to rework systems to be more privacy compliant. Assigning those resources is obviously costly, so the challenge for regulators is making the penalties for violators costly enough to push privacy up on the priority list.

Executives must choose between allocating resources to privacy initiatives and other business priorities, according to Ganesan. “It always boils down to, at the top level, if you have 10 things to do and if you have resources to spend on three, which ones would you pick?” he said. Ganesan said the willingness to prioritize those investments is where things fall short more than anything else.

Further complicating the investment calculus, several sources said they see the shift from open to closed systems as only a first step.

“Even questions about where should the controls for these kinds of actions be placed so they're findable, they're discoverable — so that people know that they can actually do this — is an open research question, let alone what would it take to create these massive, massive pipelines that control for user data,” said Banovic.

Then there’s the consumer side: “We need more education for users that could potentially lead to more collective action,” said Im. Most social media users don’t grasp the extent to which online behavioral advertising business models collect and monetize their data, according to several research papers she referenced. “This kind of goes back to media literacy,” Im said.

