The DNC has a new secret weapon for finding voters

It’s a functioning address book. Shh. It’s a bigger deal than you think.

"Mi Familia Vota" (My Family Votes) representatives knock on doors as they canvass a neighborhood to register people to vote in Orlando, Florida, on October 3, 2016.

“The most impactful interaction a campaign can have is in the real world at a door.”

Photo: Rhona Wise/AFP via Getty Images

Even in a world where your data is being bought and sold and swapped by the second, a political campaign that wants to send a canvasser to knock on your door might still have a pretty hard time figuring out where your door actually is.

Maybe you moved and never changed your registration address. Maybe you’re not registered, so your address isn’t in the voter file at all. Or maybe you live on tribal land or in rural America and don’t have a street address — at least, not one anyone could easily plug into Google Maps.

For politicians and their parties, that leaves a lot of literal ground uncovered, which is why the technology team at the Democratic National Committee has spent the last three years building a tool it says could begin to solve the problem.

The DNC’s new Geographic Address Dataset is a repository of 260 million U.S. addresses, which the tech team painstakingly compiled by combining nearly a dozen sources, from the voter file to Postal Service data to records from private vendors. That includes 25 million addresses that had never before appeared in the DNC’s records. The team attached census block data to nearly all of the 260 million records, giving Democratic campaigns a new window into demographic information on hundreds of millions of homes. It also added geocodes, which tell campaigns the actual latitude and longitude of hard-to-find homes.
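To make the idea concrete, here is a minimal Python sketch of what combining address sources can look like: normalize each record into a match key, collapse duplicates, fill in missing geocodes or census blocks from whichever source has them, and flag the addresses that still cannot be placed on a map. This is an illustration only, not the DNC's pipeline. The field names, the matching rule and the sample records (including the coordinates and the census block ID) are all assumptions made up for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class AddressRecord:
    # Hypothetical schema; the article does not describe the real one.
    street: str
    city: str
    state: str
    zip_code: str
    source: str
    lat: Optional[float] = None
    lon: Optional[float] = None
    census_block: Optional[str] = None


def normalize_key(rec: AddressRecord) -> tuple:
    """Crude match key: uppercase, drop periods, collapse spaces, 5-digit ZIP."""
    street = " ".join(rec.street.upper().replace(".", "").split())
    return (street, rec.city.strip().upper(),
            rec.state.strip().upper(), rec.zip_code.strip()[:5])


def merge_sources(records):
    """Collapse records that share a key, filling gaps from later sources."""
    merged = {}
    for rec in records:
        key = normalize_key(rec)
        if key not in merged:
            merged[key] = replace(rec)  # copy so the input is not mutated
            continue
        kept = merged[key]
        for name in ("lat", "lon", "census_block"):
            if getattr(kept, name) is None:
                setattr(kept, name, getattr(rec, name))
    return list(merged.values())


def missing_geocode(records):
    """Addresses that are known but cannot yet be located on a map."""
    return [r for r in records if r.lat is None or r.lon is None]


# Made-up sample rows from three imaginary sources.
sample = [
    AddressRecord("12 Main St.", "Orlando", "FL", "32801", source="voter_file"),
    AddressRecord("12 main st", "orlando", "fl", "32801-1234", source="vendor",
                  lat=28.54, lon=-81.38, census_block="120950102001000"),
    AddressRecord("Rural Route 4", "Window Rock", "AZ", "86515", source="postal"),
]
addresses = merge_sources(sample)
needs_geocoding = missing_geocode(addresses)
```

In this toy run, the two Orlando rows collapse into one record that picks up the vendor's coordinates, while the rural route address remains flagged as needing a geocode. Real address matching is far messier than a string key, which is part of why a project like this takes years.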

“As much as we all live online these days, people also live in physical locations, and specifically knowing where a voter is located powers door-to-door canvassing, direct mail outreach and just helping campaigns know and truly understand their jurisdictions,” Nell Thomas, chief technology officer of the DNC, told Protocol.

Amassing more accurate addresses may sound like a humble achievement in the world of campaign tech, where every election cycle brings a new round of entrants, claiming that digital ads — no, wait — text messages — no, actually — “relational organizing” is the future of campaigns.

But the Geographic Address Dataset is a more meaningful development than it may initially seem. Historically, campaigns and political parties have focused on collecting as much information about individual people as possible, starting with all the data they can get — phone numbers, Facebook profiles and more — on names contained in the voter file. But the voter file, by definition, only includes the names of people who have registered to vote, leaving out millions of Americans who aren’t registered.

This data set takes a different approach, telling campaigns not everything they need to know about a given person, but everything they need to know about a given address, whether or not they know the name of the person living there. “When we know that those [households] are in areas where we have a lot of supporters of the Democratic Party, it’s easier for us to target those people and turn out our vote and register Democrats to get out in the next election,” said Jesse Presnell, a DNC engineer who helped build the Geographic Address Dataset.

By focusing on addresses, not individuals, the DNC was able to find 8 million homes for which the party had addresses but no geocodes that would have enabled a campaign to actually go find that home on a map. This work has given the party a 10% increase in canvassable voters on tribal lands and an 11% increase in canvassable voters in rural America.

“We’re clearly missing a huge chunk of the electorate,” said Raffi Krikorian, CTO of Emerson Collective, who preceded Thomas as CTO of the DNC. “If you can flip that equation on its head and no longer look through the lens of the voter file, but look through the lens of the electorate, that is a big deal in my mind.”

The question is, of course, what took so long? Having an accurate address book is hardly a novel concept, and the data sets the DNC is working with have been around for a while. The truth, Thomas said, is that combining and vetting those data sets is a messy, time-consuming job that’s hard to pull off in a cyclical business.

But that’s starting to change. The DNC’s tech team is now bigger than ever, with more than 65 people on staff, which has allowed the party to take on an ambitious project like this one even between major elections. “We’re not getting smaller, and that’s really unusual,” Thomas said.

The DNC also overhauled its data infrastructure three years ago, retiring an outdated system that was prone to crashing any time it got overloaded and that Democratic operatives referred to as a “shit show.” In 2019, the party replaced that system with a new, more sophisticated data warehouse called Phoenix, which is based on Google BigQuery. “Working with billions of rows of data is not something that we would have been able to do until we got on BigQuery,” Presnell said.

The new data set, which is continuously updated, will initially be available to campaigns and party officials through the Phoenix data warehouse. The tech team plans to begin training state party officials on how to use the data set this week and hopes to soon funnel the data into other voter contact tools, such as VAN, the party’s main voter file. “This will absolutely be extremely useful and effective and ready for election season,” Thomas said.

But while Thomas’ team has worked to ensure the technology is ready for the midterms, Krikorian warns that it will take more than a single election cycle for the party to really reap the benefits of this kind of undertaking. One of the biggest challenges in building technology for campaigns is that both focus and funding tend to wane after an election is over. For evidence of that, look no further than Alloy, a $35 million attempt to reinvent the voter file for Democrats, which launched in 2018 and shut down shortly after the 2020 election, after it failed to gain traction with the party.

And with Democrats expected to get clobbered this November, it may be hard to see whether this data set actually did anything to improve the party’s chances.

“This is a hard project. This is not something you pull off in two years. This is not something you pull off in four years. This is a shift in the mentality about how to think about voter data in a political party,” Krikorian said. “She’s going to need to make this successful in two years and four years, but this is going to take a decade to get right.”

Addresses also aren’t the DNC’s only focus this year. The tech team is spending another $5 million to buy other data sets, including cell phone numbers, to help with voter contact.

And yet, Thomas believes that the new tool could help give Democrats an edge they can’t get by flooding people’s phones. “The most impactful interaction a campaign can have,” she said, “is in the real world at a door.”






