
For Capital One, 'the lake is everything'

Protocol Enterprise

Hello and welcome to Protocol Enterprise! Today: how cloud-based data lakes are transforming the way Capital One thinks about AI, how war in Ukraine could threaten the supply of critical chipmaking materials and the week ahead for enterprise tech.

Spin up

It’s a tale as old as time, or at least as old as the mainframe. Ancient but business-critical applications are keeping 79% of companies surveyed by 2nd Watch from moving forward into the modern era, yet 91% of them acknowledged they’ll need to modernize those apps to remain competitive.

The great (data) lakes

Skittish, reluctant, hesitant — downright scared. Financial services companies have been all that and more when it comes to migrating their heavily regulated, data-heavy businesses from legacy systems to the cloud.

But while some banks and credit-card providers are still just dipping their toes, Capital One has been “all in” on the public cloud since 2015, according to Mike Eason, the company’s senior vice president of CIO Enterprise Data and Machine Learning. These days Eason and his team of 1,800 engineers and technology staff are busy developing a self-service data pipeline and platform with tools for in-house staff to access data to build and train machine-learning models.

Protocol caught up with Eason this week to talk about why the data lake is making a difference, why the company wants to automate how it explains its AI models and its efforts to expand Capital One’s company-wide team of 11,000 engineers from the inside.

Capital One has a data lake. Why is there a need for that? What’s unique about what you can do in a cloud data lake environment?

One is just from a macro standpoint, the cost of data and compute is just dramatically reduced. When we were on prem, we were using the Teradatas of the world and others, and the cost of compute and space is dramatically different than it is today.

We're a big credit-card provider, and during the holidays, we can spin up more compute and more space and everything to handle the different loads as everyone's doing their holiday shopping, and so that aspect of the cloud has just been phenomenally important to us, and just a game changer.

From a lake standpoint, the amount of data that we can capture and utilize in our models is just tremendously different — like exponentially different. The lake provides that one copy of everything for us, and is the one place where all the data will be.

And so we use a combination of the lake and Snowflake for some more of the structured, traditional warehouse data.

What types of data points or data sources would be flowing into the lake versus a more structured environment?

The lake is everything. It’s the receipt and the copy of all data from the company. So we've built a data pipeline to publish our data. And as an end user, you can then determine, I want to publish the data, so I’m gonna go to the lake, but I want to publish these attributes or this data to Snowflake.

Or – and this is something we just recently built – I might want to put data into a low-latency operational type of database that our operational systems can hit, or our models can hit.

So it's one pipeline that gets to publish to many different locations. It’s a simple, more self-service kind of platform for end users of publishing data. The lake is the copy of everything. And then there might be a subset of needs of things in Snowflake for reporting, doing some general analysis, the munging of data together.

And then there’s the low-latency environment for more back-end, really quick models, making a fraud decision in the moment, when you're using the data to determine if Kate’s transaction is going to go through.
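For readers who think in code, here is a minimal sketch of the "one pipeline, many locations" idea Eason describes. It is not Capital One's implementation: the `Sink` class, the `publish` function, the sink names and the record fields are all hypothetical, and the destinations are plain in-memory lists standing in for a lake, a warehouse and a low-latency store.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional, Set

Record = Dict[str, Any]


@dataclass
class Sink:
    """Hypothetical publishing destination that keeps records in memory."""
    name: str
    # None means "keep every attribute"; otherwise only these attributes are kept.
    attributes: Optional[Set[str]] = None
    records: List[Record] = field(default_factory=list)

    def write(self, record: Record) -> None:
        if self.attributes is None:
            self.records.append(dict(record))
        else:
            self.records.append(
                {k: v for k, v in record.items() if k in self.attributes}
            )


def publish(records: Iterable[Record], lake: Sink,
            extra_sinks: Iterable[Sink] = ()) -> int:
    """Write every record to the lake, then to each optional sink.

    Returns the number of records published.
    """
    sinks = list(extra_sinks)
    count = 0
    for record in records:
        lake.write(record)       # the lake always gets the full copy
        for sink in sinks:
            sink.write(record)   # other sinks get only the attributes they asked for
        count += 1
    return count


# Illustrative usage with made-up data.
lake = Sink("lake")
warehouse = Sink("warehouse", attributes={"card_id", "amount"})
low_latency = Sink("low_latency_store", attributes={"card_id", "merchant"})

transactions = [
    {"card_id": "c-1", "amount": 42.5, "merchant": "grocer", "raw_payload": "..."},
]
published = publish(transactions, lake, [warehouse, low_latency])
```

In this sketch the lake keeps every attribute of every record, while the warehouse and the low-latency store each receive only the subset the publisher selected, which mirrors the "copy of everything" versus "subset of needs" split in the interview.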

What’s an example of a low-latency use for a data lake?

Fraud is a great case of that. You're swiping the card, we have less than 100 milliseconds of determining if this is a fraudulent transaction or not. And you want as much data and as [many] data points to be able to make that decision.
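To make the time constraint concrete, here is a rough sketch of a decision made under a fixed latency budget. Only the 100-millisecond figure comes from the interview. The `decide` function, the additive score, the threshold and the feature lookups are assumptions made for illustration and do not describe how Capital One scores transactions.

```python
import time
from typing import Callable, Iterable, Tuple

# The "less than 100 milliseconds" figure from the interview, in seconds.
BUDGET_SECONDS = 0.100

# Hypothetical feature lookup: takes a card ID, returns a risk contribution.
FeatureLookup = Callable[[str], float]


def decide(card_id: str,
           lookups: Iterable[FeatureLookup],
           threshold: float = 1.0,
           budget: float = BUDGET_SECONDS,
           clock: Callable[[], float] = time.monotonic) -> Tuple[str, int]:
    """Sum risk contributions until the budget is spent, then decide.

    Returns the decision and the number of lookups that were used.
    The deadline is only checked before each lookup, so a single slow
    lookup can still overshoot the budget in this simplified version.
    """
    deadline = clock() + budget
    score = 0.0
    used = 0
    for lookup in lookups:
        if clock() >= deadline:
            break  # out of time: decide with the data gathered so far
        score += lookup(card_id)
        used += 1
    decision = "decline" if score >= threshold else "approve"
    return decision, used


# Illustrative usage: two instant, made-up lookups fit well inside the budget.
decision, used = decide("c-1", [lambda card: 0.4, lambda card: 0.9])
```

The design point is the trade-off in the quote: more data points can improve the decision, but each one costs time, so the loop stops gathering data once the budget runs out.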

Read the rest of the interview with Mike Eason here.

— Kate Kaye (email | twitter)

Chipmakers brace for Russia-Ukraine conflict

The White House has been urging the chip industry to develop workarounds for potential material disruptions that could result from political tension between the U.S. and Russia, Reuters reported Friday.

Russia and Ukraine produce neon and palladium, which are important elements used in chip manufacturing. Factories use neon from Ukraine for laser gases in lithography, and companies use palladium from Russia for chip packaging as well as in some sensors and memory, according to the materials consulting company Techcet.

As reports emerged Friday that Russia could invade Ukraine as early as next week, staffers for the White House National Security Council’s Peter Harrell have been in contact with chip businesses and have urged them to find alternative material sources, according to Reuters. The chip equipment industry association has also recently asked members about their exposure to materials sourced from the region, Reuters reported. Any such disruption could yet again extend the prolonged chip shortage.

— Max A. Cherney (email | twitter)

Coming next week

Intel will hold its annual investor day on Thursday led by CEO Pat Gelsinger and other top executives, and Max will be there with coverage from the event.

Fresh off the collapse of the Arm deal, Nvidia will report fourth-quarter earnings on Wednesday.

Dropbox is scheduled to report fourth-quarter earnings on Thursday.

Amplitude will release its fourth-quarter earnings on Wednesday.

Around the enterprise

Cisco recently made a $20 billion takeover offer for Splunk, according to the Wall Street Journal, but the companies “aren’t currently in active talks.”

Zendesk announced that it had rejected an “unsolicited” offer from “a consortium of private-equity firms” that would have valued the company at around $17 billion.

Time to update the “Killed by Google” database: The company said it will close Google Currents, a business-oriented social networking product you’ll be forgiven for forgetting existed.

Microsoft will build a “sovereign cloud” for Singapore, an arrangement that generally implies all data processing in that cloud will remain within the country.

Thanks for reading — see you Monday!
