Databases will be a $100 billion market. Neo4j’s CEO just needs a sliver.

The graph database company raised $325 million in June, which Neo4j said is the largest funding round ever for a database startup.


Neo4j CEO Emil Eifrem is eyeing the database industry split.

Photo: Neo4j

In Neo4j CEO Emil Eifrem's mind, the database industry is split into two camps: systems that deal with historical data and those that support real-time processing.

Neo4j would be in the latter category. It's a subsector that Eifrem said is dominated by six players: Microsoft, Google Cloud, AWS, Redis Labs, MongoDB and, of course, Neo4j. But Eifrem is betting that owning just a sliver of the booming market will be lucrative.

"The database market is the single biggest one in all of enterprise software. It's about $50 billion today. But it's going to be $100 billion" in just a few years, he told Protocol. "If you are the leader of one of these big, new segments, those are massive categories. They're way bigger than what the relational database was, for example, in the late '80s, early '90s."

Other leaders in the real-time processing sector are likely to disagree with Eifrem's viewpoint. But Neo4j is a bit different in that it's peddling a newer type of architecture called graph databases: systems that take the growing volume of information companies collect and draw immediate connections within it, something Eifrem argued is impossible with the tabular databases of the past.

The company is not alone among graph database companies; AWS, for example, launched its own graph database called Neptune in 2018. But Neo4j is well on the way to establishing itself as a leader. The company raised $325 million in June — the most ever for a database startup, according to Neo4j — and is considering an IPO in the next few years, according to a marketing slide viewed by Protocol.

In an interview with Protocol, Eifrem talked about why he thinks the split within the market will grow more pronounced and why enterprises are increasingly picking databases tailored to specific end applications.

This interview has been edited and condensed for clarity.

What are graph databases and why did you have to create a new category within this industry? What is it about the historical systems that make them unable to meet the demands of today?

The modern landscape I think of in two broad buckets. On the left side is the operational data stores: developers are building applications, those applications use the database. On the right is data warehouses; that's the analytical data stores. They store historical data. On the left-hand side, we have the system of record for now. On the right-hand side, it's the system of record for history.

On that right-hand side, there's five platforms emerging — Snowflake, Databricks, Microsoft, AWS and Google Cloud — and everything else circulates around them. And there's a ton of innovation happening right outside of those five. On the left-hand side, you have the cloud platforms, one company that has gone public, which is MongoDB, and two companies that are coming up behind: Redis Labs and Neo4j.

Graph databases are focused on connected data. As the real world is becoming more connected, the data is becoming more connected. And the challenge with that is that you can't store and connect the data in a good way in a tabular database. Fundamentally that's what we're optimized for: finding patterns in connected data.

You've discussed a future where you can have very application-specific databases based on the best fit for whatever that application might be. Can you unpack that? Ultimately, how many database vendors do you think enterprises are going to use?

This has two different answers depending on if you sit on the left-hand or the right-hand side of [the industry]. On the right-hand side, it will actually converge into one category. Currently, it's two different paradigms: the data scientist-centric paradigm that Databricks represents and the business data analyst-centric paradigm that Snowflake represents. That's converging into one category.

On the left-hand side, though, and that's where I spend most of my time, I see a few new categories emerging. We have the broader what I call "document plus-plus" space; that's not an established term, but that's what I call it. This is where MongoDB and Couchbase live. But if you look at actual customer projects, then Redis Labs, DataStax, Cassandra, Couchbase, MongoDB, they all compete for the same slot in those architectures.

How many of these different moving parts do we really want in an application, right? With [time-series] databases, it's like stock ticker symbols or sensor data, which you can't really store in a good way in other types of databases. Then there's newSQL, where a company like Cockroach Labs lives.

Those are the major categories. And in 2030, those are [still] going to be the major categories. And then underlying all of this, of course, is the relational database, which will be around forever. And big banks like UBS, Citi or JPMorgan are going to have strategic vendors for each and every one. It's not going to be more than that. Because ultimately you want to reduce complexity and don't want too many moving parts in your architecture.

How much share of the market can you get? You're trying to tackle a more specialized area, which would seem to eventually have some sort of cap on it?

There's always subcategories in massive markets. And the database market is the single biggest one in all of enterprise software. It's about $50 billion today. But it's going to be $100 billion, already in a few years, like 2024 or 2025 depending on who you ask. All of that growth is driven by these new segments that we're talking about. If you are the leader of one of these big, new segments, those are massive categories. They're way bigger than what the relational database was, for example, in the late '80s, early '90s. So it's very sizable.

From a technology standpoint, what is going to be the main impediment for, say, Databricks to serve as both the analytic and the operational engine within organizations?

No one is gonna be able to straddle both of those completely. It's too big and too complex. The way that you have to write a kernel that is good for the analytical data store versus one that is good for the operational data store is just completely different. And it's not one of those things where if you throw enough money at it, [it's possible]. It comes down to very clear and specific technical tradeoffs.

How do customers not get confused or overwhelmed by all of the different kinds of strategies that other vendors are also promoting?

Increased funding makes for a noisier environment. The flip side of this is: If you add up all the analytical and operational data stores, it looks very, very crowded. But really, if you look at the operational data stores, there's four or five that matter now. That's the big change in the last two, three years. There's a handful of us that have really truly achieved scale. And that makes it less confusing.

You're asking customers to add more complexity to their tech stack. When you think about the return on investment, what's going to be the benefit that outweighs that increased complexity?

It's very easy, because the world is just becoming increasingly connected. And if you're not capable of digitizing those connections, you're going to be left behind. If you're a bank and you use graph databases, you'll be able to capture fraud rings for your fraud detection. We frequently see banks getting a 5% uplift in the number of fraud cases that they can capture. If [your] competitors can capture fraud rings but [you] can't, ultimately, they're going to outcompete [you]. It's coming back to the business value of being able to operate on top of connections.
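To make the fraud-ring idea concrete, here is a minimal Python sketch of what "operating on top of connections" can mean. It is illustrative only: it does not use Neo4j or any real bank's method, and the account names, identifiers and the `find_rings` helper are all hypothetical. It assumes accounts belong to the same ring when they share any identifier (a phone number, a device), and groups them with a simple union-find.

```python
from collections import defaultdict

# Hypothetical sample data: each account maps to the identifiers it used.
ACCOUNTS = {
    "acct_a": {"phone:555-0101", "addr:12 Elm St"},
    "acct_b": {"phone:555-0101", "device:d-77"},
    "acct_c": {"device:d-77", "addr:9 Oak Ave"},
    "acct_d": {"phone:555-0199"},
    "acct_e": {"addr:40 Pine Rd", "phone:555-0142"},
}


def find_rings(accounts, min_size=3):
    """Group accounts connected through shared identifiers.

    Returns a sorted list of sorted account-name lists, keeping only
    groups with at least `min_size` members.
    """
    parent = {name: name for name in accounts}

    def find(x):
        # Walk to the root of x's group, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every account to the first account seen with the same identifier.
    first_seen = {}
    for name, identifiers in accounts.items():
        for ident in identifiers:
            if ident in first_seen:
                union(first_seen[ident], name)
            else:
                first_seen[ident] = name

    groups = defaultdict(set)
    for name in accounts:
        groups[find(name)].add(name)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


if __name__ == "__main__":
    print(find_rings(ACCOUNTS))
```

In this toy data, `acct_a` and `acct_c` never share an identifier directly; they are linked only through `acct_b`. That indirect, multi-hop link is the kind of pattern that is awkward to express as joins over flat tables and natural to express as a graph traversal.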

What is going to stop the hyperscalers from cutting vendors like yourself out of the equation?

We get adopted through developers. The fact that we're the leader in terms of the developer community and data science community, that's huge. That's what makes MongoDB Atlas work. Specifically for enterprises, the fact that we are multicloud, that's massive.

Today, multicloud is an absolute requirement for many enterprise CIOs. And if there's one area of their IT architecture where they care the most, it's for the data. Having that managed by a multicloud offering so they're not beholden to one of the platforms, that's really important for them.

