In Neo4j CEO Emil Eifrem's mind, the database industry is split into two camps: systems that deal with historical data and those that support real-time processing.
Neo4j would be in the latter category. It's a subsector that Eifrem said is dominated by six players: Microsoft, Google Cloud, AWS, Redis Labs, MongoDB and, of course, Neo4j. But Eifrem is betting that owning just a sliver of the booming market will be lucrative.
"The database market is the single biggest one in all of enterprise software. It's about $50 billion today. But it's going to be $100 billion" in just a few years, he told Protocol. "If you are the leader of one of these big, new segments, those are massive categories. They're way bigger than what the relational database was, for example, in the late '80s, early '90s."
Other leaders in the real-time processing sector are likely to disagree with Eifrem's viewpoint. But Neo4j is a bit different in that it's peddling a newer type of architecture called graph databases: systems that take the increasing amount of information that companies are collecting and draw immediate connections across it, something that Eifrem argued is impossible with the tabular databases of the past.
The company is not alone among graph database companies; AWS, for example, launched its own graph database called Neptune in 2018. But Neo4j is well on the way to establishing itself as a leader. The company raised $325 million in June — the most ever for a database startup, according to Neo4j — and is considering an IPO in the next few years, according to a marketing slide viewed by Protocol.
In an interview with Protocol, Eifrem talked about why he thinks the split within the market will grow more pronounced and why enterprises are increasingly picking databases tailored to specific end applications.
This interview has been edited and condensed for clarity.
What are graph databases and why did you have to create a new category within this industry? What is it about the historical systems that make them unable to meet the demands of today?
The modern landscape I think of in two broad buckets. On the left side are the operational data stores: developers are building applications, those applications use the database. On the right are data warehouses; those are the analytical data stores. They store historical data. On the left-hand side, we have the system of record for now. On the right-hand side, it's the system of record for history.
On that right-hand side, there's five platforms emerging — Snowflake, Databricks, Microsoft, AWS and Google Cloud — and everything else circulates around them. And there's a ton of innovation happening right outside of those five. On the left-hand side, you have the cloud platforms, one company that has gone public, which is MongoDB, and two companies that are coming up behind: Redis Labs and Neo4j.
Graph databases are focused on connected data. As the real world is becoming more connected, the data is becoming more connected. And the challenge with that is that you can't store and connect the data in a good way in a tabular database. Fundamentally that's what we're optimized for: finding patterns in connected data.
You've discussed a future where you can have very application-specific databases based on the best fit for whatever that application might be. Can you unpack that? Ultimately, how many database vendors do you think enterprises are going to use?
This has two different answers depending on if you sit on the left-hand or the right-hand side of [the industry]. On the right-hand side, it will actually converge into one category. Currently, it's two different paradigms: the data scientist-centric paradigm that Databricks represents and the business data analyst-centric paradigm that Snowflake represents. That's converging into one category.
On the left-hand side, though, and that's where I spend most of my time, I see a few new categories emerging. We have the broader what I call "document plus-plus" space; that's not an established term, but that's what I call it. This is where MongoDB and Couchbase live. But if you look at actual customer projects, then Redis Labs, DataStax, Cassandra, Couchbase, MongoDB, they all compete for the same slot in those architectures.
How many of these different moving parts do we really want in an application, right? With time-series databases, it's like stock ticker symbols or sensor data, which you can't really store in a good way in other types of databases. Then there's newSQL, where a company like Cockroach Labs lives.
Those are the major categories. And in 2030, those are [still] going to be the major categories. And then underlying all of this, of course, is the relational database, which will be around forever. And big banks like UBS, Citi or JPMorgan are going to have strategic vendors for each and every one. It's not going to be more than that. Because ultimately you want to reduce complexity and don't want too many moving parts in your architecture.
How much share of the market can you get? You're trying to tackle a more specialized area, which would seem to eventually have some sort of cap on it?
There's always subcategories in massive markets. And the database market is the single biggest one in all of enterprise software. It's about $50 billion today. But it's going to be $100 billion, already in a few years, like 2024 or 2025 depending on who you ask. All of that growth is driven by these new segments that we're talking about. If you are the leader of one of these big, new segments, those are massive categories. They're way bigger than what the relational database was, for example, in the late '80s, early '90s. So it's very sizable.
From a technology standpoint, what is going to be the main impediment for, say, Databricks to serve as both the analytic and the operational engine within organizations?
No one is gonna be able to straddle both of those completely. It's too big and too complex. The way that you have to write a kernel that is good for the analytical data source versus one that is good for the operational data source is just completely different. And it's not one of those things where if you throw enough money at it, [it's possible]. It comes down to very clear and specific technical tradeoffs.
How do customers not get confused or overwhelmed by all of the different kinds of strategies that other vendors are also promoting?
Increased funding makes for a noisier environment. The flip side of this is: If you add up all the analytical and operational data stores, it looks very, very crowded. But really, if you look at the operational data stores, there's four or five that matter now. That's the big change in the last two, three years. There's a handful of us that have really truly achieved scale. And that makes it less confusing.
You're asking customers to add more complexity to their tech stack. When you think about the return on investment, what's going to be the benefit that outweighs that increased complexity?
It's very easy, because the world is just becoming increasingly connected. And if you're not capable of digitizing those connections, you're going to be left behind. If you're a bank and you use graph databases, you'll be able to capture fraud rings for your fraud detection. We frequently see banks getting a 5% uplift in the number of fraud cases that they can capture. If [your] competitors can capture fraud rings but [you] can't, ultimately, they're going to outcompete [you]. It's coming back to the business value of being able to operate on top of connections.
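To make the fraud-ring idea concrete, here is a minimal sketch in plain Python rather than in any graph database, and it is not a description of how Neo4j or any bank does this. It assumes a ring can be approximated as a group of accounts linked, directly or indirectly, by shared identifiers such as a phone number or device. The account names, identifiers, `find_rings` function and `min_size` threshold are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical sample data: each account maps to the identifiers it uses.
ACCOUNTS = {
    "acct_a": {"phone:555-0100", "addr:1 Main St"},
    "acct_b": {"phone:555-0100", "device:d1"},
    "acct_c": {"device:d1", "addr:9 Oak Ave"},
    "acct_d": {"phone:555-0199"},
    "acct_e": {"addr:7 Elm Rd", "device:d2"},
}


def find_rings(accounts, min_size=3):
    """Group accounts connected through shared identifiers.

    Assumption for illustration: any connected group of at least
    `min_size` accounts is treated as a candidate ring.
    """
    # Index the reverse direction: identifier -> accounts that use it.
    by_identifier = defaultdict(set)
    for account, identifiers in accounts.items():
        for identifier in identifiers:
            by_identifier[identifier].add(account)

    seen = set()
    rings = []
    for start in accounts:
        if start in seen:
            continue
        # Breadth-first search over account -> identifier -> account hops.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            account = queue.popleft()
            component.add(account)
            for identifier in accounts[account]:
                for neighbor in by_identifier[identifier]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        if len(component) >= min_size:
            rings.append(sorted(component))
    return rings


RINGS = find_rings(ACCOUNTS)
print(RINGS)
```

In this toy data, acct_a and acct_c share nothing directly; they are linked only through acct_b, which is the kind of indirect connection that a row-by-row lookup on a single identifier would miss.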
What is going to stop the hyperscalers from cutting vendors like yourself out of the equation?
We get adopted through developers. The fact that we're the leader in terms of the developer community and data science community, that's huge. That's what makes MongoDB Atlas work. Specifically for enterprises, the fact that we are multicloud, that's massive.
Today, multicloud is an absolute requirement for many enterprise CIOs. And if there's one area of their IT architecture where they care the most, it's for the data. Having that managed by a multicloud offering so they're not beholden to one of the platforms, that's really important for them.