Databases will be a $100 billion market. Neo4j’s CEO just needs a sliver.

The graph database company raised $325 million in June, which Neo4j said is the largest funding round ever for a database startup.


Neo4j CEO Emil Eifrem is eyeing the database industry split.

Photo: Neo4j

In Neo4j CEO Emil Eifrem's mind, the database industry is split into two camps: systems that deal with historical data and those that support real-time processing.

Neo4j would be in the latter category. It's a subsector that Eifrem said is dominated by six players: Microsoft, Google Cloud, AWS, Redis Labs, MongoDB and, of course, Neo4j. But Eifrem is betting that owning just a sliver of the booming market will be lucrative.

"The database market is the single biggest one in all of enterprise software. It's about $50 billion today. But it's going to be $100 billion" in just a few years, he told Protocol. "If you are the leader of one of these big, new segments, those are massive categories. They're way bigger than what the relational database was, for example, in the late '80s, early '90s."

Other leaders in the real-time processing sector are likely to disagree with Eifrem's viewpoint. But Neo4j is a bit different in that it's peddling a newer type of architecture called graph databases: systems that can take the growing volume of information that companies collect and draw immediate connections within it, something Eifrem argued is impossible with the tabular databases of the past.

The company is not alone among graph database companies; AWS, for example, launched its own graph database called Neptune in 2018. But Neo4j is well on the way to establishing itself as a leader. The company raised $325 million in June — the most ever for a database startup, according to Neo4j — and is considering an IPO in the next few years, according to a marketing slide viewed by Protocol.

In an interview with Protocol, Eifrem talked about why he thinks the split within the market will grow more pronounced and why enterprises are increasingly picking databases tailored to specific end applications.

This interview has been edited and condensed for clarity.

What are graph databases and why did you have to create a new category within this industry? What is it about the historical systems that make them unable to meet the demands of today?

The modern landscape I think of in two broad buckets. On the left side is the operational data stores: developers are building applications, those applications use the database. On the right is data warehouses; that's the analytical data stores. They store historical data. On the left-hand side, we have the system of record for now. On the right-hand side, it's the system of record for history.

On that right-hand side, there's five platforms emerging — Snowflake, Databricks, Microsoft, AWS and Google Cloud — and everything else circulates around them. And there's a ton of innovation happening right outside of those five. On the left-hand side, you have the cloud platforms, one company that has gone public, which is MongoDB, and two companies that are coming up behind: Redis Labs and Neo4j.

Graph databases are focused on connected data. As the real world is becoming more connected, the data is becoming more connected. And the challenge with that is that you can't store and connect the data in a good way in a tabular database. Fundamentally that's what we're optimized for: finding patterns in connected data.

You've discussed a future where you can have very application-specific databases based on the best fit for whatever that application might be. Can you unpack that? Ultimately, how many database vendors do you think enterprises are going to use?

This has two different answers depending on if you sit on the left-hand or the right-hand side of [the industry]. On the right-hand side, it will actually converge into one category. Currently, it's two different paradigms: the data scientist-centric paradigm that Databricks represents and the business data analyst-centric paradigm that Snowflake represents. That's converging into one category.

On the left-hand side, though, and that's where I spend most of my time, I see a few new categories emerging. We have the broader what I call "document plus-plus" space; that's not an established term, but that's what I call it. This is where MongoDB and Couchbase live. But if you look at actual customer projects, then Redis Labs, DataStax, Cassandra, Couchbase, MongoDB, they all compete for the same slot in those architectures.

How many of these different moving parts do we really want in an application, right? With time-series databases, it's like stock ticker symbols or sensor data, which you can't really store in a good way in other types of databases. Then there's newSQL, where a company like Cockroach Labs lives.

Those are the major categories. And in 2030, those are [still] going to be the major categories. And then underlying all of this, of course, is the relational database, which will be around forever. And big banks like UBS, Citi or JPMorgan are going to have strategic vendors for each and every one. It's not going to be more than that. Because ultimately you want to reduce complexity and don't want too many moving parts in your architecture.

How much share of the market can you get? You're trying to tackle a more specialized area, which would seem to eventually have some sort of cap on it?

There's always subcategories in massive markets. And the database market is the single biggest one in all of enterprise software. It's about $50 billion today. But it's going to be $100 billion, already in a few years, like 2024 or 2025 depending on who you ask. All of that growth is driven by these new segments that we're talking about. If you are the leader of one of these big, new segments, those are massive categories. They're way bigger than what the relational database was, for example, in the late '80s, early '90s. So it's very sizable.

From a technology standpoint, what is going to be the main impediment for, say, Databricks to serve as both the analytic and the operational engine within organizations?

No one is gonna be able to straddle both of those completely. It's too big and too complex. The way that you have to write a kernel that is good for the analytical data source versus one that is good for the operational data source is just completely different. And it's not one of those things where if you throw enough money at it, [it's possible]. It comes down to very clear and specific technical tradeoffs.

How do customers not get confused or overwhelmed by all of the different kinds of strategies that other vendors are also promoting?

Increased funding makes for a noisier environment. The flip side of this is: If you add up all the analytical and operational data stores, it looks very, very crowded. But really, if you look at the operational data stores, there's four or five that matter now. That's the big change in the last two, three years. There's a handful of us that have really truly achieved scale. And that makes it less confusing.

You're asking customers to add more complexity to their tech stack. When you think about the return on investment, what's going to be the benefit that outweighs that increased complexity?

It's very easy, because the world is just becoming increasingly connected. And if you're not capable of digitizing those connections, you're going to be left behind. If you're a bank and you use graph databases, you'll be able to capture fraud rings for your fraud detection. We frequently see banks getting a 5% uplift in the number of fraud cases that they can capture. If [your] competitors can capture fraud rings but [you] can't, ultimately, they're going to outcompete [you]. It's coming back to the business value of being able to operate on top of connections.

What is going to stop the hyperscalers from cutting vendors like yourself out of the equation?

We get adopted through developers. The fact that we're the leader in terms of the developer community and data science community, that's huge. That's what makes MongoDB Atlas work. Specifically for enterprises, the fact that we are multicloud, that's massive.

Today, multicloud is an absolute requirement for many enterprise CIOs. And if there's one area of their IT architecture where they care the most, it's the data. Having that managed by a multicloud offering so they're not beholden to one of the platforms, that's really important for them.

