Welcome to the Protocol Power Index, a ranking of the most powerful companies by tech industry subsector, as well as the companies best positioned to challenge them. This time: databases.
The modern enterprise must be able to fluidly collect, analyze and access its internal data. Cloud-native database companies play a key role in this process. Their products — data lakes, data warehouses and data lakehouses, to name a few — exist in a rapidly evolving slot within the tech stack, wedged between the public cloud and machine-learning services.
Those offerings may all sound similar, but they currently refer to important distinctions in intended use cases and end users. Still, the companies offering them are increasingly overlapping with one another, creating an arms race that's resulted in booming valuations and an ever-growing suite of products quickly becoming core to the industry.
But which companies have the lead right now? And which ones are challenging that dominance? We've ranked the market for you.
|Power Score: 71.32||Momentum Score: 82.35 (2)||HQ: Bozeman, MT||CEO: Frank Slootman||Founded: 2012|
Snowflake has grown at a dizzying speed in recent years. The company completed a blockbuster IPO in September 2020, raising $3.4 billion and garnering a valuation of $70.4 billion; less than a year prior, it had raised funds at a $12.4 billion valuation. Headcount nearly doubled in the same timeframe, and it has an exceptional customer net retention rate of 168%. The database industry is currently Snowflake's to lose — but the company is not without vulnerabilities.
|Power Score: 69.40||Momentum Score: 70.0 (4)||HQ: San Francisco, CA||CEO: Ali Ghodsi||Founded: 2013|
Databricks poses the most direct threat to Snowflake, a fact borne out in the Power Scores. Like Snowflake, Databricks has seen its valuation grow several times over in the past year: It closed a $1 billion funding round in February 2021 at a $28 billion valuation, up by 352% year-over-year. Databricks is firmly rooted in the Bay Area tech scene, and as a result it's been able to attract top-drawer talent. The big question now: When will Databricks follow its big rival to the public markets?
|Power Score: 64.13||Momentum Score: 52.94 (6)||HQ: Palo Alto, CA||CEO: Bipul Sinha||Founded: 2014|
Founded in 2014, Rubrik is a relative newcomer to data warehousing, but it's a company that came out of the gate as one of the hottest in the sector. While it took both Snowflake and Databricks more than five years to hit unicorn status, Rubrik eclipsed a $1 billion valuation in just over two years. But while Rubrik has continued to grow, its post-unicorn growth rate has fallen short of that of the two companies ranked ahead of it in the Power Index. Now, it's making a bet on providing its capabilities to government branches and services for future growth.
|Power Score: 55.0||Momentum Score: 38.82 (7)||HQ: San Diego, CA||CEO: Steve McMillan||Founded: 1979|
Teradata is a legacy data warehouse provider, setting it apart from the cloud-native companies on this list such as Snowflake and Databricks. As a result, Teradata's platform has tended to have greater appeal for companies that manage their own servers. That's slowly changing, but Teradata is still playing catch-up as it transitions to the cloud and is only now settling down after a series of recent senior leadership upheavals.
|Power Score: 47.15||Momentum Score: 96.47 (1)||HQ: Mountain View, CA||CEO: Billy Bosworth||Founded: 2015|
Dremio is small but growing at a fast pace. In January 2021, the company closed a series D funding round in which it raised $135 million at a $1 billion valuation. Dremio says it plans to use the funding to grow its global engineering centers — engineering headcount growth more than doubled over the past year, though the total headcount of 79 engineers still makes it small relative to most of the competitors on the Power Index. Still, it hopes that a new approach that allows customers to directly query data in cloud storage platforms could be its big ticket to growth — which goes some way to explaining its top spot on our momentum leaderboard.
|Power Score: 47.0||Momentum Score: 22.35 (9)||HQ: Santa Clara, CA||CEO: Robert Bearden||Founded: 2008|
Cloudera went public in 2017 as a data-management, machine-learning and advanced analytics enterprise platform. It never turned a profit as a public company, and by 2019 its stock price had declined about 72% from the market debut price. This all led up to the rather inauspicious launch of the Cloudera Data Platform in September 2019, which brought together data warehouse and machine-learning capabilities.
Cloudera's fortunes have since improved. In June 2021, it agreed to go private as part of a sale to private equity firms Clayton Dubilier & Rice and KKR for an estimated $5.3 billion. To turn a profit on an acquired company, private equity firms tend to subtract rather than add, but for Cloudera, strategic acquisitions could pave the way toward growth: At the same time it announced that it was being acquired, Cloudera said it would purchase Datacoral and Cazena, which could help bolster the collective appeal of its SaaS offering.
|Power Score: 45.24||Momentum Score: 75.29 (3)||HQ: New York, NY||CEO: Spencer Kimball||Founded: 2015|
Sometimes smaller is better — or, at least, Cockroach Labs hopes to prove that's the case with its niche database offering. The company is focused on the infrastructure component of databases, and its CockroachDB product allows companies to build cloud architectures on top of it. The team at Cockroach Labs is also relatively small at 318 people, though it has a high proportion of employees with previous Big Tech work experience.
In an interview with Protocol earlier this year, Cockroach Labs CEO Spencer Kimball said, "If we can solve the operational, relational needs of a company and take them into this [new] way of doing data architecture, we'll win a substantial fraction of the largest market in software."
|Power Score: 26.51||Momentum Score: 55.29 (5)||HQ: Nuremberg, Germany||CEO: Aaron Auld||Founded: 2000|
Based in Germany, Exasol has focused its data product line around its analytics capabilities. In the last couple of years, the company has signed on big EMEA clients, like Germany-based Deutsche Bahn and T-Mobile as well as U.K.-based Revolut. However, the company is still working to break through in the lucrative U.S. market, which CEO Aaron Auld has said was a goal in 2019 and 2020. Its recent entrance into AWS's Independent Software Vendors Accelerate program should help the company make deeper inroads in different geographies over time, but it's clearly been slow going so far.
|Power Score: 26.46||Momentum Score: 32.94 (8)||HQ: San Francisco, CA||CEO: Yaniv Leven||Founded: 2015|
There are more players on the Ohio State football roster than there are employees at Panoply. The Israel-based company only has 47 employees, but it made the list for a reason: Panoply's platform stands out from a crowded field due to its ease of use and low-code environment. As the database industry develops, it isn't hard to imagine a future in which non-technical employees are expected to work with databases on a regular basis, and Panoply is building a platform that could help make that happen.
Panoply's co-founder and co-CEO, Yaniv Leven, brings with him C-suite experience from analytics startups Mytopia and Win. And the company has already scored a few big-name technical partnerships, including those with Looker, AWS and Tableau.
|Power Score: 26.02||Momentum Score: 16.0 (10)||HQ: Palo Alto, CA||CEO: Neil Carson||Founded: 2014|
Yellowbrick was founded in 2014 with hardware in mind — that is, the company hoped to deliver performance upgrades to database systems by designing proprietary hardware systems that would ultimately run a query. But proprietary hardware for niche applications tends to be limited to private cloud deployments, so Yellowbrick has since shifted focus to the software side. It launched Workload Management in Yellowbrick in 2019, which brings the native object store approach to workloads in AWS and Azure public clouds.
Shifting its focus to public clouds has seemingly paid dividends. Yellowbrick's valuation nearly doubled between its last two funding rounds. It has generally posted impressive growth across the board, though engineering headcount stands out as an anomaly, declining 11% year-over-year.
Explore the Data
The Protocol Power Index is designed to view power through a holistic lens that reflects how modern tech companies amass and exercise their strength. To do so, the Power Index takes into account 30 metrics across five categories — Economics, Leadership, Innovation, People and Politics & Policy — and synthesizes them into a single Power Score. Read our full methodology statement here.
What happens next?
Three forces stand to shape companies operating in the database space over the coming years: a blurring of database technologies, a demand for the technology despite a shortage of talent to use it and the ever-looming threat of public cloud vendors.
Product types are set to blur. It may be tempting to write off database terminology as marketing jargon — is a data warehouse really that much different from a data lake? And what's a "data lakehouse" anyway? — but it refers to important distinctions in intended use cases and end users.
- Let's start with the difference between data warehouses and data lakes: Relative to data lakes, data warehouses tend to contain more structured data and be more readily queryable. Business analysts often use data warehouses to build dashboards using historical data; data scientists might tap into data lakes to build predictive machine-learning models that require months of development time.
- These definitions make meaningful distinctions today, but that's changing. Databricks and Dremio have started popularizing the "data lakehouse" concept. "The lakehouse is a technological breakthrough that enables you to do both in one place," Databricks CEO Ali Ghodsi said at a Protocol event in June 2021. "So all your data in one place on a data lake and have [business intelligence] to ask questions about the past and AI to ask questions about the future. So it sort of revolutionizes what organizations are doing with data and AI."
- Even if companies don't adopt the "data lakehouse" nomenclature, data lakes will continue to look more like data warehouses, and vice versa. For instance, Snowflake recently added Java and Python support in an attempt to appeal to data scientists, who tend to favor less-structured databases over its SQL-friendly data warehouses. On the flip side, Dremio recently added SQL functionality to its data lake platform, allowing end users to start an SQL query without first moving less structured data into a data warehouse. The convergence of product categories will result in more intense competition; whereas there's now room for several medium-sized fish in medium-sized ponds, soon those ponds will be converging.
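The warehouse-versus-lake distinction described above can be illustrated with a small sketch. It is only an analogy under stated assumptions: an in-memory SQLite table stands in for a data warehouse (schema declared up front, queried with SQL), and a handful of raw JSON lines stand in for a data lake (structure interpreted only when the data is read). The `orders` table, the `raw_events` records and the `lake_totals` helper are hypothetical and do not represent any vendor's product.

```python
import json
import sqlite3

# Warehouse-style: the schema is fixed before any data is loaded,
# so the data is immediately queryable with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT NOT NULL, amount REAL NOT NULL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EMEA", 120.0), ("US", 80.0), ("EMEA", 30.0)],
)
warehouse_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)

# Lake-style: raw records are stored as-is, with inconsistent fields and types.
# Hypothetical sample data for illustration only.
raw_events = [
    '{"region": "EMEA", "amount": 120.0, "device": "mobile"}',
    '{"region": "US", "amount": "80"}',
    '{"region": "EMEA", "note": "amount missing"}',
]


def lake_totals(lines):
    """Apply structure at read time: parse each record and coerce its fields."""
    totals = {}
    for line in lines:
        record = json.loads(line)
        amount = float(record.get("amount", 0))  # missing or string amounts are handled here
        region = record["region"]
        totals[region] = totals.get(region, 0.0) + amount
    return totals


print(warehouse_totals)
print(lake_totals(raw_events))
```

The "lakehouse" pitch quoted above amounts to getting the first style of query without first copying the second style of data into a separate system.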
Software will be shaped by availability of expertise. Many enterprises still lack the internal resources required to make the most of new database technology. Open-source database standards could help remedy the situation, allowing data scientists to tap into available code repositories and more easily migrate between jobs.
- Data analysis skills are still highly sought-after by U.S. tech employers. This comes even after the prevalence of data scientists in the tech employee pool more than tripled between May 2015 and May 2021. "It's not a technology challenge. It's a talent challenge," Whirlpool CIO Dani Brown told Protocol earlier this year.
- Hiring data scientists is only a first step. Before they can start generating insights, data scientists must get acclimated to a new database system: Even a veteran data scientist who switches from a job using Snowflake to one using Dremio would need to become familiar with a new operating environment, a process that can take several weeks.
- Some big players in the database industry — most notably Databricks — are pushing for open architectures to help address this learning curve problem. Vendors including Tableau, Tencent, Alibaba and Informatica have started using and contributing to the open-source Delta Lake architecture supported by Databricks. The standardized architecture could also help bolster online code repositories, like those on GitHub, that can reduce the workload associated with complex database projects.
- However, an industry migration to open architectures is no sure thing. Snowflake product head Christian Kleinerman told Protocol this year that open-file formats "limit innovation in the long run." The company explained its stance in a March 2021 blog post: "It's not about rejecting open; it is about delivering better value for our customers. We balance this with making it very easy to get data in and out in standard formats."
- Snowflake has the ability to maintain this stance because it is already so large. Smaller companies in the database space will be more likely to adopt open architectures, since they reduce the learning curve associated with their products. Open architectures might also allow for greater interoperability and collaboration within an organization, because employees wouldn't need the same software licenses to interact with a dataset.
And the threat of dominant public cloud vendors looms large. Amazon, Google and Microsoft hold commanding positions in both the public cloud and AI/machine-learning markets, the layers of the stack immediately below and above vendors like Databricks and Snowflake.
- Ultimately, these public cloud vendors could be in a position to squeeze the independent database vendors from both directions. The firms could use bundling or preferential ecosystem access to perform this squeeze.
- The AI layer of database platforms in particular has become an important source of differentiation. This could play to the strengths of Amazon, Google and Microsoft, which all have their own database platforms and all offer robust AI systems. Amazon offers S3, Athena and SageMaker; Google offers BigQuery; and Microsoft offers Access and Azure ML.
- For now, though, database vendors benefit from their independence from public cloud vendors. Their platforms exist on a software stack layer above the cloud, which gives enterprises greater leeway to adopt them regardless of their cloud vendor setup.
To rank the competitors, we've developed a formula that encapsulates 30 criteria. Those criteria span five groupings that factor into power: Economics, Leadership, People, Innovation and Politics & Policy. We then developed two systems for weighting the criteria, one for measuring power and the other for measuring momentum, such that companies can be scored on a 0–100 scale. Read our full methodology here.
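As a rough illustration of how one set of criteria can yield two different scores, the sketch below applies two weighting schemes to the same inputs and rescales the result to 0–100. It is not the Power Index formula, which is not spelled out here. The category values, the two weight sets and the `weighted_score` function are all hypothetical, and the sketch assumes each input has already been normalized to a 0–1 range.

```python
from typing import Dict


def weighted_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine metrics (each assumed normalized to 0-1) into a 0-100 score."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = sum(metrics[name] * weight for name, weight in weights.items())
    return round(100 * raw / total_weight, 2)


# Hypothetical normalized inputs for one company, one value per category.
company = {
    "economics": 0.8,
    "leadership": 0.6,
    "people": 0.5,
    "innovation": 0.7,
    "politics_policy": 0.2,
}

# Two hypothetical weighting schemes over the same criteria.
power_weights = {
    "economics": 0.40,
    "leadership": 0.20,
    "people": 0.15,
    "innovation": 0.15,
    "politics_policy": 0.10,
}
momentum_weights = {
    "economics": 0.20,
    "leadership": 0.10,
    "people": 0.30,
    "innovation": 0.35,
    "politics_policy": 0.05,
}

print("power:", weighted_score(company, power_weights))
print("momentum:", weighted_score(company, momentum_weights))
```

Dividing by the total weight keeps the result on the 0–100 scale even if a weight set does not sum to exactly 1, which is why the same company can rank differently on power and on momentum.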