Databricks is pushing a new architecture known as the data lakehouse, one that its backers say will obliterate the need for data warehouses, the industry's de facto standard for decades. Such a shift would be akin to a new browser design eliminating Google Chrome. And it's clear why Databricks has Snowflake in its sights: Snowflake commands a market cap of $107 billion after re-architecting the data warehouse for the cloud era.
On Tuesday, that ambition will get a big boost. Databricks is poised to announce that the Transaction Processing Performance Council (TPC), an independent industry group, has validated results showing that Databricks' systems outperformed the closest data warehouse competitor by 2.2x.
"We basically proved that we can beat [Snowflake] at their own game, which is the data warehousing game," CEO Ali Ghodsi told Protocol.
The details are deeply technical, but at a high level, Databricks SQL, the company's flagship data warehouse product, was able to execute "32,941,245 queries in one hour on a large data warehouse of size 100 TB," according to a blog post scheduled to be released Tuesday morning. On top of the speed record, Databricks said it reached this milestone at a 10% lower cost than the prior record holder, Alibaba.
Snowflake and others in the industry are bound to counter those claims in some capacity, whether publicly or behind the scenes with customers. And the distinction from the TPC is unlikely to have an immediate material impact on Databricks' financials.
"At the enterprise level, maybe some CIO is going to care about what your official TPC ranking is, but they don't make sales that way," said Carnegie Mellon University associate professor Andy Pavlo.
'That's worth paying attention to'
But while the influence of the TPC has waned over the years, it still carries weight.
Started in the 1980s, the organization serves as something of a neutral umpire in evaluating database performance. The TPC publishes benchmarks that companies can run their systems against, then reviews the results for official certification.
As the industry has exploded and grown hyper-competitive, those benchmarks may be adding more confusion than clarity. Some vendors, for example, tout results that haven't been officially approved by the TPC.
Databricks said the latest results were "audited and made public" by the TPC. And the size of the performance gain is noteworthy, perhaps enough to make some potential customers perk up.
"That's worth paying attention to," said Pavlo.
Regardless, Databricks still has some way to go to surpass its rival. Less than 10% of its revenue comes from Databricks SQL, but the product is growing "very fast," Ghodsi said.
Still, the announcement is the latest in a series of moves by Databricks intended to escalate its competition with Snowflake. The company has been using the $3.5 billion it has raised from investors to date to hire top talent focused on building out its Snowflake competitor.
Michalis Petropoulos joined in June as a senior director of engineering. He previously helped lead Google's BigQuery team and oversaw all of Amazon Redshift. And Sridhar Machiraju, who previously led the Spanner team at Google, joined in November, also as a senior director of engineering.
They are just two of the more than a dozen former AWS, Google, Snowflake and IBM employees who have joined Databricks in the past year. And more hires are coming: Amit Shukla, who was a director of engineering at Google, is slated to join later this month.
"Our team that is working on the core data warehouse ... is probably actually larger than Snowflake's at this point," Databricks co-founder Reynold Xin proclaimed.
Between the recent fundraising rounds, the TPC results and the slew of new hires, it's clear Databricks has momentum. And with over $600 million in annual recurring revenue as of Aug. 31, it's also clear that there is enthusiasm for the company's data lakehouse model.
But the road ahead is tough. And while Ghodsi is quick to proclaim that the end of the data warehouse is near, it will take much more than an audit from an industry body to not only kill one of the market's dominant vendors but also displace a technology that has remained popular in enterprise IT for the past 20 years.