The term “real time” has been infused throughout tech, from real-time stock picks to real-time pizza tracking. But there’s near-real time, and then there’s real time.
As everyday enterprises begin incorporating the data tools and tactics used inside the biggest of big tech companies, a sector of data services providers has emerged to help them. These vendors offer the truly real-time analytics and machine-learning approaches that, in the past, only giant companies with far larger database teams and resources could afford. Companies like Hazelcast, Rockset, Tecton and others enable split-second analytics and machine learning for things like financial fraud prevention, dynamic pricing and product recommendations that respond to what you just clicked.
These companies promise to leave plodding batch-data processing for old-school business intelligence analysis in the dust. But whether every enterprise needs, wants or is ready to operate at as fast a clip as a Citibank, Uber or Amazon remains to be seen.
Updating data every few days, every night or even every hour or so for business analysis using a typical batch-processing approach “is like playing Monday morning quarterback,” said Venkat Venkataramani, CEO and co-founder of Rockset, a company that provides a database for building applications around real-time data, analytics and queries. “That is not going to be good enough anymore. I’m six points down, the game is not over yet — what do I do differently to change the outcome of the game?” he said, extending a football metaphor he believes will describe more and more business scenarios involving fresh data over the next two to three years.
These startups believe the increasing influx of real-time data flooding into data lakes and lakehouses — from ecommerce site clicks to IoT sensor pings — will compel businesses to use that information immediately as it flows in.
Some of Hazelcast’s customers process the data they consume in real time to feed machine-learning models that predict when equipment on oil drilling rigs and wind turbines needs maintenance, said CEO Kelly Herrell. Tecton, which helps companies run real-time data pipelines to feed ML models, has seen insurance providers use its services to operate discount programs based on driver behavior, according to Mike Del Balso, co-founder and CEO of the company.
But those are rare use cases. These startups say most of their inbound interest is coming from banking and ecommerce customers that want to prevent fraud while someone waits for a banking transaction to happen at an ATM, or to detect right away when a payment app stops working in a particular country. “Wherever there is real money and risk involved if you don’t manage something in real time” is where companies are using real-time data services, said Venkataramani.
“The top use case for us is recommendations,” said Del Balso, who said customers “want to make a recommendation based on what the user just did.” Anyone who has browsed products on an ecommerce site or scrolled through movie options on a content streaming platform knows the micro-frustrations resulting from systems that don’t recognize their most recent moves.
Machine learning helps spark real-time interest
Across the business spectrum, there are a few key factors fueling the rise of real-time data processing and analytics, and the rise of AI and machine learning is an important one. Companies want to use machine-learning systems that improve as they’re exposed to fresh information in the hopes of making smarter decisions and optimizing existing efforts in milliseconds.
While traditional business intelligence analytics don’t need real-time data or processing, “real time and machine learning really go hand-in-hand,” said Gaetan Castelein, vice president of marketing at Tecton, who explained that the real-time data and machine-learning trends are converging and feeding off one another.
Consider a bank conducting millions of transactions each hour. Whether a fraud-detection model approves a given transaction could depend on as many as 2,000 individual pieces of information: some relatively static, such as a zip code, and some brand new, such as the amount of a cash transfer. Data systems like Tecton’s can optimize that process by separating the pieces that must be computed fresh from the ones that rarely change.
“Because you need to respond to the transaction really quickly, but you don’t want to be computing that stuff in real time,” Del Balso said. “It’s OK if some of the signals are a bit delayed,” he said, adding, “That becomes a performance tradeoff.”
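To make that tradeoff concrete, here is a rough, hypothetical sketch of the pattern Del Balso describes: slow-changing features come from a cache populated by batch jobs, and only the signals tied to the transaction itself are computed while the customer waits. Every name here (BATCH_FEATURE_STORE, score_transaction, the toy model) is a stand-in for illustration, not Tecton's actual API.

```python
# A rough, hypothetical sketch of the pattern described above -- not
# Tecton's actual API. Slow-changing features are precomputed by batch
# jobs and cached; only transaction-specific signals are computed in the
# request path.
import time

# Stand-in for an online feature store refreshed by nightly/hourly batch jobs.
BATCH_FEATURE_STORE = {
    "user_123": {"zip_code": "94107", "avg_transfer_amount_90d": 82.50},
}

def fresh_features(txn):
    """Signals that must reflect this exact transaction."""
    return {
        "amount": txn["amount"],
        "hour_of_day": time.localtime(txn["timestamp"]).tm_hour,
    }

def score_transaction(txn, model):
    # Cheap cache lookup for the near-static features. Per Del Balso, it's
    # OK if these are a bit delayed.
    static = BATCH_FEATURE_STORE.get(txn["user_id"], {})
    # Only the truly fresh signals are computed while the customer waits.
    return model({**static, **fresh_features(txn)})

def toy_model(features):
    """Toy stand-in: flag transfers far above the user's 90-day average."""
    baseline = features.get("avg_transfer_amount_90d", 1.0)
    return "review" if features["amount"] > 10 * baseline else "approve"

txn = {"user_id": "user_123", "amount": 2500.0, "timestamp": time.time()}
print(score_transaction(txn, toy_model))  # -> review
```

The performance tradeoff falls out of the structure: the cached half of the features can be hours old without slowing down the millisecond response on the fresh half.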
Optimizing a recommendation engine to respond in real time to something a user just did a half-second ago “is the difference between Netflix and TikTok,” said Manish Devgan, chief product officer at Hazelcast. “As you’re browsing it’s actually updating a machine-learning model,” he said of TikTok’s content recommendation system.
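A system like the one Devgan describes folds each new interaction into the model immediately rather than retraining overnight. As a rough illustration (not TikTok's actual system, and with made-up feature vectors), scikit-learn's partial_fit can take one gradient step per event:

```python
# Illustrative only: an online-learning recommender in the spirit of the
# TikTok example, where every interaction immediately updates the model.
# Feature vectors and the engagement rule are made up; requires a recent
# scikit-learn (loss="log_loss").
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss")
CLASSES = np.array([0, 1])  # 1 = user engaged with the item, 0 = skipped

def on_interaction(item_features, engaged):
    """Called once per click or scroll: one gradient step on the newest signal."""
    model.partial_fit([item_features], [int(engaged)], classes=CLASSES)

def engagement_score(item_features):
    """Probability the user engages, reflecting everything seen so far."""
    return model.predict_proba([item_features])[0, 1]

# Simulate a short browsing session: the model sharpens with each event.
rng = np.random.default_rng(seed=0)
for _ in range(50):
    features = rng.normal(size=4)
    on_interaction(features, engaged=features[0] > 0)  # toy engagement rule

print(round(engagement_score([1.0, 0.0, 0.0, 0.0]), 3))
```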
In conjunction with the machine-learning boom, innovations in data infrastructure have also helped propel interest in using data in real time. The availability and ease of use of event-streaming technologies such as Apache Kafka, along with managed platforms like Confluent that are built around it, are helping companies run real-time data initiatives with smaller engineering teams.
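For a sense of how little code a basic streaming consumer now requires, here is a minimal loop using the open-source confluent-kafka Python client. The broker address and the "clickstream" topic are made-up placeholders, and a production pipeline would add batching, error handling and schema management on top.

```python
# Minimal consumer loop using the open-source confluent-kafka client.
# Broker address and topic name are illustrative placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "realtime-analytics",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["clickstream"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # wait up to 1s for the next event
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Each click, log line or sensor ping arrives moments after it was
        # produced, ready for scoring or aggregation instead of waiting on
        # a nightly batch load.
        print(msg.value().decode("utf-8"))
finally:
    consumer.close()
```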
More broadly, the explosion of data streaming in from online and IoT systems, coupled with adoption of the cloud and its cost efficiencies, is also sparking interest in using data in real time.
“As more enterprises continue to migrate to the cloud and invest in digital transformations, the volume, variety and velocity of machine-generated data — clickstream, logs, metrics, IoT — will proliferate exponentially,” said Derek Zanutto, general partner at CapitalG, who added that the Google investment arm does not currently have any portfolio companies in the real-time data processing space.
“As the volume of machine-generated data continues to proliferate, forward-thinking, data-driven organizations will increasingly seek out opportunities to mine this data for real-time operational analytics use cases that help them maintain or improve their market leadership,” Zanutto said.
When near-real time is good enough
There’s a difference between what’s real time and what’s merely in the ballpark, according to database experts. “People use the word ‘real time’ in a very abusive manner,” said Ravi Mayuram, chief technology officer at Couchbase, a database company that enables real-time data processing.
He and others say if data analysis happens in minutes rather than seconds or split seconds, it’s not real time; it’s just near-real time. Venkataramani said he defines real-time data processing as something that takes less than two seconds.
“This is kind of complicated and nuanced, so it’s easy to mix together real time and near-real time,” Del Balso said, adding that for most companies and use cases, near-real time is good enough.
Indeed, some experts say processing data every few minutes should suffice for many businesses.
“There are extreme ends of this where you really, really need [real time],” said Ryan Blue, co-founder and CEO of data platform startup Tabular and a former Netflix database engineer who helped create Apache Iceberg, an open table format at the core of lakehouse-style analytics. “The question is, when is a five-minute batch process sufficient?” Blue said.
Some Rockset customers don’t even use the company’s most extreme real-time data capabilities. Seesaw, an online learning platform, uses Rockset to enable analytics, data visualizations and data queries. But for now, said Emily Voigtlander, Seesaw’s product manager, batch processing every night is just fine. While she did not rule out future needs for Rockset’s real-time data services, Voigtlander said, “It’s not actually what is most essential to our business right now.”
But just wait, some say. Today, companies that are still getting a handle on batch processing might decide to leapfrog the competition, said Preeti Rathi, general partner at venture capital firm Icon Ventures. Those kinds of companies might ask, “If we can directly just jump here, why not?” she said.
The growing interest in real-time analytics and data processing represents what Gerrit Kazmaier, Google Cloud's vice president and general manager for Database, Data Analytics and Looker, called a “paradigm shift” away from traditional data stacks: toward systems that “connect the systems of intelligence” to applications, letting companies use machine learning and analytics to influence customer behavior or take action on the spot.
“So now, you come to a tipping point, where suddenly the strategic platform of the enterprise is not anymore the functional system, it’s the data system,” he said.