Do businesses really need real-time analytics? Data startups are counting on it.

Real-time database startups enable split-second analytics and machine learning for things like financial fraud prevention, dynamic pricing or product recommendations. But do everyday enterprises really need lightning-fast real-time data?

The term “real time” has been infused throughout tech, from real-time stock picks to real-time pizza tracking. But there’s near-real time, and then there’s real time.

As everyday enterprises begin adopting the data tools and tactics used inside the biggest tech companies, a sector of data services providers has emerged to help them use truly real-time analytics and machine learning approaches that, in the past, only giant companies with far larger database teams and resources could afford. Companies such as Hazelcast, Rockset and Tecton enable split-second analytics and machine learning for things like financial fraud prevention, dynamic pricing or product recommendations that respond to what you just clicked.

These companies promise to leave plodding batch-data processing for old-school business intelligence analysis in the dust. But whether every enterprise needs, wants or is ready to operate at the pace of a Citibank, Uber or Amazon remains to be seen.

Updating data every few days, every night or even every hour or so for business analysis using a typical batch processing approach “is like playing Monday morning quarterback,” said Venkat Venkataramani, CEO and co-founder of Rockset, a company that provides a database for building applications for real-time data, analytics and queries. “That is not going to be good enough anymore. I’m six points down, the game is not over yet. What do I do differently to change the outcome of the game?” he said, continuing a football metaphor he believes will describe more and more business scenarios involving fresh data in the next two to three years.

These startups believe the growing flood of real-time data into data lakes and lakehouses, from ecommerce site clicks to IoT sensor pings, will compel businesses to use that information as soon as it arrives.

Some of Hazelcast’s customers process data they consume in real time for machine-learning model-based predictive analytics used to maintain equipment on oil drilling rigs and windmills, said CEO Kelly Herrell. Tecton, which helps companies run real-time data pipelines to feed ML models, has seen insurance providers use its services to operate their driver behavior-based discount programs, according to Mike Del Balso, founder of the company.

But those are rare use cases. These startups say most of their inbound interest is coming from banking and ecommerce customers that want to prevent fraud while someone waits for a banking transaction to happen at an ATM, or to detect right away when a payment app stops working in a particular country. “Wherever there is real money and risk involved if you don’t manage something in real time” is where companies are using real-time data services, said Venkataramani.

“The top use case for us is recommendations,” said Del Balso, who said customers “want to make a recommendation based on what the user just did.” Anyone who has browsed products on an ecommerce site or scrolled through movie options on a content streaming platform knows the micro-frustrations resulting from systems that don’t recognize their most recent moves.

Machine learning helps spark real-time interest

Across the business spectrum, a few key factors are fueling the rise of real-time data processing and analytics, and the growth of AI and machine learning is an important one. Companies want machine-learning systems that improve as they’re exposed to fresh information, in the hope of making smarter decisions and optimizing existing efforts in milliseconds.

While traditional business intelligence analytics efforts don’t need real-time data or processing, “real time and machine learning really go hand-in-hand,” said Gaetan Castelein, vice president of marketing at Tecton, who explained that real-time data and machine-learning trends are converging and feeding off one another.

Consider a bank conducting millions of transactions each hour, for example. Whether a machine-learning model approves a given transaction could depend on as many as 2,000 individual pieces of information: some relatively static, such as a zip code, and some brand new, such as the amount of a cash transfer. Data systems like Tecton’s can make that process more efficient by separating the pieces of data that are fresh from the ones that stay the same, so that only the fresh ones need to be computed at transaction time.

“Because you need to respond to the transaction really quickly, but you don’t want to be computing that stuff in real time,” Del Balso said. “It’s OK if some of the signals are a bit delayed,” he said, adding, “That becomes a performance tradeoff.”
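The split Del Balso describes can be sketched in a few lines of Python. This is a minimal illustration only, not Tecton's API or any bank's system: the `PRECOMPUTED` store, the feature names and the helper functions are all hypothetical. Slow-changing features are looked up from values a batch job computed earlier, and only the features that exist once the transaction arrives are computed at request time.

```python
from typing import Any, Dict

# Hypothetical store of slow-changing features, refreshed by a periodic
# batch job. Account IDs and values are illustrative, not real data.
PRECOMPUTED: Dict[str, Dict[str, Any]] = {
    "acct-1": {"zip_code": "94105", "avg_transfer_30d": 120.0},
}


def request_time_features(txn: Dict[str, Any]) -> Dict[str, Any]:
    """Features that only exist once the transaction arrives."""
    return {"amount": txn["amount"], "country": txn["country"]}


def build_feature_vector(
    txn: Dict[str, Any],
    store: Dict[str, Dict[str, Any]] = PRECOMPUTED,
) -> Dict[str, Any]:
    """Merge looked-up (possibly slightly stale) features with fresh ones."""
    static = store.get(txn["account_id"], {})
    features = dict(static)                       # looked up, not recomputed
    features.update(request_time_features(txn))   # computed right now
    # A derived feature that needs both kinds of input.
    avg = static.get("avg_transfer_30d")
    features["amount_ratio"] = txn["amount"] / avg if avg else None
    return features


txn = {"account_id": "acct-1", "amount": 600.0, "country": "US"}
vector = build_feature_vector(txn)
print(vector)
```

The tradeoff in this sketch is the one Del Balso names: the 30-day average may lag by however long the batch job takes to refresh, but the request path does a dictionary lookup instead of re-aggregating a month of history for every transaction.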

Optimizing a recommendation engine to respond in real time to something a user just did a half-second ago “is the difference between Netflix and TikTok,” said Manish Devgan, chief product officer at Hazelcast. “As you’re browsing it’s actually updating a machine-learning model,” he said of TikTok’s content recommendation system.

In conjunction with the machine-learning boom, innovations in data infrastructure have also helped propel interest in using data in real time. The availability and ease of use of data-streaming technologies such as Apache Kafka, and of commercial platforms built on it such as Confluent, are helping companies manage real-time data initiatives with smaller engineering teams.

More broadly, the explosion of data streaming in from online and IoT systems, coupled with adoption of the cloud and its cost efficiencies, is also sparking interest in using data in real time.

“As more enterprises continue to migrate to the cloud and invest in digital transformations, the volume, variety and velocity of machine-generated data — clickstream, logs, metrics, IoT — will proliferate exponentially,” said Derek Zanutto, general partner at CapitalG, who added that the Google investment arm does not currently have any companies in the real-time data processing space.

“As the volume of machine-generated data continues to proliferate, forward-thinking, data-driven organizations will increasingly seek out opportunities to mine this data for real-time operational analytics use cases that help them maintain or improve their market leadership,” Zanutto said.

When near-real time is good enough

There’s a difference between what’s real time and what’s merely in the ballpark, according to database experts. “People use the word ‘real time’ in a very abusive manner,” said Ravi Mayuram, chief technology officer at Couchbase, a database company that enables real-time data processing.

He and others say if data analysis happens in minutes rather than seconds or split seconds, it’s not real time; it’s just near-real time. Venkataramani said he defines real-time data processing as something that takes less than two seconds.

“This is kind of complicated and nuanced, so it’s easy to mix together real time and near-real time,” Del Balso said, adding that for most companies and use cases, near-real time is good enough.

Indeed, some experts say processing data every few minutes should suffice for many businesses.

“There are extreme ends of this where you really, really need [real time],” said Ryan Blue, co-founder and CEO of data platform startup Tabular, and a former Netflix database engineer who helped build Apache Iceberg, an open table format used for lakehouse-style analytics. “The question is, when is a five-minute batch process sufficient?” Blue said.

Some Rockset customers don’t even use the company’s most extreme real-time data capabilities. Seesaw, an online learning platform, uses Rockset to enable analytics, data visualizations and data queries. But for now, said Emily Voigtlander, Seesaw’s product manager, batch processing every night is just fine. While she did not rule out future needs for Rockset’s real-time data services, Voigtlander said, “It’s not actually what is most essential to our business right now.”

But just wait, some say. Today, companies that are still getting a handle on batch processing might decide to leapfrog the competition, said Preeti Rathi, general partner at venture capital firm Icon Ventures. Those kinds of companies might ask, “If we can directly just jump here, why not?” she said.

Gerrit Kazmaier, Google Cloud’s vice president and general manager for Database, Data Analytics and Looker, called the growing interest in real-time analytics and data processing a “paradigm shift” away from traditional data stacks and toward systems that “connect the systems of intelligence” to applications, letting companies influence customer behavior or take action on the spot using machine learning and analytics.

“So now, you come to a tipping point, where suddenly the strategic platform of the enterprise is not anymore the functional system, it’s the data system,” he said.





