On any given day, Lily AI runs hundreds of machine learning models using computer vision and natural language processing that are customized for its retail and ecommerce clients to make website product recommendations, forecast demand, and plan merchandising. But this spring when the company was in the market for a machine learning operations platform to manage its expanding model roster, it wasn’t easy to find a suitable off-the-shelf system that could handle such a large number of models in deployment while also meeting other criteria.
Some MLops platforms struggle to maintain more than even 10 machine learning models, whether the task is keeping track of data, navigating their user interfaces, or generating reports, Matthew Nokleby, machine learning manager for Lily AI’s product intelligence team, told Protocol earlier this year. “The duct tape starts to show,” he said.
Nokleby, who has since left the company, said that for a long time Lily AI got by using a homegrown system, but that wasn’t cutting it anymore. And he said that while some MLops systems can manage a larger number of models, they might not have desired features such as robust data visualization capabilities or the ability to work on premises rather than in cloud environments.
As for finding an MLops platform that works for the company, Lily AI’s CTO and co-founder Sowmiya Chocka Narayanan said last week, “We’re still looking.”
As companies expand their use of AI beyond running just a few ML models, and as larger enterprises go from deploying hundreds of models to thousands and even millions of models, many machine learning practitioners Protocol interviewed for this story say that they have yet to find what they need from prepackaged MLops systems.
“That is the biggest gap in the tech industry right now,” said Nicola Morini Bianzino, global chief client technology officer at EY. The auditing firm has thousands of models in deployment that are used for its customers’ tax returns and other purposes, but has not come across a suitable system for managing various MLops modules, he said.
“I’m actually surprised that none of the big companies have jumped in this space because the opportunity is massive,” Morini Bianzino said.
Depending on how it is defined, projections for the global MLops platform market vary from $3 billion by 2027 to $4 billion by 2025 to $6 billion by 2028. Companies hawking MLops platforms for building and managing machine learning models include tech giants like Amazon, Google, Microsoft, and IBM and lesser-known vendors such as Comet, Cloudera, DataRobot, and Domino Data Lab.
Although the MLops-related platforms available today are “extremely valuable,” said Danny Lange, vice president of AI and machine learning at gaming and automotive AI company Unity Technologies, “nobody right now is doing it at a level that you ideally want. It’s actually a complex problem.” Right now, Unity is using a custom-built system to manage the thousands of ML models it has in deployment, Lange said.
Millions of models
Like other large enterprises that have invested in ML for years, Southeast Asia’s banking giant DBS has had to build in-house to manage its data analytics and the 400-plus ML models it runs for things like personalized banking, said Sameer Gupta, group chief analytics officer and managing director.
“When DBS started our journey several years ago, the solutions available in the market primarily focused more on AI/ML activities as experiments and did not meet our requirements to iterate and operationalize quickly,” Gupta told Protocol.
“We had to leverage what was available to develop our in-house capabilities that allows us to better tailor our solutions across the bank.” The company erected its own internal analytics and AI platform, which features an operational cluster to manage data ingestion, computation, storage, and model production, as well as an analytical cluster for data scientists to experiment and develop new tools before they go into production.
Intuit also has constructed its own systems for building and monitoring the immense number of ML models it has in production, including models that are customized for each of its QuickBooks software customers. Sometimes the distinctions in each model are minimal — one company might label certain types of purchases as “office supplies” while another categorizes them with the name of their office retailer of choice, for instance. The model must recognize those distinctions.
“We actually build models that are personalized to each [customer],” said Diane Chang, director of data science at Intuit. “When you look at that, each of those individual models that we built, then we’re over millions.”
Intuit had MLops systems in place before a lot of vendors sold products for managing machine learning, said Brett Hollman, Intuit’s director of engineering and product development in machine learning.
For instance, Hollman said the company built an ML feature management platform from the ground up. “A set of features can help you train a new model. If somebody generates good features on cash flow, some other person that’s doing some other cash flow thing might come along and say, ‘Oh, well, this feature set actually fits my use case.’ We're trying to promote reuse,” he said.
Open or closed
For companies that have been forced to go DIY, building these platforms themselves does not always require forging parts from raw materials. DBS has incorporated open-source tools for coding and application security purposes such as Nexus, Jenkins, Bitbucket, and Confluence to ensure the smooth integration and delivery of ML models, Gupta said.
Intuit has also used open-source tools or components sold by vendors to improve existing in-house systems or solve a particular problem, Hollman said. However, he emphasized the need to be selective about which route to take.
“A vendor may not have all the capabilities [we] need. Looking at an open-source solution and extending an open-source solution might be a better way of approaching that particular component versus going with a vendor,” he said. “If you go with a vendor, you drive their road map, you work with them and drive their road map, but you’re dependent upon their road map versus your own internal software development lifecycle.”
The age-old “build or buy” question is the wrong one to ask, said Zoe Hillenmeyer, chief commercial officer at Peak, which sells an AI decision intelligence platform and related services. When it comes to MLops, she said, “There’s a false dichotomy between build versus buy. That’s an incorrect strategy. I think that the best AI will be a build plus buy.”
However, creating consistency through the ML lifecycle from model training to deployment to monitoring becomes increasingly difficult as companies cobble together open-source or vendor-built machine learning components, said John Thomas, vice president and distinguished engineer at IBM.
“The enterprise might try to force everyone to use a single development platform. The reality is most people are not there, so you have a whole bunch of different tools. People fight over it — it’s a religious thing,” Thomas said.
IBM has responded to that reality by allowing clients to use its MLops pipelines in conjunction with non-IBM technology, an approach that Thomas said is “new” for IBM.
Engineering talent crunch
Companies struggling to find suitable off-the-shelf MLops platforms are up against another major challenge, too: finding engineering talent.
Many companies do not have software engineers on staff with the level of expertise necessary to architect systems that can handle large numbers of models or accommodate millions of split-second decision requests, said Abhishek Gupta, founder and principal researcher at Montreal AI Ethics Institute and senior responsible AI leader and expert at Boston Consulting Group.
“A lot of these places that are attempting to do this are just not tech-native or tech-first companies,” BCG’s Gupta said. For one thing, smaller companies are competing for talent against big tech firms that offer higher salaries and better resources. “There is a lack of technical talent to a significant degree that hinders the implementation of scalable MLops systems because that knowledge is locked up in those tech-first firms,” he said.
Despite the obstacles, Intuit’s Hollman said it makes sense for companies that have graduated to more sophisticated ML efforts to build for themselves. “If you’re somebody that’s been in AI for a long time and has maturity in it and are doing things that are at the cutting edge of AI, then there’s [a] reason for you to have built some of your own solutions to do some of those things,” he said.
For companies with less-advanced AI operations, shopping at the existing MLops platform marketplace may be good enough, Hollman said.
“If you’re a new entrant into the machine learning space, those platforms are the best place to start. They’re going to have a soup-to-nuts experience,” he said. “Trying to build your own ML platform from scratch is a big undertaking.”