Best practices to manage your data and optimize cloud data costs
Why cost optimization is necessary for data management in the cloud
Half of businesses struggle to keep their cloud costs under control, according to a recent survey of American IT directors and executives. Nine in 10 businesses currently host some or all of their IT infrastructure in the cloud rather than on-premises, and of those that don’t, 60% say migrating to the cloud is a top priority. Gartner predicts that by 2025, more than 85% of organizations will be cloud-first.
But cloud adoption brings a whole host of associated costs for businesses, particularly when it comes to managing data in the cloud. IDC has tracked a 29% year-over-year increase in cloud services spending, and businesses spent more than $400 billion keeping their key services online and in the cloud. Part of the challenge is that the cloud turns IT into an operating expense rather than a capital expense: payments to cloud service providers depend on the data and compute used, which can vary significantly from month to month.
Shifting to the cloud makes it possible to pull in more data from more sources, which pushes costs up commensurately. Small inefficiencies can become massive ones as data inputs multiply. The combination of exploding data volumes, the near-infinite elasticity of the cloud and a “pay for what you use” model can make costs difficult to forecast. For many organizations, those unexpected costs can cancel out the benefits of managing data in the cloud.
Overspending is an issue a growing number of businesses face when managing data in the cloud. In fact, a recent Forrester study found that 82% of data management decision-makers cite forecasting and controlling costs as a data ecosystem challenge. Businesses can benefit from best practices shared by organizations that have faced these challenges head-on.
How Capital One controlled its data costs
Capital One exited its last data center in 2020 and became the first US financial company to move fully to the public cloud. But to fully realize the benefits of operating in the cloud, Capital One needed to modernize its data ecosystem. The company adopted Snowflake as its data cloud, which gave it the ability to scale instantly for nearly any workload. However, as its access to data grew, it needed to build new data management platforms to address data ecosystem challenges, including controlling Snowflake costs.
“When you have unlimited compute and power, you can very easily go from data starved to data drunk. We built tools to put the proper governance and cost control measures in place to make sure we were provisioning our data platform in a well-managed way,” said Patrick Barch, senior director of product management, Capital One Software. “It was also important for that experience to be self-service so analysts could easily access the data they needed while complying with policy guidelines.”
But what are those best practices and tools to control costs in the face of growing data? What did Capital One learn from its own transformation, and what advice can it offer other companies experiencing similar challenges?
- Take a federated approach: The cloud allows individual teams across a company to respond in more nimble and tailored ways, rather than following a one-size-fits-all approach. Instead of issuing a top-down diktat on data management, Capital One put trust and power into the hands of the teams who work with data day in, day out. This federated approach helped Capital One’s data practices keep pace with its business without creating bottlenecks. It also let the company give individual teams transparency into common cost drivers and the ability to manage them on their own.
- Streamline queries: Time is money, and Capital One recognized that poorly written queries were costing both time and money. Methods of interrogating data that had worked outside the cloud suddenly drew heavily on compute, pushing up costs. The company revised those queries and recommended that users rely on Snowflake’s data preview functionality rather than querying data live. A small change also saved money: instead of generating a new file for each new row of data, the company switched to using a copy command, which unlocked efficiencies.
- Rightsize data warehouses: It can be tempting to provision large warehouses to meet expected demand, but that can leave capacity underutilized most of the time. Capital One matched warehouse size more closely to demand, using large warehouses only where heavy compute was required. The organization also noticed that workloads differed between business hours and evenings and that some warehouses stayed on for 30 minutes with no queries running, then adjusted accordingly. For example, warehouses now turn “off” after two minutes without queries.
- Optimize constantly: Data professionals were tasked with scrutinizing where usage led to inefficiencies and heading them off. Training staff to continually monitor performance and costs, and to make adjustments along the way, is crucial to managing costs. To give staff full cost transparency, Capital One built a cost dashboard that detected cost spikes and helped forecast spend against budget.
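The idle-time adjustment in the rightsizing practice above can be sanity-checked with a rough estimate. The following is a minimal Python sketch under a simplified model assumed here, not Snowflake's exact billing (it ignores per-resume minimum charges and other details): a warehouse resumes when a query arrives and suspends after a fixed idle timeout. The workload intervals, the `billed_minutes` and `credits` helpers and the credit rate are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical workload: (start_minute, end_minute) for each query.
Interval = Tuple[float, float]


def billed_minutes(queries: List[Interval], auto_suspend_min: float) -> float:
    """Estimate how long a warehouse stays on for a given idle timeout.

    Simplified model (an assumption): the warehouse resumes when a query
    starts and suspends after `auto_suspend_min` idle minutes.
    Minimum-billing rules are ignored.
    """
    if not queries:
        return 0.0
    ordered = sorted(queries)
    window_start, window_end = ordered[0]
    total = 0.0
    for start, end in ordered[1:]:
        if start <= window_end + auto_suspend_min:
            # Next query arrives before the timeout expires: stay on.
            window_end = max(window_end, end)
        else:
            # Warehouse idles for the full timeout, then suspends.
            total += (window_end + auto_suspend_min) - window_start
            window_start, window_end = start, end
    # Close the last window, including its trailing idle period.
    total += (window_end + auto_suspend_min) - window_start
    return total


def credits(minutes: float, credits_per_hour: float) -> float:
    """Convert on-time in minutes to credits at an hourly credit rate."""
    return minutes / 60 * credits_per_hour


if __name__ == "__main__":
    queries = [(0, 10), (40, 50), (100, 105)]
    RATE = 8  # hypothetical credits per hour for a large warehouse
    for timeout in (30, 2):
        minutes = billed_minutes(queries, timeout)
        print(f"auto-suspend {timeout} min: {minutes:.0f} min on, "
              f"{credits(minutes, RATE):.2f} credits")
    # auto-suspend 30 min: 115 min on, 15.33 credits
    # auto-suspend 2 min: 31 min on, 4.13 credits
```

Under these assumptions, cutting the idle timeout from 30 minutes to two reduces on-time for this toy workload from 115 to 31 minutes. In Snowflake, the corresponding knob is a warehouse's AUTO_SUSPEND setting, which is expressed in seconds.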
A new tool for a new problem
These lessons, learned through practice, prompted Capital One to think about how to help other companies facing similar challenges. Along its cloud and data journey, Capital One also built its own tools to fill gaps in the market. Key among them is Capital One Slingshot, a new product from Capital One Software that helps organizations manage Snowflake data costs with alerts, recommendations and performance dashboards.
Slingshot was designed to help Capital One keep its own costs down while providing internal transparency into what was driving spend. Using Slingshot to resize resources based on changing demand helped Capital One keep costs 27% lower than projected, all while handling up to four million queries a day across 50 petabytes of data.
Tools like Slingshot can give teams full cost transparency, among other benefits, but organizations can also take control of their costs by adopting the best practices Capital One learned. Identifying data platform inefficiencies and training staff to consistently monitor performance and cost are effective ways to bring down data costs.
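Cost monitoring of this kind can start simply. The Python sketch below assumes daily spend figures (for example, credits consumed per day) are already being collected; it flags days whose spend jumps well above a trailing average and projects month-end spend from the current run rate. It is not how Slingshot or Capital One's dashboard works. The function names, the seven-day window, the 1.5x threshold and the budget figure are hypothetical.

```python
from statistics import mean
from typing import List


def find_spikes(daily_spend: List[float], window: int = 7,
                factor: float = 1.5) -> List[int]:
    """Return indices of days whose spend exceeds `factor` x the trailing mean.

    `window` and `factor` are illustrative assumptions, not tuned values.
    """
    spikes = []
    for i in range(window, len(daily_spend)):
        baseline = mean(daily_spend[i - window:i])  # previous `window` days
        if daily_spend[i] > factor * baseline:
            spikes.append(i)
    return spikes


def forecast_month(daily_spend: List[float], days_in_month: int) -> float:
    """Naive run-rate forecast: average daily spend so far x days in month."""
    return mean(daily_spend) * days_in_month


if __name__ == "__main__":
    spend = [100, 102, 98, 101, 99, 100, 100, 180, 101, 99]  # hypothetical
    BUDGET = 3000  # hypothetical monthly budget
    print(find_spikes(spend))  # [7]
    projected = forecast_month(spend, days_in_month=30)
    print(f"projected {projected:.0f} vs budget {BUDGET}")
    # projected 3240 vs budget 3000
```

A fixed multiplier over a trailing mean is a deliberately simple baseline; a production dashboard would likely also account for patterns such as weekday versus weekend or business-hours versus evening usage.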
Read more about cost optimization strategy and Slingshot at capitalone.com/software.