Protocol Cloud
Your weekly guide to the future of enterprise computing.

From chaos, a greater understanding of resilience

Welcome to Protocol Cloud, your comprehensive roundup of everything you need to know about the week in cloud and enterprise software. This week: Finding the meaning in chaos, the story of Kelsey Hightower, and how the BBC uses the cloud.

The Big Story

The fault in our stars

A little peek behind the Protocol Cloud scenes: I write this newsletter on Tuesday mornings, holding it open until the last second in case anything big happens, but most days, it's done and dusted by noon-ish Pacific Time. That means today's, which you are seeing on the morning of Wednesday, Nov. 4, was written on one side of a chasm.

There didn't seem to be any point in waiting to see what happened on the most pivotal U.S. Election Day of any of our lifetimes, given that the outcome might be in doubt for days or weeks to come. Likewise, there didn't seem to be any point in trying to sum up what that outcome might mean for cloud computing, a concept that I would pay real dollars to hear either man up for election yesterday describe in a sentence or two.

But there was something cloudy that popped into my head while thinking about this historic week: chaos engineering.

  • Born at Netflix, chaos engineering is the deliberate introduction of worst-case-scenario problems into cloud computing infrastructure.
  • The idea is to understand — within a controlled environment — how systems fail when they encounter stress they weren't designed for, such as a sudden outage in a cloud computing region.
  • It's taking the notion of "hoping for the best, but preparing for the worst" to its logical ends, designing your systems with durability in mind by understanding how they will react to shocks that can be impossible to predict, in hopes the whole system won't fall apart.
  • Gremlin, a startup founded by ex-Netflix and Amazon engineers, has raised $26 million in funding to help companies employ this concept in their own applications.

Modern web infrastructure is amazingly complex, and while lots of people get paid lots of money to design it in resilient ways that have been refined over years of hard-earned lessons, things still break all the time.

  • Once you accept that systems will fail, understanding failure, rather than desperately trying to prevent failure, becomes the priority.
  • And once you understand failure, reacting to failure results in a stronger system than trying to prevent the inevitable: Failure will happen and you should do everything you can to make sure it happens in a predictable way that can be dealt with without breaking the entire system.
  • "What you'd really like to do is choose between two things other than collapsing in a heap," said Adrian Cockcroft, the former Netflix engineer who helped develop the theory of chaos engineering and is now vice president of cloud architecture strategy at AWS, in a 2018 speech describing how most applications fall down.
  • The first of those two things is to have apps fail gracefully, so the user understands what just happened and still has a working computer. The second is to accept that while the problem might cause a subpar app experience, 80% of the app's functionality is probably good enough.
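
For readers who want to see the idea in miniature, here is a rough Python sketch of the pattern described above: deliberately inject failures into a dependency, then check that the caller degrades gracefully instead of collapsing in a heap. Every name in it (`with_chaos`, `fetch_recommendations`, `render_home_page`, `run_experiment`) is hypothetical and invented for illustration; it is not how Netflix, Gremlin, or AWS implement chaos experiments, just one way the concept could look in code.

```python
import random
from functools import wraps
from typing import Callable, Dict, List


class InjectedFault(Exception):
    """Raised by the chaos wrapper to simulate a dependency outage."""


def with_chaos(func: Callable, failure_rate: float, rng: random.Random) -> Callable:
    """Wrap func so that a fraction of calls fail on purpose (hypothetical helper)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # rng.random() returns a float in [0.0, 1.0), so a rate of 1.0
        # always fails and a rate of 0.0 never does.
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_recommendations(user_id: str) -> List[str]:
    """Stand-in for a call to a personalization service (assumed, not real)."""
    return [f"title-{user_id}-{i}" for i in range(3)]


# Generic content served when personalization is unavailable.
FALLBACK_ROWS = ["popular-1", "popular-2", "popular-3"]


def render_home_page(user_id: str, recommender: Callable) -> Dict:
    """Build a page that still works, in degraded form, if the recommender fails."""
    page = {"user": user_id, "degraded": False}
    try:
        page["rows"] = recommender(user_id)
    except InjectedFault:
        # Fail gracefully: most of the page is still good enough.
        page["rows"] = list(FALLBACK_ROWS)
        page["degraded"] = True
    return page


def run_experiment(requests: int, failure_rate: float, seed: int) -> int:
    """Serve a batch of pages under injected faults; return how many were degraded."""
    flaky = with_chaos(fetch_recommendations, failure_rate, random.Random(seed))
    pages = [render_home_page(f"user-{n}", flaky) for n in range(requests)]
    # Every request must still produce a page with content, degraded or not.
    assert all(page["rows"] for page in pages)
    return sum(page["degraded"] for page in pages)
```

Calling `run_experiment(100, 0.3, seed=42)` would serve 100 pages with roughly a third of the recommender calls failing on purpose. The point of the exercise is that the failure is expected and handled, so the count of degraded pages goes up while the count of broken pages stays at zero.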

This is a cultural — not technological — shift in thinking inside organizations, much as DevOps was a decade ago. Until cloud vendors, end users, and partners figure out how to make this world a little simpler and a little easier, the best way to prevent systemic catastrophe is to recognize how systems will fail.

  • There is a metaphor here.

This Week On Protocol

Decision time: If we know who the president of the United States will be in January when you receive this newsletter, you can find all of Protocol's Election 2020 coverage here. If we don't know who the president of the United States will be in January when you receive this newsletter, you can find all of Protocol's Election 2020 coverage here.

Getting better: Kelsey Hightower is a special person in the cloud and enterprise computing world, and he's had quite the journey from managing an Atlanta McDonald's before he could drive a car to becoming one of Google Cloud's most valuable employees. Check out our profile of his life and career before you end up working for him one day.

Cloudy dollars: During the third quarter, the giants of cloud computing continued to shrug off the economic effects of the pandemic that have wrecked so many other businesses. It's unclear how long that will last, but for now it seems like enterprise vendors that are behind on their cloud strategies are the ones suffering the most.

Five Questions For...

Mai-Lan Tomsen Bukovec, global vice president of storage, AWS

What was your first tech job?

My father was a career U.S. Foreign Service officer, and so I grew up in U.S. embassies in a number of different countries. My first tech job was data entry using a Wang computer in the Press and Cultural section of the U.S. Embassy in Beijing, China.

What's the best piece of advice you could give to someone starting their first tech job?

Be relentlessly curious about how the customer uses what you are building and understand if you are solving the problem that the customer wants you to solve.

Pick one piece of consumer or business software (that isn't sold by your company) that you can't live without.

I use my Garmin watch and heart rate monitor tracker almost every day. Conditioning matters for boxing and martial arts, which I have practiced for many years. Heart rate target zones and recovery rates help me train smarter.

What was the first computer that made you realize the power of computing and connectivity?

I joined the U.S. Peace Corps in 1994 after I graduated from college. I lived in a village in northern Mali, West Africa. It took many dusty, hot hours to make my way via public transportation from my village in the northern Mopti region to Bamako, the capital city of Mali. The contrast between living in a village with no running water or electricity, and reading email on a screen in an office in Bamako a day or two later brought home the power of connectivity in a way that I hadn't experienced as a college student with easy access to computers.

What will be the biggest challenge for cloud computing over the coming decade?

People and culture. It's resistance to change and fear of the unknown. Often, it comes down to how organizations lead through change. The big difference between organizations that talk about moving to the cloud and those that actually do it comes down to a few simple elements: leadership commitment and organizational execution.

Around the Cloud

  • The New Stack had a nice remembrance of Dan Kohn, former executive director of the Cloud Native Computing Foundation, who died last weekend following complications from colon cancer.
  • The overall cloud infrastructure market grew by 33% in the third quarter, with revenue of $32.8 billion and market share among the Big Three holding constant.
  • SAP customers want to move to the cloud, said its CEO Christian Klein, and his company won't be "forcing them to buy more on-premise software." Which … seems like the right move.
  • JPMorgan Chase and Goldman Sachs paused all internal software updates for the first week of November to avoid any chance of introducing new problems, according to The Information.
  • Google Cloud isn't providing AI tech for a "virtual wall" under construction by U.S. Customs and Border Protection, its CEO Thomas Kurian told employees. The agency is testing AI services for other efforts, though.
  • Marvell bought Inphi for $8.2 billion, the latest in a series of chip deals from companies that want to get into the data center market.
  • AWS is getting its "things that don't fit at re:Invent" announcements out early this year, launching new compute services before the early December event: instances based around Nvidia's latest GPUs, and Nitro Enclaves, a service for processing sensitive data in isolated environments.
  • Stephen O'Grady of Redmonk wrote another insightful post, this time predicting that the next wave of cloud services will be sold as more of a package rather than "sending buyers and developers alike out into a maze of aisles, burdening them with the task of picking primitives and assembling from scratch."
  • The BBC published an account of its journey to cloud services, revealing that half of the organization's web operation runs on AWS Lambda.

Thanks for reading — see you next week.
