Enterprise

How Amazon’s S3 jumpstarted the cloud revolution

Amazon's first real web service brought us everything from Pinterest to coronavirus vaccines. Fifteen years later, insiders tell Protocol how it grew to store more than 100 trillion objects.


The Spheres at Amazon headquarters are an architectural instantiation of the cloud.

Photo: MissMushroom/Unsplash

In late 2005, Don Alvarez was just another software entrepreneur struggling to get a new business off the ground when a friend working at Amazon invited him to check out a secret project that would change the world.

Alvarez's startup, FilmmakerLive, was designing online collaboration applications for creative professionals and faced a common problem for that time: storage. Tech startups were just starting to claw their way back from the excesses of the dot-com era, and buying expensive hardware was a gamble. Buy too little and your site crashes. Buy too much and you go broke. For the chaotic life of a startup, there was no safe answer.

He was skeptical about what an ecommerce company could teach him about movie collaboration, but he took his friend up on the offer.

"Rudy Valdez blew my mind," Alvarez told Protocol. Valdez was then the head of business development for AWS, which at that time offered only a handful of basic services. He gave Alvarez, now director of engineering for Mural, a taste of Amazon's first and arguably most fundamental product: S3, a cloud-based object storage service.

S3, or Simple Storage Service, made its debut 15 years ago this weekend. It would be years before "the cloud" became one of the most disruptive forces in the history of enterprise computing. Amazon didn't even use the term when it announced S3 on March 14, 2006. But the storage service's launch instantly solved some very tricky problems for entrepreneurs like Alvarez, and would come to change the way all businesses thought about buying information technology.

Startups like Pinterest, Airbnb and Stripe flocked to AWS in the coming years, and older companies like Netflix — then a DVD-mailing operation — also took the plunge to retool their operations for the internet.

"Amazon was putting infinite disk space in the hands of every startup at an incredibly low and pay-for-what-you-need price point, there was nothing like that," Alvarez said. "The second piece was that their API was so simple that i could just pick it up and build something useful in it, in the first 24 hours of using an unreleased, unannounced product."

S3 is now a vital cog in the AWS machine, which generated more than $45 billion in revenue last year. The service has evolved in many directions over the last 15 years, yet it has kept at the heart of its strategy a set of design principles drawn up by a team led by Allan Vermeulen, Amazon's chief technical officer during the earliest days of AWS.

"We knew what [customers] wanted to do then," Mai-Lan Tomsen Bukovec, vice president for AWS Storage and the current head of S3, told Protocol. "But we also knew that applications would evolve, because our customers are incredibly innovative, and what they're doing out there in all the different industries is going to change every year."

Mai-Lan Tomsen Bukovec runs Amazon S3 and AWS Storage. Photo: Amazon Web Services

Building for flexibility

"When people think bigger and faster in computers, they think of this," said Vermeulen during an interview in 2014, drawing a line in the air up and to the right. But storage technology has evolved differently, he said, over a period of long plateaus followed by sharp increases in capabilities: "It's the difference between driving my Tesla and flying my airplane."

S3 was one of those sharp breaks from the status quo. It was a godsend for developers like Alvarez, who no longer had to worry about buying and maintaining pricey storage hardware just to do business.

"There was nothing that we had access to that provided anything remotely like what S3 could do," Alvarez said. "I felt like somebody had just given me the keys to the candy store."

Like much of AWS, S3 was born from Amazon's experience building and scaling Amazon.com, which taught it a lot of hard lessons about the limits and possibilities of distributed computing.

"A forcing function for the design was that a single Amazon S3 distributed system must support the needs of both internal Amazon applications and external developers of any application. This means that it must be fast and reliable enough to run Amazon.com's websites, while flexible enough that any developer can use it for any data storage need," AWS said in the original launch press release for S3 in 2006.

In the early days of the cloud, performance and reliability were a huge concern. And those concerns were especially fraught when it came to data, which even 15 years ago was understood to be one of the most important assets in a company's arsenal.

"When we launched S3 15 years ago, S3 had eight microservices, and we have well over 300 now." Tomsen Bukovec said, referring to the then-novel software development practice of breaking up large chunks of interdependent code into smaller, independent services.

Building around microservices allowed AWS to decentralize S3's points of failure, and to design the system around the reality that distributed cloud services fail on occasion and that such failures shouldn't take the entire system down.

It also allowed the company to layer on future enhancements without having to disturb the core pieces of the system: AWS now says S3 is designed for "11 9s" of durability, or 99.999999999% annual data durability, a figure that far exceeds what most self-managed storage hardware can promise. By AWS's own example, a customer storing 10 million objects can expect to lose a single object once every 10,000 years, on average. (Other cloud storage vendors have since matched this standard.)
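To make that figure concrete, here is a back-of-the-envelope calculation — an illustration of the arithmetic, not an AWS guarantee:

```python
# Back-of-the-envelope: what 99.999999999% (11 nines) annual durability
# implies for expected object loss. Illustrative only, not an AWS SLA.
annual_durability = 0.99999999999
annual_loss_rate = 1 - annual_durability      # 1e-11 per object per year

objects_stored = 10_000_000
expected_losses_per_year = objects_stored * annual_loss_rate
print(expected_losses_per_year)               # 0.0001 objects per year
print(1 / expected_losses_per_year)           # ~10,000 years per lost object
```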

S3 began life as a holding pen for simple web elements like images and video, which website operators would pull down from AWS to a visitor's browser when a page loaded. Over time, as companies became more comfortable with cloud storage, they started putting all kinds of data in S3.

And that's when things started to get a little messy.

Amazon Web Services' booth at the Microsoft PDC event in Los Angeles in 2008. Photo: D. Begley/Flickr

Plugging leaky buckets

Look back at the security incidents of the past several years and a striking number trace back to "leaky buckets," a reference to the bucket, S3's core unit of storage. These incidents happen to other cloud providers as well, but given AWS's market share it's a problem the company has had to deal with on many, many occasions.

AWS operates under a "shared responsibility" model for security: AWS will prevent anyone from physically accessing its servers or infiltrating its network, but customers are expected to protect their accounts to a reasonable extent. In other words, you can't blame the rental car company if someone steals your laptop from the back seat of an unlocked vehicle.

Yet time and time again, cloud customers have left sensitive data belonging to their own customers in unprotected storage buckets open to anyone who can find them, which is easier than you might think. It's just one example of how AWS has had to evolve some of its core products to meet customers where they are, especially later-arriving customers accustomed to accessing everything they need from private, internal networks.

"In a business application world, you don't need to have access outside the company, or really outside a group of users within the business," Tomsen Bukovec said. But it was clear that AWS needed to do more to help its customers help themselves, which led to the development of tools like Block Public Access that could lock down all storage buckets associated with a corporate account.

It was also clear to outsiders in the fast-growth early days of AWS that Amazon's famous "two-pizza teams" were "both a strength and a weakness," Alvarez said.

"It enabled every one of those services to rocket forward at a speed none of those competitors could match. And in the early days, it meant there was a lot less consistency [and] that was hard to puzzle through and manage," he said, noting that the experience has improved over time.

Additional security tools have followed that let customers scan their accounts for unauthorized access from the public internet, or assign different levels of access to people with different roles within a company.
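A basic audit along those lines might look like the following sketch, which flags any bucket whose policy leaves it publicly accessible. It is a simplification: a thorough review would also check ACLs, access points and Block Public Access settings.

```python
# A simplified audit sketch: flag buckets whose bucket policy makes them
# public. A real audit would also inspect ACLs, access points and
# Block Public Access configuration.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        status = s3.get_bucket_policy_status(Bucket=name)
        if status["PolicyStatus"]["IsPublic"]:
            print(f"PUBLIC: {name}")
    except ClientError as err:
        # Buckets with no policy at all raise NoSuchBucketPolicy.
        if err.response["Error"]["Code"] != "NoSuchBucketPolicy":
            raise
```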

"Where we're seeing customers go with their migrations is that they often have hundreds of buckets and lots and lots of [different] roles," Tomsen Bukovec said of the newcomers to the cloud who seem most prone to these mistakes. "When we think about what to build to help customers secure the perimeter of their AWS resource, we think about how they would like to audit and how they would like to control" access to their storage resources inside S3.


Hospitalman Cierrajaye Santella, assigned to Naval Hospital Bremerton and Navy Medicine Readiness and Training Command Bremerton, prepares to administer the Moderna coronavirus vaccine. Moderna used AWS in the COVID-19 vaccine's development. Photo: U.S. Navy

Getting to 100 trillion

S3 continued to evolve in the years following its debut, and it also got a lot cheaper: By the time AWS held its first re:Invent developer conference in 2012, one of the major announcements that week was a 24% to 28% reduction in S3 storage prices, the 24th such price cut the company had made up to that point.

Those price cuts were possible because AWS was able to upgrade the underlying S3 service on the fly, as Alyssa Henry, then vice president of AWS Storage Services, explained during a keynote address in 2012.

S3 was originally designed to hold 20 billion objects, but it grew far more quickly than anyone had anticipated, hitting 9 billion objects within its first year. The company re-engineered the underlying storage service for much greater capacity without any disruption to existing S3 customers. By 2012 it had scaled to 1 trillion objects in storage, and by 2020, 100 trillion.

"What's really cool about this is customers didn't have to do anything: You didn't have to go out buy the next upgrade — v2 of Amazon S3; you didn't have to do the migration yourself; you just got it all for free, it just worked, things just got better," Henry, who is now executive vice president and head of Square's Seller unit, said at the 2012 event. "That's one of the differences with the cloud versus how traditional IT has been done."

A similar upgrade rolled out just last year, when AWS introduced strong consistency across S3.

Consistency is a data-storage concept that can rattle your brain a bit the first time it shows up; older storage systems such as the original S3 were designed around "eventual consistency," meaning that a storage service wouldn't always be able to tell you right away if a new piece of data had settled into its designated storage bucket, but it would catch up before long.

Modern applications move much faster, however, and anything that queries a storage service needs an exact, current list of the available data to perform at the expected level. So over the last couple of years, AWS rebuilt S3 around strong-consistency principles, something other cloud providers also offer but were able to roll out against much smaller user bases.
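In code terms, strong consistency means that once a write returns, every subsequent read and list reflects it, so application authors no longer need retry loops to paper over stale reads. A sketch, with hypothetical bucket and key names:

```python
# Read-after-write under S3's strong consistency: once put_object returns,
# every subsequent GET and LIST reflects the write. Under the old
# eventual-consistency model, the GET below could briefly have returned
# stale data (for overwrites) or required retries in some cases.
import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket", "state/latest.json"  # hypothetical names

s3.put_object(Bucket=bucket, Key=key, Body=b'{"version": 2}')

# Guaranteed to see version 2 -- no polling or retry loop required.
latest = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
assert latest == b'{"version": 2}'
```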

"That is a very complicated engineering problem," Tomsen Bukovec said, and it was one of the stand-out announcements from the re:Invent 2020 among the geekier set of AWS users.

As they head into a new decade, Tomsen Bukovec and her team are looking at ways to make it easier to do machine learning on top of S3 data, and to improve the performance and capabilities of data lakes that allow for fine-grained analysis of internal and customer data among AWS users.

In fact, the Moderna vaccine for COVID-19 was developed with the help of an S3 data lake, Tomsen Bukovec said.

"We have this unique view that we built up over 15 years of usage, where we can determine what our customers are trying to do, and how we can build [S3] in such a way that it keeps true to that simple, cost-effective, secure, durable, reliable and highly-performant storage," she said.
