An oral history of #hugops: How tech’s first responders built a culture of empathy

When something breaks on the internet, the people who know how to fix it just want to give their colleagues a hug, even if those colleagues work for a rival.

The #hugops community in its happy place: the Velocity conference in San Jose in 2010.

Image: James Duncan Davidson/O'Reilly Conferences

In almost every profession, it seems like there are two types of workers: the ones who get the glory, and the ones who do the essential work no one ever sees — unless something goes wrong.

In enterprise computing, those overlooked people are known as operations engineers. They're the ones who keep the rickety Rube Goldberg machine that is the modern internet from falling to pieces every day, while their glamorous counterparts — software developers — get to bask in the recognition that comes with shipping a new feature or creating a new service.

A little over 10 years ago, a group of operations-oriented engineers decided they were fed up with software developers who didn't care if their code actually worked, so long as it shipped. They were tired of abuse at the hands of management who forced their teams to be on call 24/7 with little to no internal support, let alone recognition.

Those engineers created the Velocity Conference in order to band together: to share their lived experiences, including the intense pressure to keep Fortune 500 companies up and running; to discuss tips and tricks for navigating tricky problems; and to come together as a community of people who know what it's like to be at the bottom of the food chain when everything has gone to hell.

That community sparked a revolution known as DevOps, the idea that software developers and operations professionals needed to work together more closely to support the ever-more complex task of running sophisticated software over the internet. Big companies such as Amazon and Google started to develop the operations career path with incentives and rewards parallel to those on the development side, while acknowledging that these people needed support from the highest levels of the company to do their very difficult jobs.

And out of this community came a Twitter hashtag: an in-group signal to peers, sent during the most stressful moments of their careers, that a team had their back. When a major cloud service goes down, as Slack did during its early January outage, most people on Twitter see an opportunity to vent their frustration and score points at the affected company's expense.

At those moments, the people who know what it takes to keep these services afloat spread a hashtag: #hugops.

This is the story of the engineers who keep the cloud running, and how they created their own culture of empathy when nobody else cared.

Life of a sysadmin

Adam Jacob, CEO of The System Initiative, co-founder and former CTO of Chef: Systems administrators — a now almost basically nonexistent job title — were not the most beloved humans in the technical world. We didn't get a lot of respect.

We were sort of in the same bucket like secretaries; we had a System Administrator Appreciation Day. The people who do the stuff you don't see get appreciation days because, by definition, it means I'm not being appreciated every other day.

[One team leader] took us and my whole team, there were like 20 systems administrators, and he took us all out for beer on System Administrator Appreciation Day. And he sat down with the pitchers of beer and the first thing he said was, "Here's your guys' beer. Too bad none of you are smart enough to be engineers. Cheers."

My response to that was to just be mean to him.

Jennifer Davis, developer relations manager, Google: I don't know if you ever heard of the BOFH sysadmin? There was this mentality of like, how cruel and evil can we be to our users.

Werner Vogels, CTO, Amazon: I think sysadmins mostly came out at a time when most companies were buying software. Traditionally at those operations, [software] development is on one side. Then there's this wall, and you throw software over the wall; and you don't care anymore.

Tim O'Reilly, founder, O'Reilly Media: There were all the, effectively, software janitors who were cleaning up after them. And the software janitors were kind of going: That doesn't really work.

Jesse Robbins, a former firefighter and present-day hugger, at the Velocity conference in 2010.

Image: James Duncan Davidson/O'Reilly Conferences

Kolton Andrus, co-founder and CEO, Gremlin: At Amazon, I was one of 10 people that was paged when the website went down. And I took and managed the resolution of those calls from the side of the freeway next to my motorcycle because I had to pull over, call in and handle it immediately; it couldn't wait 10 minutes until I got home.

There was an Amazon Christmas party that I was at where I got a page, I had to run out to my car, get my backpack, come into a war room, sit down and resolve an incident before going back to the party. There's a lot of work that the engineers and the ops folks do behind the scenes, a lot of thankless work to help make sure things go well and get fixed.

Nathen Harvey, developer advocate, Google: What do we celebrate in technology? We celebrate new; new features, shipping new capabilities that we're delivering to customers. And we get angry when systems fail. Basically what you're saying is: We celebrate the developers, and we recognize the operators when everything goes to shit. That's not great.

Jacob: I sat in a room early on at Chef with a bunch of video game developers that were running the U.S. operations for one of the biggest video games of all time. And their boss sat across the table from them, and to my face, in front of them, said, "My guys aren't smart enough to learn Ruby." If you just interviewed system administrators from that era, 100% of them have that story.

Jesse Robbins, founder and executive chairman of Orion Labs, former co-founder and CEO of Chef: In operations, we always missed the launch party, because we were too busy in the data center or locked in an office looking at green screens trying to support a launch. We were never there for the fun part. We were always the ones that were giving up our nights and our weekends, and we're powerless to actually improve things.

When emergencies are a day job

Andrus: The on-call training I received at every company amounted to: "Here's your pager, good luck. You're smart, you'll figure it out."

Harvey: I remember a conversation I had with Ron Vidal, who is a firefighter in the San Francisco area. And one of the things he said to me was: "A firefighter has never, in their life at work, responded to an emergency. If your house is on fire, that's an emergency for you, but for the firefighters, that's their job."

Robbins: I'm a firefighter by training, and when I joined Amazon in 2001, "master of disaster" was my title. I realized that the way that we were running operations at Amazon was fundamentally not going to scale and that we needed a process and almost a cultural overhaul.

I began turning Amazon into a fire department. I literally took the sort of incident management principles that we used in the fire service and turned that into what we call GameDays and Scale Days, using essentially the incident command system in order to support people through the various ways of thinking when the red light is on.

Davis: A lot of what operations is like encourages this heroism: You have to do everything to keep it running and just throw yourself into it. It's not sustainable work. It's not great, it's terrible, and you're celebrated when you save the day but the reality is, it's terrible. It harms your relationships, and it harms your health and just frames how you work with other people.

Nora Jones, founder and CEO, Jeli: We're shifting towards a kind of a time where people see issues and incidents as a symptom rather than a cause of something, and trying to understand the bigger system that is playing out in those organizations.

Robbins: I owned availability at Amazon, and when I say owned it, I was sort of a tyrant, and ran it very aggressively. There was this big outage that we had [in the early 2000s], and there was a person early in their career who was literally shaking when I walked into the room because they were so afraid of what was going to happen.

I realized, "I've got to change the way that I approach this entirely and make it safe to experiment, safe to do these other things, to not have this punitive model and approach." It was seeing that person's face where I'm like, "Oh, I'm not the fire department, I'm like a bad guy. I'm being a villain."

Davis: If we reduce the heroism, we can reduce burnout.

Robbins: There is an ethos that came from all of that early work that recognizes how it is important to be kind to each other. And part of what I did early on at Amazon was create a culture of safety. You only get to do really big, great things when you're able to take great risks safely.

A meeting of like minds

John Allspaw, founder and principal, Adaptive Capacity Labs: These topics deserved an entire conference. I guess it was less that it deserved an entire conference, but more that a few folks convinced Tim O'Reilly to actually do it.

O'Reilly: They said, "Look, we need a gathering place for our tribe." We had done that before, for these various open-source communities. A lot of these things are rooted in communities, and so if you can figure out what community you want to bring together, you start by bringing them together.

Allspaw: What [the Velocity Conference] did was important, because it was a signal that operating software and understanding how things are running and anticipating things that can go wrong could be considered distinct from software development.

Artur Bergman, co-founder and chief architect, Fastly: What we were doing was just as critical as writing the code. If you can't run the code, it has no value.

Vogels: The time to develop software is actually quite small [compared] to the time that you have to operate it. So even though you may be building something complex, it may take a year or two years [to build], you may have to operate it for many, many more years to come.

Jacob: Velocity was like the first time that there was a non-academic place where everybody who is doing that work could get together. And it was like, well-funded and pretty. It wasn't like we were meeting up in the American Legion hall or whatever. It was a fucking conference.

Allspaw: We were finding this pretty significant common ground. For many, many years, they didn't have a place to put these ideas, or even labels or terms or vocabulary to talk about the dread — or actually sort of outright terror — that can come with, "shit's broken, and we have no idea."

So there's this lived experience of, "OK, you're with your colleagues and shit's broken and you don't have 100% clarity, but you've got a couple of good ideas that look sort of fruitful. And OK, so it seems like we should connect this thing to this thing and restart this other thing? We should do it in that order. What do you think about that?" You'd see this in IRC; we didn't have Slack back then.

This conference exists because we've got this shared experience with incidents and the general challenge is not just responding to incidents, but trying to work out how to prevent the ones in the future. And it's difficult work.

Time for a hug

Jacob: I'm a very huggy person. And so I hugged all of those people [at Velocity], all the time. Because it was happening to this group of people who … their work environment was not a place where you got a fucking hug.

Davis: We're building complex systems that include the people. And so how do we handle the unpredictable stress of complex systems? When you think about hugs, hugs are used to reduce pain. They're used to show that you care and they're used to reduce fear.

Jacob: So Artur Bergman was — is? — a particularly salty dude. He swears as much as I do, maybe more, and he's Swedish, so like when he swears, it's better.

Artur is not a person who was huggy. Artur would maybe suffer a hug from me, or suffer a hug from John [Allspaw]. At some point, John made a T-shirt that is the earliest I remember of the #hugops-y thing, and on the back of it it basically says, "Hug Artur Bergman."

Bergman: [During one Velocity] I gave a keynote and then [Adam] gave a keynote where he told people to hug me, and I was not aware that he had said that. During the day around the conference, random people started coming up and hugging me, which was, you know, quite uncomfortable, especially because I had no idea why. And so I ended up hiding for the rest of the day until I finally found out at the end of the day why this was happening.

Artur Bergman, who is not the naturally huggy type, at the Velocity Conference in 2010.

Image: James Duncan Davidson/O'Reilly Conferences

Jacob: It was a very special moment in time where there was this very high degree of camaraderie, there was this really high degree of familiarity.

Allspaw: Capturing this real dread, these pretty scary, pressure-filled situations, sort of fueled that you're part of this tribe. I don't know who you are, but you're here and you're talking and, so having that common ground is what I think genuinely got people [to be] like, "Can I give you a hug?"

Jacob: We knew people at all of those [big tech companies], right? And so as everybody starts to know each other, when like, Facebook would have an outage, you'd use the #hugops hashtag and you were like literally talking to your people.

Robbins: It's not a surprise that what began with a sarcastic joke to troll one of my best friends became an idea that a lot of people have rallied around because it reflects the world that they're building continuously, that they're continuously improving.

Davis: It's just a message of caring. It's a shorthand to show that I have empathy for where you're at, because I'm going to be there at some point. And I hope you show me that empathy too, but also, you know what? You are not alone.

The future according to #hugops

Jones: What we're really seeing right now is a shift in the software industry and us buttoning up and understanding that our software is quite critical. But the pressures that people are under to write this software is a lot.

Take Slack. During that outage, they had all just come back, it was the Monday that everyone came back from New Year. I can't imagine being in that office, because you're just getting used to writing code again, you're just getting used to deploying things again, and then all of a sudden, all the world is signing on to Slack at the exact same time. It makes total sense that they had an incident that day.

I think part of what we're seeing from the "learning from incidents" community is just a shift in thinking and software to say, "OK, they didn't do something wrong. Something happened that made sense for them to do what they did," and kind of allowing for that conversation to happen.

Robbins: That shift happened because we made it happen, in part because we simply made it so clear that large businesses, large organizations cannot succeed with this kind of outdated enterprise software legacy mindset. To be always on, to be always available, you're always improving, and that means dealing with failures and enabling rapid change.

I think we're in the second chapter now of a movement that has new leaders emerging and evolving. It's not a part of the MBA curriculum yet, but it soon will be.

Andrus: Inertia within an organization is hard. You can get a team of 10 to pivot quickly. You're a startup, you've got 100 people, you can change your process. You've got 10,000 engineers, it's a lot harder to get everyone to change how they've done things the last decade or two.

Harvey: The #hugops movement and the ideas behind it really speak about, "How do we build more empathy for the other humans that we interact with every day?" In my mind, it certainly goes beyond technology.

As a society, we could take some real lessons from this: How do we just have better empathy for and respect for the work and the way that people show up in the work that they do, and the fact that you know, everyone is out there doing absolutely the best that they can with what they have? I think that's really, really important.

Davis: Every time I hear "NoOps" or "NoDev," I'm like, "Nooooo…." Because when people are saying that the robots and automation are gonna take over, that doesn't think through all of these complexities that humans are really great at.

Yes, reducing the toil is so great. And we can have these conversations about how to balance out what availability is, and how much I'm going to spend on resolving things, and have those kinds of conversations separate from like, "We're gonna just eliminate all the humans because humans make mistakes." Humans make mistakes building the stuff that then we're relying on; we need humans as the safety checks.

Bergman: If you have a long outage, you need to care about your people and their sleep schedules, and the fact that they have to eat. And by day four or five, if you didn't do that, you're just gonna have a bunch of really tired and grumpy people who are going to make more mistakes.

Andrus: I did enjoy at Amazon and at Netflix the approach of, "You should know how your software behaves." If you've written software and deployed it and then you're turning a blind eye to it, that's just not good engineering.

Davis: What is so fascinating is that the next generation isn't putting up with this negative stuff. They're setting the expectations and they're very vocal about what they want their work environments to be like and how they want to work.

John Allspaw values the shared experience of the Velocity Conference.

Photo: pinar@pinarozger.com

Jones: We need to be asking different questions and we need to give more people seats at the table. I've been at way too many organizations where the incident was just the [site reliability engineers] in the room. It should have had marketing in the room, it should have had PR in the room, it should have had customer service in the room, it should have had leadership in the room. But it's thought of as kind of an SRE issue, like SREs have to prepare for any type of situation that gets thrown their way.

I was at one organization a while back where we launched a Super Bowl commercial. And we had some bumps when we launched the commercial, but the SRE team didn't get a ton of notice that the commercial was happening, I think it was either same-day notice or a couple days beforehand, and that was not really mentioned in the post-incident review.

Andrus: The flip side of #hugops is I do think there is responsibility that should be held to the leadership of those companies. We're empathetic to the engineers that are dealing with the situation they have, but in part that's because leadership isn't prioritizing their actions, or resilience and reliability in the same way that they prioritize some of their product efforts.

Allspaw: As my colleague Dr. Richard Cook has said, we shouldn't be surprised that these systems go down. We should be more surprised that they stay up as often as they do.

Bergman: We took a job that was critical to running the world's largest websites and the internet, that was kind of under-appreciated, and turned it into a movement, modernized it with DevOps, and gave those individuals career paths.

Jacob: Who gets credit when you see a beautiful car? You don't give credit to the mechanics. You're like, "Man, those guys at Porsche really make beautiful cars." You might know, like, one legendary mechanic in the history of great mechanics.

But that's why it's so persistent: because the mechanics know the mechanics.
