The pitfalls of rapid growth

Source Code

Good morning! This Tuesday, the metaverse broke over the weekend, Frances Haugen has thoughts about Mark Zuckerberg's future, Pinterest wants to sell you Ginsu steak knives, and you can now have crypto with a side of fries.

Building the metaverse is one thing. Scaling it is another.

Did the volume of trick-or-treaters seem heavier than usual this year? It might not be your imagination. The Great Halloween Roblox Outage of 2021 forced tens of millions of children into a rare extended vacation from the metaverse. While the outage was no less than a living nightmare for Roblox players, it also provides a useful case study on the potential pitfalls of scaling an immensely popular platform.

The Roblox outage was unusual in both its duration and the size of the affected user base. Reports of widespread outages arrived on Thursday afternoon, and it wasn't until Sunday afternoon that all systems went back online.

  • The outage likely sidelined upwards of 50 million users each day; Roblox reported 46.6 million daily active users in July 2021, which was up 8% from the previous month.
  • On a typical day that month, each active user logged nearly 2 hours and 40 minutes of playtime on the platform.

The exact cause of the outage is still unknown. Roblox hasn't yet provided a detailed report on what caused the servers to go offline, but the company plans to release one soon. In the meantime, CEO David Baszucki wrote a blog post attributing the outage to a core infrastructure system becoming overwhelmed, which was "prompted by a subtle bug in our backend service communications while under heavy load."

  • Roblox added that this failure "was caused by the growth in the number of servers in our datacenters."
  • The fix took so long in part because Roblox experienced difficulty diagnosing the bug. Once engineers were able to figure it out, they "resolve[d] the issue through performance tuning, re-configuration, and scaling back of some load."

Roblox has expanded its infrastructure at warp speed over the past year to accommodate the surge in concurrent users.

  • The Roblox network operates in more than 180 countries. In 2020, Roblox first added multi-region core data center infrastructure applications in an effort to get rid of single points of failure in its systems.
  • To meet the additional capacity requirements, Roblox also moved away from global pools of compute resources in favor of regional pools serving customers in the Americas, Asia or Europe.
  • While embarking on these ambitious projects, Roblox grew its workforce by roughly 70% in the course of a year. The company started 2020 with 579 employees and added 400 more during the year. Most of these individuals were hired for either product or engineering roles.

Roblox was trying to build the plane while flying it, to use a somewhat tired business metaphor. It's hard to place any blame on Roblox for attempting to capitalize on its tremendous popularity. The company wasn't missing any obvious systems that could have prevented an outage.

  • As recently as April 2021, for instance, Roblox detailed its efforts to build comprehensive network monitoring and fault detection systems.
  • Rather than assigning blame to Roblox, it's more useful to highlight the inherent complexity that arises from scaling a network. Prioritizing capacity growth often comes at the expense of understanding how exactly that system works. And as the workforce grows, there's a smaller proportion of employees acquainted with the ins and outs of the platform. These dynamics help explain why it may have taken so long for Roblox to diagnose the problem.

It's also worth putting the outage into perspective. Many news outlets have pointed out that Roblox's market cap took a hit of around $1.5 billion following the outage. But Roblox says the outage did not result in a loss of player data, and the user experience has returned to normal. It also committed to making its creator community "economically whole," which is an important measure given Roblox's vision for its creator ecosystem. And if Roblox achieves its immensely ambitious goal of accruing 1 billion users, this outage will have come at a time with relatively low stakes and high learning upside.

— Hirsh Chitkara

People are talking

Frances Haugen thinks Mark Zuckerberg should resign:

  • "I think Facebook would be stronger with someone who is willing to focus on safety."

Jeff Bezos says it'll take a lot more than his $10 billion pledge to fight global warming:

  • "It'll take trillions of dollars to make a dent in climate change … and it's going to take nation states, it's gonna take companies, and it'll take NGOs and nonprofits as well."

The metaverse is synchronous, which Facebook's Andrew Bosworth said will help users feel safer:

  • "There are some advantages that the synchronous environment is going to have in terms of an individual being able to feel in control of themselves and their experience."

But Meta may not attract all the metaverse users it really wants, said Dave Carr, communications head at the virtual world Decentraland:

  • "People who want to determine the future of the virtual worlds they inhabit, maintain ownership of their creative output and move freely between them will choose the decentralized version."

Elon Musk said that, actually, Tesla hasn't formally agreed to that deal with Hertz:

  • "I'd like to emphasize that no contract has been signed yet. Tesla has far more demand than production, therefore we will only sell cars to Hertz for the same margin as to consumers."

Making moves

ByteDance CFO Shou Zi Chew is stepping down so he can focus full-time on his other role as TikTok CEO.

Bryan Palma is the new CEO of the McAfee Enterprise and FireEye company. Palma has worked on cybersecurity and cloud issues at companies like BlackBerry and Cisco.

Rivian hopes to be valued at $60 billion when it IPOs next week.

Want some crypto with your Whopper? Burger King and Robinhood joined forces to offer Royal Perks members crypto with a minimum $5 purchase.

Uber will ship kids' products. People can now buy on-demand items like baby food and placemats through Uber Eats.

In other news

Vaccinated Amazon workers don't need to wear masks. Starting today, the company is waiving mask requirements for warehouse workers who are fully vaccinated unless state or local laws require it.

Should people go to jail for causing harm on social media? U.K. lawmakers will introduce a bill next month that would ban content that causes "emotional, psychological, or physical harm to the likely audience," and violators could get jail time.

Your iPhone could soon dial 911 automatically, sources told The Wall Street Journal. Apple is working on a feature that would let iPhones and Apple Watches automatically call emergency services after a car crash.

Fortnite in China is shutting down in a couple of weeks. Epic didn't explain the reason for the closure, but it's been trying to comply with China's gaming rules for the past couple of years.

WeChat users say they have seen that Meta logo before. It would look exactly the same as the platform's short-video tool WeChat Channels if it were flipped. And now Chinese social media users are making fun of it.

Amazon workers in Germany went on strike yesterday. Employees are urging the company to offer better working conditions and pay, and they said they'll continue the strike for up to three days.

Pinterest is getting into live shopping. The company is rolling out a feature where influencers host themed shows promoting different products every week. It's kind of like old infomercials, only influencers and users can talk to one another.

Meet Jane Manchun Wong

Most people have to wait until new apps or gadgets are released to learn about them. But Jane Manchun Wong gets ahead of the game by reverse-engineering software and websites to find out about products before everyone else.

Wong's Twitter account is like a treasure trove of upcoming products: She posts all of her scoops, from cosmetic changes on Twitter to new features at Spotify. When a company eventually launches the product she predicted, Wong replies to her original tweet with the official release. Follow along for all her finds on Twitter — just don't call her a Twitter employee.

Thoughts, questions, tips? Send them to sourcecode@protocol.com, or our tips line, tips@protocol.com. Enjoy your day, see you tomorrow.
