Facebook more or less disappeared from the internet on Monday. It's not just that its website went down; for hours, the company seemed to have been deleted entirely. It was so bad that the domain facebook.com appeared to be for sale. (Jack Dorsey publicly inquired about the price.)
- This was Facebook's initial public comment: "We're aware that some people are having trouble accessing our apps and products. We're working to get things back to normal as quickly as possible, and we apologize for any inconvenience."
- Adam Mosseri took the whole thing in stride, saying "it does feel like a snow day."
- Staying home was certainly the move: The outage got so bad that employees said they couldn't badge into Facebook offices or access their email. Which of course made things worse: Employees reportedly couldn't get into buildings to fix the problem, or communicate in order to do so. They turned instead to Discord, Zoom and other platforms to keep in touch. And users who couldn't get on Facebook flocked elsewhere; Twitter and Signal both reported huge spikes in usage on Monday.
- "*Sincere* apologies to everyone impacted by outages of Facebook powered services right now," Mike Schroepfer, Facebook's CTO, tweeted — because there was no other way to communicate. "We are experiencing networking issues and teams are working as fast as possible to debug and restore as fast as possible."
The outage was the result of a fairly straightforward mistake. There were some initial rumors and conspiracy theories (coming a day after Frances Haugen outed herself as a whistleblower and a day before a congressional hearing, how could there not be?) but the truth appears to be much more routine. It involves BGP, the Border Gateway Protocol, which networks use to tell one another which addresses they can reach. Facebook did the rough equivalent of changing its number and unplugging the landline.
- "Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication," Facebook's Santosh Janardhan wrote on the company's blog Monday night. "This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt."
- The problem was simple, but the fix was complicated. "The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations," Janardhan wrote, "complicating our attempts to quickly diagnose and resolve the problem."
- Cloudflare published a detailed blog post about what happened, showing a quick burst of BGP updates just before the outage began. "With those withdrawals," Cloudflare's Tom Strickx and Celso Martinho wrote, "Facebook and its sites had effectively disconnected themselves from the Internet."
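To make the withdrawal idea concrete, here is a minimal sketch of what a route withdrawal does from the point of view of another network. It is a toy model, not how any real router or Facebook's backbone is implemented: the `RoutingTable` class, the peer names and the address ranges (reserved documentation prefixes) are all made up for illustration.

```python
import ipaddress


class RoutingTable:
    """Toy model of the routes one network has learned from its peers."""

    def __init__(self):
        # Maps an announced prefix to the (hypothetical) peer that announced it.
        self.routes = {}

    def announce(self, prefix, peer):
        self.routes[ipaddress.ip_network(prefix)] = peer

    def withdraw(self, prefix):
        # A withdrawal simply removes the route; nothing replaces it.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the peer for the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("192.0.2.0/24", "example-social-network")
table.announce("198.51.100.0/24", "some-other-network")

before = table.lookup("192.0.2.10")    # a route exists, traffic can be sent
table.withdraw("192.0.2.0/24")         # the network withdraws its own route
after = table.lookup("192.0.2.10")     # no route left, the address is unreachable
other = table.lookup("198.51.100.7")   # unrelated networks are unaffected
```

Once the prefix is withdrawn, the servers behind it may be running perfectly well, but the rest of the internet no longer knows how to reach them.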
Others, though, saw bigger takeaways from the outage:
- "Facebook, WhatsApp, and Instagram all going down at the same time sure seems like an easily-understandable and publicly-popular example of why breaking up a certain monopoly into at least three pieces might not be a bad idea," Edward Snowden tweeted.
- "Today is a moment where we get a glimpse of what a modern digital terrorist attack / nation state warfare would feel like -- and the case for focusing on decentralization gets stronger," Sam Lessin said.
- Many people used the outage as a chance to appeal for a decentralized internet (and in some cases, to try to sell people on their blockchain-based social app). "Kudos to @facebook for giving us a very real demonstration of why the move to a decentralised Web 3 is necessary and, indeed, inevitable," Polkadot founder and Ethereum co-creator Gavin Wood tweeted.
- "As far as structural outages go, this FB thing is relatively minor," said Nick Merrill, a cybersecurity researcher at UC Berkeley. "a few hundreds of millions of dollars lost at most. if a similar outage hit AWS, cloudflare, akamai, etc, no one's credit card would work. i expect the losses would be in the billions if not trillions."
Suddenly seeing an internet without Facebook was an eye-opening thing, both for the businesses that rely on the company to reach customers and share information, and for the people — particularly in the developing world — who rely on it to communicate with loved ones. "WhatsApp went dark today when I was in the middle of a conversation with a close colleague in South Africa," said Will Anderson, who works in international development and land restoration, "but I didn't notice it until I stopped receiving messages from my many groups. They usually buzz at all hours, so it's eerie to not hear updates from Indian entrepreneurs or local tree-growing organizations in Niger." It was also a reminder that the internet's infrastructure is still fragile and complex and always barely holding together.
By about 5 p.m. ET, Facebook started to recover. By the end of the workday on the West Coast, things were back to normal.
- At the end of it all, Mark Zuckerberg weighed in. "Facebook, Instagram, WhatsApp and Messenger are coming back online now," he posted on his newly functioning Facebook page. "Sorry for the disruption today -- I know how much you rely on our services to stay connected with the people you care about."
In the end, this was Facebook's second-worst outage ever. The top spot belongs to an outage in 2019, when a very similar incident left Facebook's services offline for more than 14 hours. At the time, Facebook said only that "a server configuration change" was to blame. (Another, even longer outage hit the company in 2008, but Facebook was much smaller then.)
The lesson here? Don't put all your eggs in one basket. Because everything at Facebook runs on a single system, from its platforms to its status pages to its in-office security systems, breaking one thing broke everything. When the servers need to be reset, you don't want to rely on those same servers to let you into the building. Expect Facebook to make changes, and others to make sure this doesn't happen to them next.
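The eggs-in-one-basket point can be sketched as a small dependency check. This is only an illustration under stated assumptions: the system names (`backbone`, `badge_readers`, `external_host`, `local_controller`) are hypothetical labels, not a description of Facebook's actual architecture, and the "separated" layout is just one possible alternative.

```python
def affected_by(failed, depends_on):
    """Return every system that goes down, directly or transitively, when `failed` fails."""
    down = {failed}
    changed = True
    while changed:
        changed = False
        for system, deps in depends_on.items():
            # A system fails as soon as any one of its dependencies is down.
            if system not in down and down & set(deps):
                down.add(system)
                changed = True
    return down


# Hypothetical layout where everything hangs off one shared system.
shared = {
    "apps": ["backbone"],
    "status_page": ["backbone"],
    "internal_tools": ["backbone"],
    "badge_readers": ["backbone"],
    "backbone": [],
}

# Same layout, but the status page and door access have independent paths.
separated = dict(
    shared,
    status_page=["external_host"],
    badge_readers=["local_controller"],
    external_host=[],
    local_controller=[],
)

shared_down = affected_by("backbone", shared)
separated_down = affected_by("backbone", separated)
```

In the shared layout, one failure takes out the tools needed to announce and repair it. In the separated layout, the apps still go down, but the status page and the doors keep working.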