Expert Reveals Why The Facebook Outage Took So Long To Fix
Yesterday, October 4, Facebook suffered an outage that lasted for over six hours. Now, experts have revealed just why the issue took so long to fix.
Facebook has since apologised for the inconvenience. It later revealed that the outage was caused by changes to the configuration of the backbone routers, which created issues with the communication between data centres.
However, why did the issue take such a long time to fix? Experts have since weighed in on the issue and explained why the communication issue was so complicated to resolve.
Web performance and security company Cloudflare has offered a detailed explanation as to what really happened when Facebook’s sites went down, which incorporated the Domain Name System (DNS) and Border Gateway Protocol (BGP), The Guardian reports.
BGP is vital because it shares routing information between autonomous systems (AS) on the internet. Without it, and the possible routes it advertises for delivering ‘network packet[s]’ to their ultimate destinations, internet routers wouldn’t be able to function, meaning the internet wouldn’t work either, according to Cloudflare’s investigation.
All the networks that make up the internet are linked together by BGP. BGP is basically the most-experienced explorer on your hiking trip, who holds the map to the fastest route to the final location. Facebook can only exist and be visible to other networks through BGP.
Each individual network on the internet is an autonomous system, ‘an individual network with a unified internal routing policy’, identified by an ASN (Autonomous System Number). Using BGP, each AS has to ‘announce’ its ‘prefix routes to the internet’ so that it can be located and connected with.
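The announce-and-withdraw mechanism described above can be sketched in a few lines of Python. This is a toy illustration, not real BGP: the routing table is a plain dictionary, and the prefix shown is a hypothetical example (though AS32934 is the ASN publicly registered to Facebook).

```python
# Toy sketch of BGP announcements: an AS "announces" prefixes, and other
# networks can reach it only while those announcements are in place.
routing_table = {}

def announce(asn, prefix):
    routing_table[prefix] = asn          # other routers learn a route

def withdraw(prefix):
    routing_table.pop(prefix, None)      # the route is forgotten

def reachable(prefix):
    return prefix in routing_table

announce("AS32934", "185.89.218.0/24")   # hypothetical example prefix
print(reachable("185.89.218.0/24"))      # True: a route exists
withdraw("185.89.218.0/24")
print(reachable("185.89.218.0/24"))      # False: no one knows a route
```

In these terms, Facebook’s outage was the `withdraw` step happening for the prefixes that hosted its DNS servers: the rest of the internet simply had no route left to follow.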
According to Cloudflare, Facebook ‘stopped announcing the routes to their DNS prefixes’ at 16:58 UTC, which resulted in its DNS servers being unreachable. Without Facebook’s DNS, there was no way for users’ browsers to translate facebook.com into an address they could connect to, so there was no map to follow. Facebook runs all of its platforms on the same infrastructure, so Instagram, WhatsApp, Messenger and Workplace were affected too.
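The DNS failure users experienced can be illustrated with a toy resolver. The records below are invented for the example (they are not real DNS data); the point is that when a domain’s name servers become unreachable, the lookup itself fails before any connection can even be attempted.

```python
# Toy DNS resolver with made-up records, to show why an unreachable
# DNS server means a site "disappears" for users.
DNS_RECORDS = {"example.com": "93.184.216.34"}   # hypothetical record

def resolve(hostname):
    ip = DNS_RECORDS.get(hostname)
    if ip is None:
        # Mirrors what users saw: the name simply cannot be answered.
        raise LookupError(f"no DNS answer for {hostname}")
    return ip

print(resolve("example.com"))   # prints the made-up address
try:
    resolve("facebook.com")     # DNS routes withdrawn: lookup fails
except LookupError as err:
    print(err)
```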
The outage also caused issues in Facebook’s own internal systems, with office door systems locking staff out and its internal communications platform malfunctioning.
The root of the problem was hard to identify because Facebook’s internal tools are all run on Facebook’s own infrastructure.
While the social media giant hasn’t detailed the full intricacies of the issue, it is reported that servers in California, where the problem is said to have originated, had to be manually reset by a technical team.
By 21:00 UTC, Cloudflare reported seeing ‘renewed BGP activity from Facebook’s network’, and as of 21:28, Facebook appeared to have rejoined the internet on a global scale, with its DNS finally working again.
While the outage is considered unusual, there is a chance a similar incident could occur again, though Facebook has assured users that no one’s personal data was put at risk.
Facebook has stressed that it will continue to investigate the issue in order to ‘make [its] infrastructure more resilient’.