Cloudflare Outage — June 2022

It disrupted many of the websites that serve their content through a CDN.

According to some reports, Cloudflare is used by 79.9% of all websites that rely on a content delivery network (CDN). So if Cloudflare goes down, you can imagine a large number of websites that serve images and videos failing to load properly. On June 21, 2022, that is exactly what happened: Cloudflare went down (for some time, obviously). What happened? Let’s find out.

1. Cloudflare had been working to convert all of its busiest locations to a more flexible and resilient architecture. An important part of this architecture is designed as a Clos network: an added layer of routing that creates a mesh of connections. The mesh lets them enable or disable parts of the internal network, which makes maintenance easier to run. In the diagram on the official Cloudflare blog, the spines represent this new network mesh.

2. More about that later. Let’s first understand how Cloudflare makes its network visible to other networks. It uses BGP, the Border Gateway Protocol. Let’s look at how this protocol works.

3. BGP is the protocol used to determine the best path for a packet to travel from one network to another. When someone sends data over the Internet, BGP is responsible for looking at all of the available paths that data could travel and picking the best route. That path can cross different backbones owned by different Network Service Providers (NSPs).

4. Bad routes do not always appear by accident: there are attacks such as BGP hijacking. In April 2018, attackers deliberately created bad BGP routes to redirect traffic that was meant for Amazon’s DNS service. The attackers were able to steal over $100,000 worth of cryptocurrency by redirecting the traffic to themselves.

5. As part of this protocol, operators define policies that decide which prefixes (a collection of adjacent IP addresses) are advertised to peers (the other networks they connect to), or accepted from peers.

6. These policies have individual components (terms), which are evaluated sequentially. The end result is that any given prefix will either be advertised or not advertised. A change in policy can mean a previously advertised prefix is no longer advertised, known as being “withdrawn”, and those IP addresses will no longer be reachable on the Internet.

7. While Cloudflare was deploying a change to its prefix advertisement policies, a re-ordering of terms caused it to withdraw a critical subset of prefixes.
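
To make the effect of term ordering concrete, here is a minimal Python sketch of a policy whose terms are evaluated sequentially, first match wins. It is not Cloudflare's configuration: the term names, the `Term` class, the `evaluate` and `advertised` helpers, and the prefixes (taken from the reserved documentation ranges) are all hypothetical, and real router policy languages are richer than this.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network


@dataclass(frozen=True)
class Term:
    """One policy component (hypothetical model, not a real router syntax)."""
    name: str
    match: IPv4Network  # the term applies to prefixes inside this block
    action: str         # "accept" = advertise, "reject" = do not advertise


def evaluate(policy, prefix):
    """Walk the terms in order and return the action of the first match."""
    for term in policy:
        if prefix.subnet_of(term.match):
            return term.action
    return "reject"  # assumed default when no term matches


def advertised(policy, prefixes):
    """Return the prefixes the policy would advertise to peers."""
    return [p for p in prefixes if evaluate(policy, p) == "accept"]


# Hypothetical terms: advertise one block, reject everything else.
advertise_site = Term("advertise-site", ip_network("192.0.2.0/24"), "accept")
reject_rest = Term("reject-the-rest", ip_network("0.0.0.0/0"), "reject")

prefixes = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]

original = [advertise_site, reject_rest]   # specific term first
reordered = [reject_rest, advertise_site]  # catch-all moved to the top

print(advertised(original, prefixes))
print(advertised(reordered, prefixes))
```

With the original order, the specific term matches before the catch-all and its prefix is advertised. With the catch-all moved to the top, every prefix is rejected before the specific term is ever reached, so the prefix is withdrawn even though no term was deleted.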

BGP configuration failures like this have taken down other large websites in the past, Facebook among them.
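
To see why a withdrawn prefix makes a site unreachable, the sketch below models a tiny routing table in Python and looks up the most specific matching route for an address. It is a simplification under stated assumptions: the `lookup` helper and the prefixes are hypothetical, and real routers hold far more routes, often including a default route.

```python
from ipaddress import ip_address, ip_network


def lookup(routes, address):
    """Return the most specific route containing the address, or None."""
    addr = ip_address(address)
    matches = [net for net in routes if addr in net]
    # Longest prefix wins; an empty match list means "no route to host".
    return max(matches, key=lambda net: net.prefixlen, default=None)


# Routes another network has learned from its peers (hypothetical values).
routes = {ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")}

before = lookup(routes, "192.0.2.10")

# The origin network withdraws the prefix, so peers drop the route.
routes.discard(ip_network("192.0.2.0/24"))

after = lookup(routes, "192.0.2.10")

print(before)
print(after)
```

Before the withdrawal the lookup finds a route. Afterwards it finds none, which is what other networks experience when the prefixes covering a site's IP addresses stop being advertised.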

Sukhad Anand
