Spotify Outage — March 8, 2022

Sukhad Anand
2 min read · Sep 11, 2022

Understanding service mesh.

Photo by Alexander Shatov on Unsplash

On March 8, 2022, Spotify suffered an outage that kept the service down for 2 hours. Let’s look at what happened and what concepts we can learn from it.

1. Spotify’s architecture is built from many microservices, each serving a different purpose. For example, one microservice handles artists and another handles songs.

2. Each microservice can be deployed on its own machine, or two microservices can share the same machine. So how do these microservices talk to each other?

3. One way is to call a service by its DNS address, exactly the way we reach a website by typing an address like “www.spotify.com” into the address bar. DNS resolution returns an IP address, which we can then use to call the required API.

4. But this gets slow when a single operation requires several internal API calls to different microservices. For example, to display the details of a song, Spotify has to call the image service for the song image, the artist service for the artist details, and the song service for the song details.

5. If each of these calls goes through a DNS lookup first, the lookups add to the total time. To avoid this, a technique called a service mesh uses the sidecar pattern to let microservices connect to each other directly, without writing any extra code in the services themselves.

6. A sidecar instance runs beside each microservice. It holds the mapping of the other microservices’ addresses and lets the current microservice connect to them. The sidecar container is reusable, so no new code is required when onboarding a new microservice.

7. All these sidecars are connected to a control plane, which manages the mappings and updates every sidecar whenever something changes.

8. In Spotify’s case, this control plane went down, which prevented the microservices from connecting to each other.
As a fix, Spotify switched microservice communication over to the DNS-based approach.
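The DNS route from points 3 and 8 can be sketched with Python’s standard `socket` module. This is a minimal sketch, assuming the caller knows only a hostname. `resolve_service` is a hypothetical helper, and "localhost" stands in for an internal service name so the example runs on any machine. Resolvers commonly cache answers for the record’s TTL, so a full lookup is not necessarily paid on every call.

```python
import socket


def resolve_service(hostname: str, port: int) -> list[str]:
    """Resolve a service hostname to its IPv4 addresses via the system resolver.

    Hypothetical helper: the caller only knows a name and must ask the
    resolver for an IP address before it can send any request.
    """
    # getaddrinfo consults /etc/hosts and the configured DNS servers.
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) tuple.
    addresses: list[str] = []
    for *_, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:  # deduplicate while keeping resolver order
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # An internal name such as "artist-service.internal" would be hypothetical;
    # "localhost" is used so the example runs without extra DNS setup.
    print(resolve_service("localhost", 8080))
```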
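Points 6 to 8 can be sketched as a small in-memory model. This is a minimal sketch, not Spotify’s implementation and not the API of any real mesh. The `Sidecar` and `ControlPlane` classes, the `resolve_address` helper, the service names and the IP addresses are all hypothetical. Real sidecars proxy the traffic itself, but here they only hold the routing table so that the control-plane push and the DNS fallback stay visible.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical DNS records used for the fallback path.
FAKE_DNS: Dict[str, List[str]] = {"artist-service": ["10.0.0.9"]}


def fake_dns_lookup(name: str) -> List[str]:
    """Stand-in for a real DNS query (hypothetical)."""
    return FAKE_DNS.get(name, [])


class Sidecar:
    """Hypothetical sidecar: holds a local copy of service name -> addresses."""

    def __init__(self, owner: str) -> None:
        self.owner = owner  # the microservice this sidecar runs beside
        self.routes: Dict[str, List[str]] = {}

    def apply_update(self, routes: Dict[str, List[str]]) -> None:
        # Called by the control plane; store a copy so later edits don't leak in.
        self.routes = {name: list(addrs) for name, addrs in routes.items()}

    def lookup(self, service: str) -> Optional[List[str]]:
        # Purely local: no network round trip is needed to find the peer.
        return self.routes.get(service)


class ControlPlane:
    """Hypothetical control plane: owns the mappings and pushes them to sidecars."""

    def __init__(self) -> None:
        self.routes: Dict[str, List[str]] = {}
        self.sidecars: List[Sidecar] = []
        self.healthy = True  # set to False to simulate an outage

    def register_sidecar(self, sidecar: Sidecar) -> None:
        if not self.healthy:
            raise RuntimeError("control plane unavailable")
        self.sidecars.append(sidecar)
        sidecar.apply_update(self.routes)  # hand the newcomer the current table

    def set_endpoints(self, service: str, addresses: List[str]) -> None:
        if not self.healthy:
            raise RuntimeError("control plane unavailable")
        self.routes[service] = list(addresses)
        for sidecar in self.sidecars:  # push the change to every sidecar
            sidecar.apply_update(self.routes)


def resolve_address(
    sidecar: Sidecar, service: str, dns_lookup: Callable[[str], List[str]]
) -> List[str]:
    """Prefer the sidecar's table; fall back to DNS when it has no entry."""
    addresses = sidecar.lookup(service)
    if addresses:
        return addresses
    return dns_lookup(service)  # DNS fallback, in the spirit of point 8


if __name__ == "__main__":
    plane = ControlPlane()
    song_sidecar = Sidecar("song-service")
    plane.register_sidecar(song_sidecar)
    plane.set_endpoints("artist-service", ["10.0.0.7", "10.0.0.8"])
    print(resolve_address(song_sidecar, "artist-service", fake_dns_lookup))
    # ['10.0.0.7', '10.0.0.8']  (answered by the sidecar)

    plane.healthy = False  # the control plane goes down
    image_sidecar = Sidecar("image-service")
    try:
        plane.register_sidecar(image_sidecar)  # never receives any routes
    except RuntimeError:
        pass
    print(resolve_address(image_sidecar, "artist-service", fake_dns_lookup))
    # ['10.0.0.9']  (answered by the DNS fallback)
```

In this model, a sidecar that has already received its table keeps working during the outage. A sidecar that starts afterwards has nothing to look up, and only the DNS fallback lets it reach its peers. How a real mesh behaves when its control plane is unavailable depends on the product and its configuration.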

To understand the concept in more detail, you can watch this video: https://lnkd.in/g6VhBxun

#microservices #architecture #softwaredevelopment #spotify #systemdesign #communication #dns
