Graphing highly structured data
Dec 08, 2021
The following sankey diagram appeared in my LinkedIn feed the other day, and I agree with the poster that it is an excellent chart.
It's an unusual use of a flow diagram: it shows the P&L (profit and loss) statement of a business. This makes sense, since a P&L statement describes flows of money. The graph explains how Spotify makes money - or how little profit it claims to have earned on over 2.5 billion in revenue.
What makes this chart work so well?
The first thing to notice is how they handled negative flows (costs). They turned the negative numbers into positive ones and encoded each number's sign as a color. This is not as natural a move as one might think. The raw data are financial tables in which revenues appear as positive numbers and costs as negative numbers, sometimes in parentheses. Like this:
Now, some readers are sure to object to the red-green color scheme, which is hard to read for people with red-green color blindness. I suppose gray-red could substitute.
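Here is a minimal Python sketch of that sign-to-color step, assuming a simplified P&L in which each line item is stored as a signed flow between two nodes. The node names and all numbers are made up for illustration and are not Spotify's figures; `to_sankey_links` and `check_balance` are hypothetical helpers, not part of any charting library. Each band's width becomes the absolute value, and the sign survives only as a color (gray for positive, red for negative, following the gray-red alternative).

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative P&L only: made-up numbers, NOT Spotify's reported figures.
# As in a financial table, revenues are positive and costs are negative.
# Each row is (source node, target node, signed amount).
PNL_ROWS = [
    ("Subscription revenue", "Total revenue", 90.0),
    ("Advertising revenue", "Total revenue", 10.0),
    ("Total revenue", "Cost of revenue", -70.0),
    ("Total revenue", "Gross profit", 30.0),
    ("Gross profit", "Research & development", -12.0),
    ("Gross profit", "Sales & marketing", -13.0),
    ("Gross profit", "General & administrative", -4.0),
    ("Gross profit", "Operating profit", 1.0),
]


@dataclass
class Link:
    source: str
    target: str
    value: float  # band width: always non-negative
    color: str    # the only place the original sign survives


def to_sankey_links(rows, pos_color="gray", neg_color="red"):
    """Turn signed P&L rows into sankey links: absolute value as width, sign as color."""
    return [
        Link(src, dst, abs(amount), neg_color if amount < 0 else pos_color)
        for src, dst, amount in rows
    ]


def check_balance(links, tol=1e-9):
    """Return the intermediate nodes whose inflow and outflow differ."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for link in links:
        outflow[link.source] += link.value
        inflow[link.target] += link.value
    # Nodes that both receive and send flow must pass all of it through.
    middle = inflow.keys() & outflow.keys()
    return {n for n in middle if abs(inflow[n] - outflow[n]) > tol}


if __name__ == "__main__":
    links = to_sankey_links(PNL_ROWS)
    for link in links:
        print(f"{link.source:>22} -> {link.target:<26} {link.value:6.1f} {link.color}")
    print("unbalanced nodes:", check_balance(links) or "none")
```

The balance check is a cheap sanity test: a sankey diagram only reads correctly when every intermediate node passes on exactly what it receives. The sketch assumes the business shows an operating profit; an operating loss would have to be drawn as an extra inflow rather than an outflow.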
The second smart decision is to pare down the details. There are only four cost categories shown in the entire chart. The cost of revenue represents more than two-thirds of all revenues, and we know nothing about sub-categories of this cost.
The third feature is the placement of the Spotify logo, which draws our attention to the middle of the diagram. This matters because a sankey diagram is typically read from left to right, but here the real starting point is the column labeled "total Spotify revenue". The first column merely splits total revenue into subscription revenue and advertising revenue.
Putting the labels of the last column inside the flows improves readability as well.
On the whole, a job well done.
Sankey diagrams have limitations. The charts need to be simple enough to work their magic.
For example, it's difficult to add a time element to the above chart. Yet the next question a business analyst might ask is how Spotify's revenue/cost/profit structure has changed over time.
Another question a business analyst might ask concerns the revenue/cost/profit structure of premium vs ad-supported users. The chart gives us a third of the answer - the revenue split. Depending on relative usage and content preferences, the split of royalty costs is unlikely to mirror the revenue split.
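To see why the revenue split is only a third of the answer, here is a small worked example in Python. It assumes, hypothetically, that a single royalty pool is allocated across the two segments in proportion to each segment's share of listening; the segment names, shares, and amounts are invented for illustration, and real royalty arrangements are more complicated.

```python
# Hypothetical figures for illustration only: not Spotify's data.
REVENUE = {"premium": 90.0, "ad-supported": 10.0}
ROYALTY_POOL = 70.0  # total royalty cost to be split between segments


def segment_margins(revenue, usage_share, royalty_pool):
    """Gross margin per segment when royalties are allocated by usage share.

    Allocating royalties in proportion to usage is an assumption made for
    this sketch; it is not how any particular royalty contract works.
    """
    margins = {}
    for segment, rev in revenue.items():
        cost = royalty_pool * usage_share[segment]
        margins[segment] = (rev - cost) / rev
    return margins


if __name__ == "__main__":
    # Case 1: usage happens to match the revenue split (90/10).
    print(segment_margins(REVENUE, {"premium": 0.9, "ad-supported": 0.1}, ROYALTY_POOL))
    # Case 2: ad-supported users account for 40% of listening.
    print(segment_margins(REVENUE, {"premium": 0.6, "ad-supported": 0.4}, ROYALTY_POOL))
```

In the first case both segments earn the same 30% gross margin. In the second, premium earns about 53% while ad-supported runs at a loss of 180% of its revenue, even though the revenue split is identical in both cases.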
Yet another business analyst might be interested in comparing Spotify's business model with a competitor's. This, too, is hard to handle on a sankey diagram.
I searched for alternative charts of the same data, and seeing what's out there makes one appreciate the sankey version even more.
Here is a waterfall chart, which is quite popular:
Here is a stacked column chart, rooted at zero:
Of course, someone has to make a pie chart - in this case, two pie charts: