Same data + same chart form = same story. Maybe.
Feb 18, 2021
We love charts that tell stories.
Some people believe that if they situate the data in the right chart form, the stories reveal themselves.
Some people believe that, for a given dataset, there exists a best chart form that brings out the story.
An implication of these beliefs is that the story is immutable, given the dataset and the chart form.
If you use the Trifecta Checkup, you already know I don't subscribe to those ideas. That's why the Trifecta has three legs: the third is the question, which relates to the message or the story.
I came across the following chart by Statista, illustrating the growth in Covid-19 cases from the start of the pandemic to this month. The underlying data are collected by WHO and cover the entire globe. The data are grouped by regions.
The story of this chart appears to be that the world moves in lock step, with each region behaving more or less the same.
The WHO site shows a similar chart:
On this chart, the regions at the bottom of the graph (esp. Southeast Asia in purple) clearly do not follow the same time patterns as Americas (orange) or Europe (green).
What we're witnessing is: same data, same chart form, different stories.
This is a feature, not a bug, of the stacked area chart. The story is driven largely by the order in which the pieces are stacked. In the Statista chart, the largest pieces are placed at the bottom while for WHO, the order is exactly reversed.
(There are minor differences which do not affect my argument. The WHO chart omits the "Other" category which accounts for very little. Also, the Statista chart shows the smoothed data using 7-day averaging.)
In this example, the order chosen by WHO preserves the story while the order chosen by Statista wipes it out.
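To see why the stacking order drives the story, here is a minimal sketch using toy numbers (invented for illustration, not the WHO data; the helper name `stacked_bands` is also mine). It computes the lower and upper edges of each band for a given stacking order. When a small region is stacked last, its upper edge is the grand total, so it inherits the peaks of the regions underneath it; when stacked first, it sits on a flat baseline and shows its own time pattern.

```python
# Toy weekly case counts (invented for illustration, not WHO data).
cases = {
    "Americas": [10, 40, 20, 60],
    "Europe": [5, 30, 10, 50],
    "South-East Asia": [2, 4, 12, 3],
}

def stacked_bands(series, order):
    """Return {region: (lower_edge, upper_edge)} for a stacking order, bottom band first."""
    n_points = len(next(iter(series.values())))
    baseline = [0] * n_points
    bands = {}
    for region in order:
        # Each band sits on top of everything stacked before it.
        upper = [b + v for b, v in zip(baseline, series[region])]
        bands[region] = (baseline, upper)
        baseline = upper
    return bands

largest_first = ["Americas", "Europe", "South-East Asia"]  # small region ends up on top
smallest_first = list(reversed(largest_first))             # small region sits at the bottom

top_when_stacked_last = stacked_bands(cases, largest_first)["South-East Asia"][1]
top_when_stacked_first = stacked_bands(cases, smallest_first)["South-East Asia"][1]

print(top_when_stacked_last)   # [17, 74, 42, 113]
print(top_when_stacked_first)  # [2, 4, 12, 3]
```

In the toy data, South-East Asia peaks in the third period. Stacked on top, its upper edge peaks in the second and fourth periods instead, tracking the larger regions below it.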
What might be the underlying question of someone who makes this graph? Perhaps it is to identify the relative prevalence of Covid-19 in different regions at different stages of the pandemic.
Emphasis on the word "relative". Instead of plotting the absolute number of cases, I consider plotting the relative number of cases, that is to say, the proportion of cases in each region at a given time.
This leads to a stacked area percentage chart.
In this side-by-side view, you see that this form is not affected by flipping the order of the regions. Both charts say the same thing: that there were two waves in Europe and the Americas that dwarfed all other regions.
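The conversion from counts to proportions can be sketched as follows, again with toy numbers (invented for illustration, not the WHO data; `regional_shares` is a hypothetical helper name). Each region's count is divided by the total across regions at that time point, so every column sums to 1. The outer silhouette of the stack is therefore flat whichever way the regions are ordered, and the thickness of each band is simply that region's share.

```python
# Toy weekly case counts (invented for illustration, not WHO data).
cases = {
    "Americas": [10, 40, 20, 60],
    "Europe": [5, 30, 10, 50],
    "South-East Asia": [2, 4, 12, 3],
}

def regional_shares(series):
    """Convert counts to each region's proportion of the total at each time point.

    Assumes every time point has a positive total (no division by zero).
    """
    regions = list(series)
    n_points = len(series[regions[0]])
    totals = [sum(series[r][t] for r in regions) for t in range(n_points)]
    return {r: [series[r][t] / totals[t] for t in range(n_points)] for r in regions}

shares = regional_shares(cases)

# Every time point sums to 1, regardless of the order the regions are listed in.
column_sums = [sum(shares[r][t] for r in shares) for t in range(4)]

sea_share = [round(s, 2) for s in shares["South-East Asia"]]
print(sea_share)  # [0.12, 0.05, 0.29, 0.03]
```

The position of the interior boundaries still depends on the stacking order; only the band thicknesses and the flat total are order-independent.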
I have always considered colored area charts to be pretty but difficult to interpret. While your version resolves the issue of plotting order affecting the message, it still requires the reader to make mental subtractions to determine the relative percentages (except for the lowest plotted one). Would not a line chart serve better?
Posted by: Richard Krablin | Feb 18, 2021 at 11:08 AM
RK: Yes. A line chart that has the same baseline for every region is much better at comparisons. I rarely use area charts. This raises the same debate as the pie chart. Some people believe that showing parts of the whole is very important.
Posted by: Kaiser | Feb 18, 2021 at 03:14 PM
In the syllogism at the start, I think you palmed a card between your major and minor premises, and your conclusion. A better-stated conclusion would be "an implication of these beliefs is that the story is immutable, given the dataset and the best or right chart form". An implication of that is that the case here, the stacked area chart, is not the best or right chart form.
Perhaps a horizon chart would not have been as vulnerable to the story changing drastically when the order was changed slightly? As you say, the stacked area percentage chart was also not as vulnerable.
Posted by: derek | Feb 19, 2021 at 03:25 AM
btw I was confused by WHO's regions: others might like to know that "Western Pacific" includes China, while "South East Asia" includes India.
Posted by: derek | Feb 19, 2021 at 03:33 AM