Reader Lucia G. sent me this chart, from Ars Technica's FAQ about the coronavirus:
She notices something wrong with the axis.
The designer took the advice not to make a dual axis, but didn't realize that the two metrics are not measured on the same scale even though both are expressed as percentages.
The blue bars, labeled "cases", show the distribution of cases by age group. The blue bars should sum to 100 percent.
The orange bars show fatality rates by age group. Each rate is the number of deaths in an age group divided by the number of cases in that same age group. The orange bars have no reason to sum to 100 percent.
In general, the rates will have much lower values than the proportions. At least that should be the case for viruses that are not extremely fatal.
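To make the difference in denominators concrete, here is a minimal sketch in Python. The counts are made up for illustration and are not the figures behind the Ars Technica chart; the names `case_share` and `fatality_rate` are just labels I chose.

```python
# Hypothetical case and death counts by age group (illustration only,
# NOT the data behind the Ars Technica chart).
cases = {"0-39": 400, "40-59": 350, "60-79": 220, "80+": 30}
deaths = {"0-39": 1, "40-59": 4, "60-79": 13, "80+": 5}

total_cases = sum(cases.values())

# Proportion: each group's share of ALL cases (one common denominator).
case_share = {g: 100 * n / total_cases for g, n in cases.items()}

# Rate: deaths divided by that group's OWN cases (a different denominator per group).
fatality_rate = {g: 100 * deaths[g] / cases[g] for g in cases}

for g in cases:
    print(f"{g:>6}: {case_share[g]:5.1f}% of cases, "
          f"fatality rate {fatality_rate[g]:5.1f}%")

print(f"sum of shares: {sum(case_share.values()):.1f}")    # forced to be 100
print(f"sum of rates:  {sum(fatality_rate.values()):.1f}")  # no such constraint
```

The shares are forced to sum to 100 because they share a denominator. The rates are not, so plotting both against one percentage axis invites the wrong comparison.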
This is what the 80 and over section looks like.
It is true that the fatality rate (orange) is particularly high for the elderly while this age group accounts for less than 5 percent of total cases (blue). However, the cases that are fatal, which inhabit the orange bar, must be a subset of the total cases for 80 and over, which are shown in the blue bar. Conceptually, the orange bar should be contained inside the blue bar. So, it's counter-intuitive that the blue bar is so much shorter than the orange bar.
The following chart fixes this issue. It reveals the structure of the data: total cases are separated by age group, then within each age group, a proportion of the cases are fatal.
This chart also shows that most patients recover in every age group. (This is only approximately true as some of the cases may not have been discharged yet.)
This confusion of rates and proportions reminds me of something about exit polls I just wrote about the other day on the sister blog.
When the media make statements about trends in voter turnout rate in the primary elections, e.g. when they assert that youth turnout has not increased, their evidence is from exit polls, which can measure only the distribution of voters by age group. Exit polls do not and cannot measure the turnout rate, which is the proportion of registered (or eligible) voters in the specific age group who voted.
Like the coronavirus data, the scales of these two metrics are different even though they are both percentages: the turnout rate is typically a number between 30 and 70 percent, and summing the rates across all age groups will exceed 100 percent many times over. Summing the proportions of voters across all age groups should be 100 percent, and no more.
Changes in the proportion of voters aged 18-29 and changes in the turnout rate of people aged 18-29 are not the same thing. The former is affected by the turnout of all age groups while the latter is a clean metric affected only by 18- to 29-year-olds.
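A small sketch shows how the two metrics can move in opposite directions. All numbers here are invented for illustration; in particular, the eligible-voter counts are exactly what an exit poll does not give you.

```python
# Hypothetical counts in thousands (illustration only, not real election data).
eligible = {"18-29": 1000, "30-64": 3000, "65+": 1000}
voted_before = {"18-29": 300, "30-64": 1500, "65+": 600}
voted_after = {"18-29": 360, "30-64": 2100, "65+": 700}


def turnout_rate(voted, eligible):
    """Percent of each group's eligible voters who voted."""
    return {g: 100 * voted[g] / eligible[g] for g in voted}


def voter_share(voted):
    """Percent of all voters who belong to each group (what exit polls see)."""
    total = sum(voted.values())
    return {g: 100 * n / total for g, n in voted.items()}


rate_before = turnout_rate(voted_before, eligible)
rate_after = turnout_rate(voted_after, eligible)
share_before = voter_share(voted_before)
share_after = voter_share(voted_after)

g = "18-29"
print(f"turnout rate {g}: {rate_before[g]:.1f}% -> {rate_after[g]:.1f}%")
print(f"share of voters {g}: {share_before[g]:.1f}% -> {share_after[g]:.1f}%")
```

In this made-up scenario, youth turnout rises, yet the youth share of voters falls because the other age groups turned out even more strongly. Someone reading only the shares would conclude the opposite of what happened.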
Basically, ignore pundits who use exit polls to comment on turnout trends. No matter how many times they repeat their nonsense, proportions and rates are not to be confused. In practice, that means ignoring most commentary on turnout trends, because the only data behind it come from exit polls, which don't measure rates.
P.S. Here is some further explanation of my chart, as a response to a question from Enrico B. on Twitter.
The chart can be thought of as two distributions, one for cases (gray) and one for deaths (red). Like this:
The side-by-side version removes the direct visualization of the fatality rate within each age group. Understanding the fatality rate then requires readers to do math in their heads. They can qualitatively assess that people aged 80 and over accounted for 3 percent of cases but about 21 percent of deaths, while people aged 70 to 79 accounted for 9 percent of cases but 30 percent of deaths, etc.
What I did was to scale the distribution of deaths so that it can be compared to the distribution of cases. It's like fitting the red distribution inside the gray distribution. Within each age group, the proportion of red against the length of the bar is the fatality rate.
For every 100 cases regardless of age, 3 are people aged 80 and over, of which 0.5 are fatal (red).
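The scaling can be sketched in a few lines of Python. The case and death shares are the rounded figures quoted above. The overall figure of 2.4 deaths per 100 cases is my assumption, back-derived from those rounded numbers (0.5 divided by 0.21), so treat the outputs as rough.

```python
# Rounded shares quoted in the text (percent of all cases / of all deaths).
case_share = {"70-79": 9.0, "80+": 3.0}
death_share = {"70-79": 30.0, "80+": 21.0}

# ASSUMPTION: overall deaths per 100 cases, back-derived from the rounded
# figures above (0.5 / 0.21 is about 2.4). Not an official statistic.
OVERALL_FATAL_PER_100 = 2.4

# Red segment: the group's deaths expressed per 100 total cases, i.e. the
# death distribution rescaled so it fits inside the case distribution.
red = {g: death_share[g] / 100 * OVERALL_FATAL_PER_100 for g in death_share}

# Fatality rate within a group: red as a percentage of the whole bar.
rate = {g: 100 * red[g] / case_share[g] for g in red}

for g in case_share:
    print(f"{g:>6}: bar {case_share[g]:.1f}, red {red[g]:.2f}, "
          f"fatality rate about {rate[g]:.1f}%")
```

Under these assumptions the 70-79 group has the longer red segment even though the 80 and over group has the higher fatality rate, because the red segment also depends on how many cases the group has.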
So, the axis labels are correct. The values are proportions of total cases, although as the designer of the chart, I hope people are paying attention more to the proportion of red, as opposed to the units.
What might strike people as odd is that the biggest red bar does not appear against 80 and above, since we might believe the virus is deadlier the older you are. The reason is that on an absolute scale, more people aged 70-79 died than those aged 80 and above. Deaths in an age group, expressed as a share of all cases, are the product of that group's proportion of cases and its fatality rate. That's really a different story from the usual plot of fatality rates by age group. In those charts, we "control" for the prevalence of cases. If every age group were infected at the same frequency, COVID-19 would kill more people aged 80 and over.