Handling partial data on graphics
Jan 14, 2021
Last week, I posted on the book blog a piece about excess deaths and accelerated deaths (link). That whole piece is about how certain types of analysis have to be executed at certain moments of time. The same analysis done at the wrong time yields the wrong conclusions.
Here is a good example of what I'm talking about. This is a graph of U.S. monthly deaths from Covid-19 during the entire pandemic. The chart is from the COVID Tracking Project, although I pulled it down from my Twitter feed.
There is nothing majorly wrong with this column chart (I'd remove the axis labels). But there is a big problem. Are we seeing a boomerang of deaths from November to December to January?
Not really. This trend is there only because the chart was generated on January 12. The last column covers just 12 days while the prior two columns cover 30-31 days.
The Trifecta Checkup picks up this problem. What the visual is showing isn't what the data are saying. I'd call this a Type D chart.
***
How to fix this?
One solution is to present partial data for all the other columns, so that the readers can compare the January column to the others.
One critique of this is the potential seasonality. The first 39% (12 out of 31 days) of a month may not be comparable across months. A further seasonal adjustment makes this better - if we decide the benefits outweigh the complexity.
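As a rough illustration of the partial-data idea, here is a minimal Python sketch that totals only the first N days of every month, so each column covers the same window as the incomplete one. The function name `first_n_days_totals` and the daily counts are made up for illustration; they are not the COVID Tracking Project's data or code, and no seasonal adjustment is attempted.

```python
from collections import defaultdict
from datetime import date


def first_n_days_totals(daily_deaths, n_days):
    """Sum each month's counts over days 1..n_days only.

    daily_deaths maps a datetime.date to that day's count.
    Returns a dict keyed by (year, month).
    """
    totals = defaultdict(int)
    for day, count in daily_deaths.items():
        if day.day <= n_days:
            totals[(day.year, day.month)] += count
    return dict(totals)


# Hypothetical daily counts, invented purely to show the mechanics.
daily = {
    date(2020, 12, 1): 10,
    date(2020, 12, 12): 20,
    date(2020, 12, 13): 30,  # falls outside the 12-day window, so it is excluded
    date(2021, 1, 1): 15,
    date(2021, 1, 12): 25,
}

# Compare every month on its first 12 days, matching the partial January column.
like_for_like = first_n_days_totals(daily, 12)
print(like_for_like)
```

Plotting `like_for_like` instead of the full-month totals puts January on the same footing as the earlier months, at the cost of discarding the later days of the completed months.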
Another solution is to project the full-month tally.
The critique here is the accuracy of the projection.
But the point is that not making the adjustment would be worse.
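The simplest version of a full-month projection can be sketched as follows. This is only one possible approach, and it assumes the daily rate seen so far continues for the rest of the month, which is exactly the accuracy concern raised above. It also ignores reporting delays. The function name `project_full_month` and the input of 1,200 are hypothetical.

```python
import calendar


def project_full_month(partial_total, year, month, days_observed):
    """Scale a partial-month total up to the full month.

    Assumes a constant daily rate, so the result is a naive estimate.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    if not 1 <= days_observed <= days_in_month:
        raise ValueError("days_observed must be between 1 and the month length")
    return partial_total * days_in_month / days_observed


# Hypothetical example: 1,200 deaths counted over the first 12 days of January 2021.
projected = project_full_month(1200, 2021, 1, 12)
print(projected)  # 1200 * 31 / 12 = 3100.0
```

On the chart, the projected portion could be drawn in a lighter shade on top of the observed 12 days, so readers can tell the counted part from the estimated part.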
There may also be some complications due to delays in deaths being reported. All deaths for December will be recorded by 12 January, but probably not all the deaths from 1-12 January. This can be complex to model.
Posted by: Ken | Jan 31, 2021 at 03:40 AM