A few weeks ago, I replied to a tweet by someone who was angered by the amount of bad graphics about coronavirus. I take a glass-half-full viewpoint: it's actually heart-warming for dataviz designers to realize that their graphics are being read! When someone critiques your work, it is proof that they cared enough to look at it. Worse is when you publish something, and no one reacts to it.
That said, I just wasted half an hour trying to get into the head of the person who made the following:
Longtime reader Chris P. forwarded this tweet to me, and I saw that Andrew Gelman got sent this one, too.
The chart looked harmless until you check out the vertical axis labels. It's... um... the most unusual. The best way to interpret what the designer did is to break up the chart into three components. Like this:
The big mystery is why the designer spent the time and energy to make this mischief.
The usual suspect is fake news. The clearest sign of malintent is the huge size of the dots. Each dot spans almost the entirety of the space between gridlines.
But there is almost no fake news here. The overall trend line is intact despite the attempted distortion. The following is a superposition of an unmanipulated line (yellow) on top of the manipulated:
The next guess is incompetence. The evidence against this view is the amount of energy required to execute these changes. In Excel, it takes a lot of work. It's easier to do this in R or any programming languages with which you can design your own axis.
Even for the R coders, the easy part is to replicate the design, but the hard part is to come up with the concept in the first place!
You can't just stumble onto a design like this. So I am not convinced the designer is an idiot.
How much work? You have to create three separate charts, with three carefully chosen vertical scales, and then clip, merge, and sew the seam. The weirdest bit is throwing away three of the twelve axis labels and writing in three fake numbers.
Here's the recipe: (if the gif doesn't load automatically, click on it)
Help me readers! I'm stumped. Why oh why did someone make this? What is the point?
P.S. [4/9/2020] A conversation with Carlos on Andrew's blog reveals another issue. I pointed out that the "Total cases" printed up top was not the sum of the 15 numbers on the chart. There was a gap of 184 cases. Carlos sent me a link showing a day on which the total cases in Colorado was 183 cases. I didn't quite get the point initially. He explained that it's 183 existing cases prior to the start of the period of this chart, plus the new cases during this period, leading to the "Total cases" as of the end of the period of this chart.
So, another mystery solved. This brings up an important point about making effective charts: one way confusion arises is if there are two things from the visual that seem to contradict each other. In most line charts, if there is a line, and then a "total", the natural expectation is that the "total" is the sum of the data that make up the line. In this case, that "total" is the total new cases during the time period depicted. Total new cases isn't the same as total cases from case #1.
It's clearer to say "Total Cases on 3/17 = 183; on 4/1 = 3342".