## It's hot even in Alaska

##### Jul 31, 2019

A Twitter user pointed to the following chart, which shows that Alaska has experienced extreme heat this summer, with the July statewide average temperature shattering the previous record:

This column chart is clear in its primary message: the red column shows that the average temperature this year is quite a bit higher than the next highest temperature, recorded in July 2004. The error bar is useful for statistically literate people - the uncertainty is (presumably) due to measurement errors. (If a similar error bar were drawn for the July 2004 column, the two bars would probably overlap a bit.)

The chart violates one of the rules of making column charts - the vertical axis is truncated at 53F, thus the heights or areas of the columns shouldn't be compared. Two dataviz bloggers recently nominated this violation when asked about "bad charts" (see here).
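To see why truncation matters, here is a minimal sketch of how the ratio of two drawn column heights changes with the axis baseline. The two temperatures are hypothetical values picked for easy arithmetic, not numbers read off the chart, and `drawn_height_ratio` is just an illustrative helper name. The baselines are the chart's 53F cutoff, 0F, and absolute zero (-459.67F).

```python
def drawn_height_ratio(a, b, baseline):
    """Ratio of the drawn heights of two columns with values a and b
    when the vertical axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Hypothetical temperatures in Fahrenheit, for illustration only;
# these are NOT the values shown in the chart.
hot, previous = 59.0, 57.0

for baseline in (53.0, 0.0, -459.67):
    ratio = drawn_height_ratio(hot, previous, baseline)
    print(f"baseline {baseline:8.2f}F -> height ratio {ratio:.3f}")
```

With these made-up numbers, the taller column is drawn 1.5 times as high as the shorter one when the axis starts at 53F, even though the two values differ by only 2F. Moving the baseline changes the ratio, so the relative heights say more about where the axis starts than about the data.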

Now look at the horizontal axis. These are the years of the top 20 temperature records, ordered from highest to lowest. The months are almost always July, except in 2004, when all three summer months entered the top 20. I find it hard to make sense of these dates when they jump around.

In the following version, I plotted the 20 temperatures on a chronological axis. Color is used to divide the 20 data points into four groups. The chart is meant to be read top to bottom.


"The chart violates one of the rules of making column charts - the vertical axis is truncated at 53F, thus the heights or areas of the columns shouldn't be compared."

Following this logic, the only sensible "zero" for the temperature scale is -459.67F. Why? Read the second paragraph in
https://en.wikipedia.org/wiki/Temperature
Any other "zero" for the vertical axis is just as arbitrary as 53F.

Ducking and running now...

Dasve: Thanks for that needed note. What you're pointing out is that the "zero" is not to be taken literally. It depends on the scale. Nevertheless, the logic of the rule has to do with comparing two columns: when readers compare the areas of the two columns, those areas correspond neither to the raw values nor to the difference between them. Using a dot plot solves this issue. The dot plot plus error bar is clearly better than the truncated column chart plus error bar - I am surprised many scientific journals continue to print the latter. Am I missing something?

No, following the logic, you don't use column charts in that situation. Climatic temperatures are an inappropriate data type for them, because their standard deviations are a tiny fraction of their numeric means.

(Floating column charts are okay, though it should go without saying that the bottoms of the floating columns should all be above the line and visible.)
