Two tales of one dataset
Jan 10, 2012
The following two charts plot the same data, the yearly amount of rainfall in Los Angeles over the last two decades or so. (The original chart, on the left, came from the LA Times. Link here.) Why do they give such different impressions?
The left chart looks very busy even though the data set could hardly be simpler, because the entire set of 21 numbers is printed on the chart itself, each to two decimal places. The axis labels add no information once all the data are printed, and it is highly unlikely that any newspaper reader needs such precise measurements of rainfall.
Chances are the reader is interested in how rainfall in recent years compares to the historical pattern. Credit the designer for pulling the relevant data, including the average, maximum and minimum rainfall on record. On the right chart, all three historical numbers are incorporated into the axis so that they act as reference levels.
In addition, the axes were switched to preserve the usual placement of time on the horizontal axis.
The bar chart on the left emphasizes the absolute value of each rainfall amount, while the dot plot on the right displays the difference between each measurement and the historical average. On the right chart, it is easy to observe whether any year's rainfall is above or below the expectation. Over the last two decades, it appears there were about as many years above as below the average, and the overages and underages do not exhibit any clustering.
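The transformation behind the dot plot, subtracting the historical average from each year's total, can be sketched in a few lines of Python. The rainfall figures and the reference level below are made up for illustration and are not the LA Times numbers; the names `rainfall`, `HISTORICAL_AVERAGE` and `deviations` are hypothetical as well.

```python
# Hypothetical yearly rainfall totals in inches (NOT the LA Times data).
rainfall = {2007: 3.2, 2008: 13.5, 2009: 9.1, 2010: 16.4, 2011: 20.2}

# Assumed historical average in inches, used as the reference level.
HISTORICAL_AVERAGE = 15.0


def deviations(totals, reference):
    """Return each year's rainfall minus the reference level, in inches."""
    return {year: round(amount - reference, 2) for year, amount in totals.items()}


devs = deviations(rainfall, HISTORICAL_AVERAGE)

# Count how many years fall on each side of the reference level.
above = sum(1 for d in devs.values() if d > 0)
below = sum(1 for d in devs.values() if d < 0)

for year, d in sorted(devs.items()):
    print(f"{year}: {d:+.2f} in. relative to average")
print(f"{above} years above, {below} years below")
```

Plotting `devs` rather than `rainfall` is what puts the reference level, not zero, at the center of the reader's attention.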
***
From a Trifecta checkup perspective, we find that the choice of data is not attuned to the purpose of the chart. The right data has been collected; a small transformation would have made all the difference. The selection of the chart type also fails to address the purpose of the chart.
The striking thing to me is how big the year-over-year variation is and how extreme the outliers are. The average year is .8 stdev from the mean.
So calculating a 5-year moving average would seem a productive course ... over the 18-year period where we can calculate the moving average, rainfall appears to drop by .25"/year with an R^2 of .44. I would want more years to be sure that the beginning wasn't distorted, but if that remained a strong conclusion from the evidence, a moving-average line would be a nice addition to the chart.
Posted by: Gary | Jan 10, 2012 at 02:13 PM
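The moving-average idea raised in the comment above could be sketched as follows. This is only an illustration under stated assumptions: the input series is invented, not the Los Angeles data, and the helper names `moving_average` and `linear_fit` are hypothetical. The trend is an ordinary least-squares line fitted against the index of each smoothed value, which may or may not match how the commenter arrived at his figures.

```python
def moving_average(values, window=5):
    """Trailing moving average; returns len(values) - window + 1 points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def linear_fit(ys):
    """Least-squares fit of ys against 0, 1, 2, ...; returns (slope, r_squared)."""
    n = len(ys)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    slope = sxy / sxx
    # For a one-variable fit, R^2 equals the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, r_squared


# Invented yearly totals in inches, for illustration only.
yearly = [21.0, 8.1, 12.4, 31.0, 9.1, 16.4, 4.4, 37.2, 13.2, 3.2, 13.5, 9.1]

smoothed = moving_average(yearly, window=5)
slope, r_squared = linear_fit(smoothed)
print(f"slope: {slope:.2f} in./year, R^2: {r_squared:.2f}")
```

Smoothing before fitting makes the line look steadier, but neighboring moving-average points share most of their data, so the resulting R^2 overstates how much evidence there is for a trend.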