Here's a chart from one of the Italian dailies I picked up in Rome last August. It apparently plots the number of hectares of farmland burnt in various fires over time.
While the chart is clean and pleasing to the eye, it has a malformed time axis. In the side-by-side comparison shown below, you can see how the evenly-spaced time axis completely distorts the cadence of the data.
In fact, the data should be put into a bar chart, rather than a line chart. Lines are used primarily to denote trends, and sometimes to compare profiles. Neither of these cases applies here.
The bar chart also requires proper spacing, to show the years in which no hectares were burnt by fires.
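One way to get that spacing is to fill in explicit zeros for the missing years before plotting, so the time axis stays evenly spaced. A minimal sketch, using made-up hectare figures rather than the newspaper's data:

```python
# Hypothetical fire data: only years with fires are recorded.
burnt = {1997: 1200, 1998: 450, 2003: 3100, 2007: 900}

# Fill every intermediate year with 0 so the bar chart shows
# the fire-free years as gaps rather than skipping them.
years = range(min(burnt), max(burnt) + 1)
series = [(y, burnt.get(y, 0)) for y in years]

print(series[:4])  # → [(1997, 1200), (1998, 450), (1999, 0), (2000, 0)]
```

Any charting tool fed this filled-in series will space the bars correctly, since the zero years now occupy their own positions on the axis.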
The first thing we know about kitchen cabinets is that they are not large enough. If you live in a small city apartment, you're always looking for ways to maximize your space. If your McMansion has a huge kitchen, you'll run out of space all the same, after splurging on the breadmaker, and the ice-cream maker, and the panini grill, and containers for garlic, onions, different shapes of pastas, and the peelers for apples, garlic, carrots, the egg-separator, the foam-maker, and so on.
Another thing we know is that no matter how many and how large the cabinets are, there is not enough premium space, by which we mean front-facing space within arm's reach. What has this to do with graphs and charts? We'll find out soon enough.
In this weekend's edition of the New York Times (link 1, link 2), several climate scientists wrote about droughts in America: "widespread annual droughts, once a rare calamity, have become more frequent and are set to become the 'new normal'". What caught my eye was the following graphic showing precipitation levels, enticingly titled "21 Centuries of Rainfall in New Mexico". (See the full graphic here.)
The blue lines going up represent years in which rainfall was higher than normal; the orange lines going down show years of below-normal rainfall. "Normal" is defined as the average rainfall between 1931 and 1990. Particularly useful were the annotations telling us, for certain centuries, the number of years below normal.
I immediately needed to see the following chart:
This simply takes the annotations and plots them directly (I made up the data where no annotations were given).
What we are seeing here, at the scale of centuries, is that in the most recent period (only up to 1992), New Mexico is getting wetter.
Yes, this chart doesn't seem to support the scientists' assertion. In fact, I'm not sure why the NYT decided to insert this "news analysis" next to the opinion column. It's not that the analyst doesn't see the contradiction: he stated "the bigger picture, from El Malpais, suggests that the West has endured far drier periods. Uncomfortably drier."
I have major issues with this juxtaposition. If the NYT does not think the opinion column is correct science, it should decline to print it. If the NYT thinks some readers might object to the science, it should counter the objection by citing the work of other scientists (none is cited in the sidebar). While El Malpais may provide the longest record of rainfall conditions, that is no basis for claiming it shows "the bigger picture". And it is shocking, perhaps a reflection of the newspaper's cultural bias, to see data from New Mexico presented as telling us something about "the West" and somehow not about "the East". Why would it generalize to one and not the other?
Back to the chart, and specifically kitchen cabinets.
We have two charts from one data set. The chart of blue and orange spikes contains every data point. The stacked column chart shows only aggregated data, specifically how many above-average and below-average years in each century. The first chart makes readers work very hard to get any information out of it. The designer recognizes this and adds useful notes, generally about the proportion of below-average years. Assuming that those proportions are the key to deciphering the chart, why not plot them directly?
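Plotting the proportions directly is a simple aggregation step. A sketch of how the yearly anomalies roll up into per-century counts of above- and below-average years (the rainfall anomalies here are invented, not the El Malpais data):

```python
from collections import defaultdict

# Hypothetical anomalies: positive = above-average rainfall.
rainfall_anomaly = {1885: -0.4, 1910: 0.7, 1950: -1.2, 1988: 3.1, 1992: 2.5}

# Count above/below-average years within each century.
counts = defaultdict(lambda: {"above": 0, "below": 0})
for year, anomaly in rainfall_anomaly.items():
    century = year // 100 + 1  # e.g. 1988 falls in the 20th century
    key = "above" if anomaly > 0 else "below"
    counts[century][key] += 1

print(dict(counts))
```

The stacked column chart is just these counts, one column per century, which is exactly the summary the designer's annotations were hinting at.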
The objection is that much information is lost by not including the rest of the data. One cannot deny it. For example, just looking at the stacked column chart, you cannot know that the late 1980s and early 1990s were extremely wet years in New Mexico, over three standard deviations above the norm, nor can you know that there was a mega-drought in the late 16th century lasting decades.
However, chart designers should realize that there is a shortage of front-facing, accessible space in the kitchen cabinets. Putting more into a chart means some of the information will be pushed to the back, or out of arm's reach. If you need a ladder to get to that cabinet, what would you put in it? Would you rather leave it empty? I know I would.
The following chart by the Financial Times reminds me of the famous Napoleon Russian Campaign map:
I also love it when geographical data, in this case average house price data by region, are plotted without a map. If plotted on a map, the relative prices are typically differentiated by color. On this chart, they are encoded in the heights of the columns. Our brains are just not wired to translate color differences into numeric differences so every time we can avoid color scales, we should.
Like Minard's chart, multiple dimensions are comfortably accommodated. The location along the river bank. The north/south orientation of the location. The "width" of a neighborhood.
A minor quibble is with the choice of data series. I wonder if price per square foot would be a better metric. One can also try a relative scale (indexed to the average).
Reader Dave S. was disturbed by the graphics in the inaugural World Happiness Report, published by Jeffrey Sachs's Earth Institute (link). It's a 200-page document with lots of graphs, many of which require rework.
Here's a pie chart showing (purportedly) what "happy" people in Bhutan are happy about:
I'm really curious how these domains add up to exactly 100%. Since the data came from some kind of survey, you typically would allow each respondent to pick more than one domain in which he or she is happy. If that is the case, then it would not make sense to add up responses, nor would the total (100%) signify anything.
If, on the other hand, respondents are forced to pick only one domain, it is very suspicious that all 9 domains would essentially receive the same number of votes. Nor would it make sense to ask survey-takers to select only one domain if all 9 domains contribute to someone's happiness.
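The arithmetic problem with multi-select responses is easy to demonstrate. In this sketch (respondents and their picks are entirely hypothetical), each domain's share is the fraction of respondents who picked it, and the shares sum to well over 100%:

```python
# Hypothetical multi-select survey: each respondent picks
# every domain in which he or she feels happy.
responses = [
    {"health", "family", "education"},
    {"health", "income"},
    {"family"},
]

n = len(responses)
shares = {}
for picks in responses:
    for domain in picks:
        shares[domain] = shares.get(domain, 0) + 1 / n

# Six picks across three respondents: shares total 200%, not 100%.
print(round(sum(shares.values()), 2))  # → 2.0
```

Whenever the shares of a pie chart come from multi-select questions, the total is an accident of how many options respondents ticked, so a pie chart is the wrong form.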
Pie charts are perhaps the most abused chart type. There are just endless examples of poorly executed pie charts (just browse my last few posts). The prevalence of abuse may be reason enough to ban them.
Paired with Figure 4 shown above is Figure 5 shown below, which deepens the mystery:
Compare the captions. What's the difference between "In which domains do happy people enjoy sufficiency?" and "Indicators in which happy people enjoy sufficiency"? The categories are related but not identical (Education vs. Schooling, Health vs. Self reported health status, etc.). However, in Figure 5, the distribution is just as uniform as in Figure 4. Is the data contradictory, or are the captions misleading?
This column chart would be better presented as a horizontal bar chart so that readers don't have to break their necks trying to read the category names.
The designer should also perform the routine task of removing the 120% tick mark that Excel puts on the proportion axis.
Reader Jim S. was rightfully mystified by the following map that appeared on the Ars Technica blog (link), and purported to demonstrate that high temperatures of March 2012 across most of the U.S. were of historical significance.
I must say the production values of this map, produced by the people at NOAA, are superb. I love, love, love the caption that the Ars Technica editors added to the map. I wish they had blown it up to 20-point font, and made it shiny :) Besides that, the colors are well-chosen, and it doesn't feel cluttered despite having 48 numbers printed on it.
Like Jim, I'm hypnotized by the drumbeat of 118, 118, 118, ... all over the red area. What could the numbers mean? They could be temperatures in Fahrenheit (although 118 degrees in March surely would have been newsworthy). The legend does lend support to this interpretation (see right), what with the extra-large font announcing "Temperature". Jim commented: "But it seems odd that such a large area would have precisely the same high."
Not so fast, Jim. The NOAA also made the chart shown on the right (link). So indeed, the entire country could be given one value of 118.
If not Fahrenheit, what could the numbers mean? They could be some kind of index in which case the average value would seem to be 50 (the white patch). That would be one strange index.
Too bad this map is produced by specialists for specialists, leaving us commoners guessing. The only clue we got is in the title, "Statewide Ranks".
But this isn't very helpful either. The 118s are still ringing in my ear. If the numbers are ranks, then 118 would likely be the maximum rank, given that there are so many 118s. But I can't figure out which metric has 118 levels.
I finally found my way to this page, which explains what NOAA calls "climatological ranking". The page also has a chart (below), which can serve as a sort of legend for the maps, but is almost as difficult to read.
Apparently there are 118 years worth of recorded temperatures, going back to 1895. And within each state, the annual temperatures for the past 118 years were ranked from lowest to highest, meaning that 118 is the hottest on record.
Given that there is lop-sided attention to hotter temperatures (global warming), it would be much better to reverse the ranking so that 1 is the hottest year!
The chart also explains that the years are grouped into three equal buckets to indicate "below normal", "near normal" and "above normal".
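The bucketing itself is straightforward. A sketch of how 118 ranked years might be split into three (nearly) equal groups; the cutoffs here are my assumption of an even three-way split, not NOAA's published definition:

```python
N_YEARS = 118  # years of recorded temperatures, 1895 onward

def bucket(rank, n=N_YEARS):
    """Assign a climatological rank (1 = coldest, n = hottest)
    to one of three equal-sized groups."""
    third = n / 3
    if rank <= third:
        return "below normal"
    elif rank <= 2 * third:
        return "near normal"
    return "above normal"

print(bucket(1), bucket(59), bucket(118))
```

A rank of 118 lands in "above normal", which is consistent with reading the map's sea of 118s as states having their hottest March on record.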
Too bad this chart gives us three (or, counting the record extremes, five) levels of ranking while the map uses seven colors (levels).
They really ought to include on the map (a) the definition of the ranking and (b) the range of ranks corresponding to each color.
While researching this post, I found this wonderful page of NOAA maps (link). This is a beautiful illustration of the process of statistical aggregation. Notice the trade-off between simplicity and loss of information. The art in statistics is to figure out the right balance between the two.
I always like to explore doing away with the unofficial rule that says spatial data must be plotted on maps. Conceptually I'd like to see the following heatmap, where a concentration of red cells at the top of the chart would indicate extraordinarily hot temperatures across the states.
I couldn't make this chart because the NOAA website has this insane interface where I can only grab the rank for one state and one year at a time. But you get the gist of the concept.
Did I tell you I love, love, love the caption? Go right ahead, and make a slogan for your chart today!
[PS: Reader Mark Bulling (see his comment below) contributes a realization of my heatmap suggestion above. One of the benefits of this chart is its economy, as a small version of it shows:
The following two charts plot the same data, the yearly amount of rainfall in Los Angeles over the last two decades or so. (The original chart, on the left, came from the LA Times. Link here.) Why do they give such different impressions?
The left chart appears very busy despite the simple data set, thanks to printing all 21 numbers, each to two decimal places, on the chart itself. The axis labels provide no extra information when every data value is printed, and it is highly unlikely that any newspaper reader requires rainfall measured to such precision.
Chances are the reader is interested in how the general trend of rainfall in recent years compared to the historical pattern. Credit the designer for pulling the relevant data, including the average, maximum and minimum rainfall on record. On the right chart, all three historical numbers are incorporated into the axis so that they could act as reference levels.
Not to mention the axes were switched to preserve the usual placement of time on the horizontal axis.
The bar chart emphasizes the absolute values of each rainfall amount while the dot plot displays the differences between each measurement and the historical average. On the right chart, it is easy to observe whether any year's rainfall is above or below the expectation. Over the last two decades, it appears there were about as many years above as below the average, and the overages and underages do not exhibit any clustering.
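The transformation behind the dot plot is just a subtraction. A sketch with invented numbers (not the LA Times data; the historical average here is a hypothetical figure):

```python
historical_avg = 14.93  # hypothetical long-run average, inches

# Hypothetical yearly rainfall totals.
rain = {2005: 37.25, 2006: 13.19, 2007: 3.21, 2008: 9.08, 2009: 13.53}

# Express each year as a deviation from the historical average,
# which is what the dot plot displays directly.
deviation = {y: round(v - historical_avg, 2) for y, v in rain.items()}

above = sum(1 for d in deviation.values() if d > 0)
below = sum(1 for d in deviation.values() if d < 0)
print(above, below)  # years above vs. below the average
```

Once the data are in deviation form, the above/below question the reader actually cares about is answered by the sign alone, with no mental arithmetic against an axis.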
From a Trifecta checkup perspective, we find that the choice of data is not attuned to the purpose of the chart. The right data has been collected; a small transformation would have made all the difference. The selection of the chart type also fails to address the purpose of the chart.