Loss aversion manifests itself in chart-making, as it does in economics. In chart-making, loss aversion can be defined as the tendency to avoid losing data at any cost. Given a rich data set, designers often make the mistake of cramming as much data into the chart as possible. This takes Tufte's concept of maximizing the data-ink ratio to the extreme, and it often leads to awkward, muddled charts.
Gelman provided a great example of this recently.
Every piece of data is given equal footing, which results in nothing standing out. The reader gasps for air.
The best evidence is the set of small multiples shown at the bottom. These give the amount of phosphorus flowing into the lake annually since 1973, as measured at four locations.
The point is that the pollution has been most serious along the northern shores, especially in recent years. Thus, the Florida plan, which focuses on the southern region, is likely to have limited impact.
The choice of vertical lines is smart, as the typical time-series connected-line chart would jump up and down crazily. A simple vertical axis marks the amounts, avoiding the temptation to print all the data. The designer realizes it is the trend, rather than individual values, that is the issue.
Taken together, the three components tell a good story. This is a well-executed effort. The Times once again proves itself the leader in developing sophisticated graphics.
Reference: "Florida Deal for Everglades May Help Big Sugar", New York Times, Sep 13 2008.
We hope this is an indication that the British newspaper The Guardian (which has one of the best websites out there) is joining the fun. It appears to have quietly debuted an interactive graphics feature. The first edition addressed the oil price crisis.
The use of inflation-adjusted figures seems obvious, but we don't see many of them in the press. Highlighting the peaks and providing annotation (on mouse-over) is an excellent touch. The gridlines and axis labels (especially the year axis) are thankfully restrained. We don't see the need for the unadjusted series (blue line), however. The fact that the gap between the two series widened the further back in time we went told us little; it invited readers to read more into it than what it truly was: the time value of money.
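To see why the widening gap carries no news, here is a minimal sketch of the usual inflation adjustment: a nominal price from one year is restated in base-year money by multiplying by the ratio of price-index values. The index and the oil prices below are hypothetical round numbers chosen for illustration, not the Guardian's data, and `to_real` is a made-up helper name.

```python
def to_real(nominal, year, cpi, base_year):
    """Restate a nominal price from `year` in `base_year` money via a price index."""
    return nominal * cpi[base_year] / cpi[year]

# Hypothetical price index and oil prices (illustration only).
cpi = {1980: 80.0, 2008: 200.0}
nominal = {1980: 40.0, 2008: 100.0}

real = {y: to_real(p, y, cpi, 2008) for y, p in nominal.items()}
# The gap between adjusted and unadjusted series is purely an inflation effect:
# zero in the base year, growing the further back we go.
gap = {y: real[y] - nominal[y] for y in nominal}
print(real)  # {1980: 100.0, 2008: 100.0}
print(gap)   # {1980: 60.0, 2008: 0.0}
```

In this toy example the two years have identical real prices, yet the unadjusted line makes 1980 look far cheaper.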
Later on, they used an oil barrel object to illustrate the components of the retail oil price. The height of the cylinder is indeed proportional to the data plotted. If only they had colored the top of the cylinder gray instead of green! As it stands, the green portion has about the same area as the red.
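The problem with the green lid can be put in numbers. In a cylinder drawn in oblique view, each colored band's side face covers height × width (its curved top and bottom edges are parallel half-ellipses, so they add no net area), but the top band also gets the whole elliptical lid. The sketch below assumes hypothetical dimensions and a made-up helper name `visible_area`; it is not measured from the Guardian graphic.

```python
import math

def visible_area(height, width, cap_depth, is_top):
    """Visible area of one colored band of a cylinder seen from slightly above.

    The lid is an ellipse with semi-axes width/2 and cap_depth; it is only
    visible (and only counted) for the topmost band.
    """
    side = height * width
    lid = math.pi * (width / 2) * cap_depth if is_top else 0.0
    return side + lid

# Hypothetical barrel: the top band encodes half the value of the band below it.
top = visible_area(height=1, width=4, cap_depth=1, is_top=True)
below = visible_area(height=2, width=4, cap_depth=1, is_top=False)
print(round(top, 2), below)  # 10.28 8
```

With the lid painted a neutral gray, the top band would shrink back to 4 area units, half of the band below, as the data say it should.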
Carly C., a reader from Streetsblog, created the following chart and wanted to know if there were better ways to present the data. She already disliked the dual axes and had thought of various options, including using a relative scale.
Generally speaking, dual axes in which each axis takes its own scale are like a football team with two "good" quarterbacks rotating under center, or two "great" CEOs sharing power. We have never seen those situations work out.
When we compare two quantities, we like to put them on the same scale. In this case, converting from absolute numbers to relative ones would do the trick.
The data paint a powerful story: as bike volume increased over time, bike accidents decreased. The apparent joining of the two lines in 1999 was an artifact of how the scales were manipulated. What Carly had in mind can be accomplished using an index set at 100 in 1999, which leads to the chart shown at left. The substance of this chart and Carly's original is the same, but the revised one has a single axis.
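Indexing is simple arithmetic: divide every value by the base-year value and multiply by 100, so both series start from the same point and share one axis. A minimal sketch follows; the counts are hypothetical (not Carly's data) and `index_series` is an illustrative name.

```python
def index_series(series, base_year):
    """Rescale a {year: value} series so that the base year reads 100."""
    base = series[base_year]
    return {year: 100 * value / base for year, value in series.items()}

# Hypothetical counts on very different scales (hundreds vs. tens).
bike_volume = {1999: 200, 2000: 250, 2001: 300}
bike_accidents = {1999: 40, 2000: 38, 2001: 30}

volume_idx = index_series(bike_volume, 1999)
accidents_idx = index_series(bike_accidents, 1999)
print(volume_idx)     # {1999: 100.0, 2000: 125.0, 2001: 150.0}
print(accidents_idx)  # {1999: 100.0, 2000: 95.0, 2001: 75.0}
```

Both indexed series now read as "percent of the 1999 level", so a single vertical axis suffices.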
Indexing time-series data is a widely used technique. Each issue of the Economist, for example, contains many such charts. This type of chart, however, suffers from a serious and under-appreciated problem: the visible pattern frequently depends critically on timing. Specifically, it makes a huge difference which year is selected as the baseline (index = 100).
A lot of mischief is possible by picking a special baseline. For example, I created the same chart three times, using 1998, 1999 and 2000, respectively, as baselines. When 1999 was set to 100 (middle chart), a criss-cross pattern showed up between 2001 and 2002, leading readers to conclude that the gap between growth in volume and growth in accidents developed during 2001. In the other two charts, the gap appeared around 2000. Also, the bottom chart exhibited a clearly growing gap (after dropping the inconvenient data before 2000).
Unfortunately, this is inherent to such charts; how much the choice of timing distorts the information presented depends on how rugged the underlying data is. Put another way, these charts are sensitive to outliers, especially when an outlier falls in the baseline year. (In this example, there were sharp changes in bike volumes in 1998-2000.)
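The baseline effect is easy to reproduce. The sketch below uses hypothetical series (not Carly's data) with a spike in bike volume in 1999, indexes both series to each candidate baseline, and lists the years in which the volume index sits above the accidents index. The helper names are illustrative.

```python
def index_series(series, base_year):
    """Rescale a {year: value} series so that the base year reads 100."""
    base = series[base_year]
    return {year: 100 * value / base for year, value in series.items()}

# Hypothetical data: bike volume spikes in 1999, accidents stay fairly flat.
volume = {1998: 100, 1999: 150, 2000: 110, 2001: 120, 2002: 130}
accidents = {1998: 50, 1999: 50, 2000: 52, 2001: 50, 2002: 48}

def years_volume_ahead(base_year):
    """Years in which the volume index is above the accidents index."""
    v = index_series(volume, base_year)
    a = index_series(accidents, base_year)
    return [year for year in volume if v[year] > a[year]]

for base_year in (1998, 1999, 2000):
    print(base_year, years_volume_ahead(base_year))
# 1998 [1999, 2000, 2001, 2002]
# 1999 []
# 2000 [1999, 2001, 2002]
```

Same data, three stories: with the spike year as the baseline, volume never appears to outgrow accidents at all.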
PS. [5/12/2008] How opportune was Andrew's post on the headaches caused by R's graphics defaults. I was too lazy to adjust the defaults and let R choose the dimensions (poorly); with Jake's suggestions, the new set of charts looks much better.
Andrew M., a new but loyal reader, didn't like the flow charts used by the EPA to illustrate cleantech. We have had some lively discussions about flow charts before. The bottom line seems to be that they are difficult beasts to tame, especially when the relationships are complex. The example Andrew sent (below) is not particularly horrid in the grand scheme of things. It's the abundance of annotations and colors that causes dizziness.
Here's a view of the same data, using a partitioning approach. The inputs are fixed at 100 units, which I find easier to comprehend, while the original fixed the outputs at 30 units of electricity and 45 units of heat. And of course, it is a tremendous service to readers not to make them work out the efficiencies. Tacitness is a vice, not a virtue, in graph-making.
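Fixing the input at 100 units is a simple rescaling, after which each output reads directly as a percentage efficiency. The sketch below keeps the 30 units of electricity and 45 units of heat from the EPA example but assumes a hypothetical fuel input of 120 units, since that figure is not given here; `per_100_input` is an illustrative name.

```python
def per_100_input(fuel_in, outputs):
    """Rescale a flow so that fuel input = 100; outputs then read as % efficiencies."""
    scaled = {name: amount * 100 / fuel_in for name, amount in outputs.items()}
    # Whatever is not converted to useful output is counted as losses.
    scaled["losses"] = 100 - sum(scaled.values())
    return scaled

# Outputs from the EPA example; the fuel input of 120 units is HYPOTHETICAL.
flows = per_100_input(120, {"electricity": 30, "heat": 45})
print(flows)  # {'electricity': 25.0, 'heat': 37.5, 'losses': 37.5}
```

The partitioned view then needs no arithmetic from the reader: 25% electrical efficiency, 37.5% heat recovery, 37.5% lost (under the assumed input).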
Reference: "Catalog of CHP Technologies", US EPA Combined Heat and Power Partnership.
Anna E. submitted this great example from Yahoo! Green: a well-meaning chart, but stuffed with redundancy.
Much appears to be going on, and yet the entire chart contains only 15 data points: Boston's ranks in each of 15 categories. The bar lengths convey the same information as the data labels. The legend provides a catchy name for different levels of ranks (0-10 = "leader"; 10-20 = "advances"; etc.). The colors merely reiterate the catchy titles. Similarly, the colored squares repeat the information in the bars.
In the name of green, we cleaned up this chart:
If this were a standalone graph, the categories should be ordered by Boston's ranks. Here, we assume that cross-referencing with other cities is needed, so we leave the order unchanged.
This chart from the NYT was intended to show how the EPA has moved the bar on vehicle mileage ratings: 2008 estimates were lower than 2007 estimates across the board, regardless of manufacturer, model and city/highway.
The chart was built from one basic component, repeated for each model. I like the discreet gridlines (the white ticks) which enable readers to count off the mileage ratings.
The data is rich: ratings were given along three dimensions (model, year of estimate and city/highway). Readers would benefit from stronger guidance on where to look for the most pertinent information. As the chart stands, it is merely a container for the data. It fails our self-sufficiency test: all the data were printed on the chart, and the bars add little.
In the Junk Charts version, I used knowledge of the data to structure the chart. First, noting that sedans, hybrids and trucks/SUVs/minivans have different levels of mileage ratings, I clustered the models into three groups. Second, I separated the city and highway ratings into two columns, as I consider between-model comparisons more important than city-highway comparisons. The chart is a dot plot, with a vertical tick for each 2007 estimate and a dot for each 2008 estimate. It's easy to see that all the dots sit to the left of the vertical ticks.
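The structure described above (models clustered by vehicle class, a tick for the 2007 estimate and a dot for the 2008 estimate) can be sketched as a plain-text dot plot. The model names and mpg figures below are hypothetical placeholders, not the NYT data, and `dot_plot_row` is an illustrative helper.

```python
def dot_plot_row(label, r2007, r2008, lo, hi, width=40):
    """One text row: '|' marks the 2007 estimate, 'o' the 2008 estimate."""
    def pos(value):
        # Map a rating linearly onto a column in [0, width - 1].
        return round((value - lo) / (hi - lo) * (width - 1))
    cells = [" "] * width
    cells[pos(r2007)] = "|"
    cells[pos(r2008)] = "o"  # drawn second, so it wins if both land on one column
    return f"{label:<12}" + "".join(cells)

# Hypothetical ratings in mpg, grouped by vehicle class.
groups = {
    "Hybrids": [("Hybrid A", 60, 48), ("Hybrid B", 50, 42)],
    "Sedans": [("Sedan A", 30, 26)],
    "Trucks/SUVs": [("SUV A", 18, 16)],
}
lo, hi = 10, 65  # shared scale so groups can be compared
for group, models in groups.items():
    print(group)
    for label, r2007, r2008 in models:
        print(dot_plot_row(label, r2007, r2008, lo, hi))
```

Because every group shares one scale, the class clusters and the consistent leftward shift of the dots are both visible at a glance.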
More subtly, we can also see that the hybrids appeared to have been penalized more. Or perhaps, the higher the rating, the larger the downward adjustment...
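One way to probe the "higher rating, larger adjustment" hunch is to compute each model's downward adjustment and correlate it with the 2007 rating. The sketch below uses hypothetical figures (not the NYT data) and a hand-written Pearson correlation; with only a handful of models, a high correlation is suggestive at best.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical (2007, 2008) ratings in mpg.
ratings = {"Hybrid A": (60, 48), "Sedan A": (30, 26), "SUV A": (18, 16)}
old = [r2007 for r2007, _ in ratings.values()]
drops = [r2007 - r2008 for r2007, r2008 in ratings.values()]
print(drops)  # [12, 4, 2]
print(pearson(old, drops))
```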
Source: "Mileage Ratings Are Still Estimates, Though Closer to Reality", New York Times, Sept 16 2007.
One of the things I picked up from Tufte is the horrible habit of counting the amount of data on a chart. This is part of the information gathering needed to estimate the data-ink ratio (the share of a chart's ink that is devoted to depicting the data).
Leon B., a reader, left this in my inbox months ago, as it turned out. This is the British government's way of informing people how energy-efficient their homes are. As Leon said:
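For reference, Tufte's own definition in "The Visual Display of Quantitative Information" is a ratio of ink to ink:

$$
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}} = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
$$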
these charts might be a great example of governments going overboard with colours, bars, letters and numbers and lines for something that really only has four data points.
In addition, I find the use of two different scales confusing and unnecessary. If it is decided that scores in a particular range can be grouped as A, B, ..., G, then the original scale should be discarded. 52 is E and 70 is C. (This is especially true since the score ranges are not intuitive: 69-80 = C?!)
Even worse, what's the point of citing the 0-100 scale without explaining what the metric is?
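The two scales are linked by nothing more than a lookup table, which is why one of them is redundant. A minimal sketch: only "69-80 = C" is stated on the certificate discussed here; the other cut-offs are an ASSUMPTION, taken to follow the standard UK EPC bands (A 92+, B 81-91, C 69-80, D 55-68, E 39-54, F 21-38, G 1-20). The function name `band` is illustrative.

```python
import bisect

# Lower bound of each band, lowest band first (cut-offs other than C are assumed).
LOWER_BOUNDS = [1, 21, 39, 55, 69, 81, 92]
BANDS = ["G", "F", "E", "D", "C", "B", "A"]

def band(score):
    """Map a 1-100 score to its letter band."""
    if not 1 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # bisect_right counts how many lower bounds are <= score.
    return BANDS[bisect.bisect_right(LOWER_BOUNDS, score) - 1]

print(band(52), band(70))  # E C
```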
A table presentation does a far better job in a fraction of the space:
PS. This post set off a torrent of emotions (see the comments). Another version that I discarded was the simplest table possible. In my view, there is still way too much distracting "junk" in the original design. No one has yet explained why the 0-100 scale should be emphasized, or what it means!
It's hard to decree hard-and-fast rules for graphical design; every rule seems to admit an exception. This makes Tufte's contribution all the more impressive, as he has successfully organized the rules in his collection of books.
Dustin J. sent in this chart from the Economist. At first glance, it looks ugly and overly complex.
Stephen Few says not to use stacked bar charts because you cannot easily compare individual values, and as a rule I avoid stacked bars with more than six or seven divisions. What do you think of this stacked bar? I think it is quite effective in telling the story.
On this blog, I have also re-done some stacked bar charts, but this one is truly an exception to the rule. It works because it's not about the individual components; it's showing that the US consumes more than all those countries combined.
If only it had a proper caption! The Economist is uncharacteristically detached here: "Petrol consumption per day", "Litres bn, 2003". How about "Goliath v. Davids"? "US v. the World"? "Dream Team USA"?
It would help if they toned down the colors; also, by simply annotating the total litres for the US and the total for the other countries, they would have made a clearer point without needing gridlines. But these are minor glitches in an otherwise effective chart.
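The suggested annotation needs only two sums. A minimal sketch, with hypothetical consumption figures (not the Economist's 2003 data) and an illustrative helper `total_labels`:

```python
def total_labels(consumption, focus="US"):
    """Return two annotation strings: the focus country vs. all others combined."""
    focus_total = consumption[focus]
    others_total = sum(v for k, v in consumption.items() if k != focus)
    return (f"{focus}: {focus_total:.2f}bn litres a day",
            f"All others combined: {others_total:.2f}bn litres a day")

# Hypothetical petrol consumption, billion litres per day.
consumption = {"US": 1.4, "Country A": 0.3, "Country B": 0.25, "Country C": 0.2}
for label in total_labels(consumption):
    print(label)
```

Two labels placed at the ends of the bars make the "Goliath v. Davids" point directly, so the gridlines become unnecessary.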