Poking at the data behind a chart

Reader Jamie D. wasn't very amused by the following chart, from the Freakonomics blog (link):


Jamie summarized his view as follows:

First of all, a quick look of the graph makes you think you're comparing states with helmet laws vs. those w/out helmet laws.  But, upon closer reading, it's actually just a comparison between states that have repealed their helmet laws between 1994 and 2007 and ALL OTHER STATES.  Reading further down it appears that even in the heyday of helmet laws, only 26 states had them.  Thus, the graph is really a comparison between 7 states that repealed the helmet laws in that time period and the other 43 states, 24 of which have never had helmet laws at all.

In the Trifecta checkup, this problem surfaces as a disconnect between the question being investigated and the data used to address it. (For an explanation of the Trifecta checkup, see this post.)

Further, Jamie asked:

More importantly from a graphical perception point of view: the horizontal axis identifies itself as "years relative to repeal."  While that time horizon makes sense with respect to the repeal states (in light green), it is not clear at all what "year relative to repeal" means in the 43 states that did not have a helmet law repeal during the time at issue (the dark green).  This might be further explained in the book (which I don't have), but even if it does, the chart is misleading and not helpful in explaining data (which is its raison d'être.)

Aligning the data to a particular event (like the repeal of a particular law) is typically a very smart thing to do... it is one of many statistical adjustments that make perfect sense, like the seasonal adjustment of economic data (link). But here, as Jamie pointed out, for the "control" group of states that did not repeal their helmet laws, it isn't clear what the "anchor" year (time = 0) should be.


At a more abstract level, the designer is working with a dataset with four dimensions: the state, the year, the status of the helmet law within a state, and the organ donation rate. The data can be arranged as a four-column table with one row per state per year.

The first issue has to do with the values in the third column (status of the helmet law). It would be a mistake to positively identify the states that have repealed the law as "repeal states", and then by default label all the rest as "non-repeal states". Instead, there should be three levels: repeal states, non-repeal states, and no-helmet-law states. I'd then plot three lines instead of two.
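
The three-level labeling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the field layout, and the "State A/B/C" records are hypothetical placeholders, not the data behind the chart, and the sketch ignores the question of which time window a repeal must fall in.

```python
from typing import Optional


def classify_state(had_helmet_law: bool, repeal_year: Optional[int]) -> str:
    """Return one of three levels instead of a repeal / everything-else split."""
    if not had_helmet_law:
        # Never had a helmet law, so there was nothing to repeal.
        return "no-helmet-law"
    if repeal_year is not None:
        return "repeal"
    # Had a helmet law and kept it.
    return "non-repeal"


# Hypothetical placeholder records: (had a helmet law, year of repeal or None)
states = {
    "State A": (True, 1997),
    "State B": (True, None),
    "State C": (False, None),
}

groups = {name: classify_state(*record) for name, record in states.items()}
print(groups)
```

With the status held as three explicit levels, a plot would naturally draw one line per level rather than folding the last two into a single "all other states" line.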

The second issue arises when the designer tries to transform the second column, from actual years (2000, 2001, etc.) to relative years (anchor year = 0, and other years go +1, +2 and -1, -2, etc.). At some point, she would need to make an explicit decision of how to create "relative years" for the non-repeal and no-helmet-law states.
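
To make that decision visible, here is a minimal Python sketch of the year transformation. Everything in it is an assumption for illustration: the state names and repeal years are made up, and using the median repeal year as the anchor for the other states is just one possible choice, not something the chart's designer is known to have done.

```python
from statistics import median
from typing import Optional


def to_relative_year(year: int, anchor_year: Optional[int]) -> int:
    """Convert a calendar year to years relative to the anchor (anchor = 0)."""
    if anchor_year is None:
        # Refuse to guess: states without a repeal need an explicit anchor.
        raise ValueError("no anchor year: choose one explicitly for this state")
    return year - anchor_year


# Hypothetical repeal years for three repeal states.
repeal_years = {"State A": 1997, "State B": 2000, "State C": 2003}

# One possible (assumed) rule for states with no repeal:
# anchor them at the median repeal year of the repeal states.
control_anchor = int(median(repeal_years.values()))

print(to_relative_year(1999, repeal_years["State A"]))  # repeal state
print(to_relative_year(1999, control_anchor))           # state with no repeal
```

Whatever rule is chosen, stating it on the chart would tell the reader what "years relative to repeal" means for the states that never repealed anything.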


One other problem with this chart is that the vertical axis does not start at zero, even though the shading draws attention to the area under the lines rather than to the levels of the lines themselves. Had they used a plain line chart instead, the start-at-zero rule would not be as important.
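
A small arithmetic sketch shows why the baseline matters for filled areas. The two values and the baseline of 15 are hypothetical numbers chosen for easy arithmetic, not readings from the chart.

```python
def apparent_ratio(low: float, high: float, baseline: float = 0.0) -> float:
    """Ratio of the filled heights a reader sees when the axis starts at `baseline`."""
    return (high - baseline) / (low - baseline)


low, high = 20.0, 25.0  # hypothetical donation rates

true_ratio = apparent_ratio(low, high)             # axis starts at zero
truncated_ratio = apparent_ratio(low, high, 15.0)  # axis starts at 15

print(true_ratio, truncated_ratio)
```

In this made-up case the higher value is 25% larger, but with the axis cut off at 15 its filled area looks twice as tall.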

I'll skip the critique of the overall plan of this Freakonomics analysis as I already wrote much about that (with Andrew Gelman). See our article here.




Great post as usual. I learned a few new details I hope to employ some day in my own infrequent charting duties.

One minor brow-wrinkling: what's the point of referring to the designer—who I can't find credited by name on the graphic or in any public posts about the chart, though perhaps in the book—as "she"? It seems needlessly specific to the point of being, well, pointed.


Yet another problem might be a change in the level of riding in states with law changes. I'm not a motorcyclist, but it's clear that mandatory helmet laws for bicyclists cut down on bicycling. So maybe there's more motorcycling after the law is changed -- meaning the risk per hour might not be different.
