Comments

Tom Hopper

I'm curious: what tool did you use to create these graphs?

Hadley Wickham

Since ranks are so variable, unlike the underlying data, why use them?

Martin

Completely agree with Hadley; ranks may show differences and/or changes where actually almost nothing has happened, whereas the actual values would emphasize the "significant" changes.

My standard question applies: Can we get the data somewhere easily?

kerokan

I second Tom Hopper: What tool did you use?

Rosie Redfield

There's no scale!

Michael MacAskill

Agreed that ranks are often not useful here as they distort a continuous measure, enforcing a constant distance on the axis between states even when the actual difference in the value may be negligible.

e.g. this figure seems to show a precipitate drop in infant birth survival in the US:
http://www.nytimes.com/imagepages/2008/10/16/health/20081016_INFANT_GRAPHIC.html
But what actually happened was that mortality rates improved slightly (from an already very low value) in the US but much more elsewhere over the same period (e.g. Singapore). The rankings on infant mortality in developed countries are pretty meaningless, as all the actual values are close to zero. I showed this NY Times figure in a seminar once, along with a GapMinder chart showing the actual data. Rather than some Western states getting worse, as implied by the NY Times chart, what actually happened was that many nations converged over time to their similar, very low, mortality levels.

So rankings can introduce substantial apparent variability to a dataset when none actually exists. Similarly, they can hide significant differences, as the gap between 1st and 2nd may be huge, while the gap between 21st and 22nd, say, may be negligible.
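To make the point above concrete, here is a minimal sketch in Python. The four values are invented purely for illustration (they are not real unemployment or mortality figures), and `rank_ascending` is a hypothetical helper, not anything used for the original charts. It shows that adjacent ranks are always one step apart even when the underlying gaps differ by a factor of 25.

```python
# Hypothetical values, invented purely for illustration (not real data).
rates = {"A": 2.0, "B": 4.5, "C": 4.6, "D": 4.7}


def rank_ascending(values):
    """Return {name: rank}, where rank 1 is the lowest value (ties not handled)."""
    ordered = sorted(values, key=values.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


ranks = rank_ascending(rates)

# Gap in the underlying value between each pair of neighbouring ranks.
ordered = sorted(rates, key=rates.get)
gaps = {
    f"{a}->{b}": round(rates[b] - rates[a], 2)
    for a, b in zip(ordered, ordered[1:])
}

print(ranks)  # every neighbouring pair is exactly one rank apart
print(gaps)   # but the value gaps behind those ranks are very unequal
```

On a rank axis, A to B and B to C look like the same distance, although the first gap is 2.5 points and the second is 0.1.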

Kaiser

Tom and Kerokan: The magic of R, which allows you to control pretty much every aspect of the chart.

Martin: The data is publicly available from the Bureau of Labor Statistics (BLS) website.

Kaiser

Hadley, Michael, others: I'd encourage you to take a look at the underlying data before throwing out ranks as a measure. Ranks are particularly interesting in the context of the 50 states since the data is structured for such analysis. It is very natural for one state to compare itself against some other state. To me, the relevant question is not "why use ranks?" but "why hasn't one looked at ranks?"

Ranks and rates measure fundamentally different things. Ranks measure a state's performance relative to other states while rates measure the absolute performance for each state.

For instance, I have plotted here the rank and rate changes over time for North Dakota. In 2000, ND experienced its lowest unemployment in the decade and yet, relative to other states, ND had its worst rank. By the late 2000s, ND experienced the worst unemployment the state had seen in a decade but, relative to other states, it weathered the recession well and is now ranked first. If we looked only at rates, this information would be well hidden.
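The same pattern can be sketched in a few lines of Python. The numbers below are made up for illustration and are not the BLS figures for North Dakota or any other state; `rank_of` is a hypothetical helper. State X's rate gets worse between the two periods, yet its rank improves because the other states deteriorate more.

```python
# Hypothetical unemployment rates (percent), invented for illustration only.
rates_by_period = {
    "early": {"X": 3.0, "Y": 2.5, "Z": 2.8},
    "late": {"X": 4.0, "Y": 9.0, "Z": 7.5},
}


def rank_of(state, rates):
    """Rank 1 = lowest rate among the states in `rates` (ties not handled)."""
    return sorted(rates, key=rates.get).index(state) + 1


for period, rates in rates_by_period.items():
    # The rate is an absolute measure; the rank is relative to the other states.
    print(period, "rate:", rates["X"], "rank:", rank_of("X", rates))
```

Here X goes from last of three to first of three while its own rate rises by a full point, which is the kind of relative information a rate-only chart does not show directly.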

That said, Michael's point is well taken, and thanks for the thoughtful comment. Ranks often obscure information, especially if the differences are immaterial. But there is a large literature on using ranks, and that's because in some cases, they are illuminating.

Michael MacAskill

Hi Kaiser, perhaps we can converge on a consensus...
As you say, ranks can indeed be a valuable way of assessing performance. But the impression they give can be distorting, so a good guideline would be to generally accompany such a figure with one showing the actual values as well. This depends a little bit on the data being analysed: in child mortality, countries converge on a very low value, so ranks become meaningless. In unemployment, values rise and fall over time for each state and so rankings can be more meaningful. But one can't be sure without seeing the data from both perspectives, e.g. the difference between 40th and 50th ranking could be just 0.1%, but that between 1st and 11th could be 2%.

The North Dakota example nicely shows that the picture each gives can be wildly different. What I think would be ideal is to produce a figure containing all 50 states, with actual unemployment values. With both forms of the figure available to compare and contrast, then the conversation we'd be having would be a more interesting one about the actual data : )

Charles Franklin

Small multiples are another approach, somewhat different from the maps of the previous post or the rankings here.

http://pollsandvotes.com/PaV/?p=95

This example shows each state series, national series, and as gray background all other states, in each small multiple.

States are sorted from lowest unemployment to highest, so relative ranking is also apparent.

I think the problem with such small multiples is the cognitive effort required for a casual reader to decode the information, though anything that presents 50 states over time is going to have to cope with that as well.


Charles
charles@pollsandvotes.com

Marketing analytics and data visualization expert. Author and Speaker. Currently at Vimeo and NYU. See my full bio.