Remaking a great chart

Feb 11, 2013

One of the best charts depicting our jobs crisis is the one popularized by the Calculated Risk blog. This one:

I think a lot of readers have seen this one. It's a very effective chart.

The designer had to massage the data in order to get this look. The data published by the government typically gives an estimated employment level for each month of each year. The designer needs to find the beginning and ending months of each previous recession. Then the data needs to be broken up into unequal-length segments. A month counter then needs to be set up for each segment, resetting to zero at the start of each new recession. All this creates the effect of time-shifting.

And we're not done yet. The vertical axis shows the percentage job losses relative to the peak of the prior cycle! This means that for each recession, he has to look at the prior recession and extract out the peak employment level, which is then used as the base to compute the percentage that is being plotted.
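For readers who want to see the two steps spelled out, here is a minimal sketch in base R. The employment numbers and the peak months are made up for illustration (they are not BLS data), and picking the peaks by hand is an assumption; the original designer's exact procedure is not known to me.

```r
# Hypothetical monthly employment levels (made-up numbers, not BLS data).
emp <- data.frame(
  date  = seq(as.Date("2000-01-01"), by = "month", length.out = 12),
  level = c(100, 101, 102, 101, 99, 98, 99, 100, 103, 104, 102, 101)
)

# Assumed peak months, chosen by hand; each one starts a new segment.
peaks <- as.Date(c("2000-03-01", "2000-10-01"))

# Assign every month to the most recent peak at or before it.
# Months before the first peak get index 0 and are dropped.
idx <- findInterval(as.numeric(emp$date), as.numeric(peaks))
emp <- emp[idx > 0, ]
emp$episode <- format(peaks[idx[idx > 0]], "%Y-%m")

# Month counter that resets to zero at each peak (the time-shift).
emp$months <- ave(seq_len(nrow(emp)), emp$episode,
                  FUN = function(i) seq_along(i) - 1L)

# Percent change relative to the peak level, which is the first row
# of each segment.
emp$pct <- ave(emp$level, emp$episode,
               FUN = function(v) 100 * (v / v[1] - 1))

print(emp)
```

Each segment now starts at month 0 with a value of 0%, so the segments can be overlaid on a common axis.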

One thing you'll learn quickly from doing this exercise is that this is a task ill-suited for a computer (so-called artificial intelligence)! The human brain together with Excel can do this much faster. I'm not saying you can't create a custom-made application just for the purpose of creating this chart. That can be done and it would run quickly once it's done. But I find it surprising how much work it would be to use standard tools like R to do this.

***

Let me get to my point. While this chart works wonders on a blog, it doesn't work on the printed page. There are too many colors, and it's hard to see which line refers to which recession, especially if the printed page is grayscale. So I asked CR for his data, and re-made the chart like this:

You'd immediately notice that I have liberally applied smoothing. I modeled every curve as a V-shaped curve with two linear segments, the left arm showing the average rate of decline leading to the bottom of the recession, while the right arm shows the average rate of growth taking us out of the doldrums. If you look at the original chart carefully, you'd notice that these two arms suffice to represent pretty much every jobs trend... all the other jitter is just noise.
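One way to read the V-shape is as two straight lines joined at the trough: one from the peak to the trough, one from the trough to the last observed month. The sketch below assumes that reading and uses made-up numbers; `v_shape` is a hypothetical helper name, and a least-squares fit of each arm would be an equally reasonable alternative.

```r
# Hypothetical time-shifted series for one recession (made-up numbers):
# months since peak, and percent job loss relative to the peak.
months <- 0:8
pct    <- c(0, -0.5, -1.2, -1.8, -2.0, -1.7, -1.5, -1.0, -0.8)

# Hypothetical helper: summarize a curve as a trough plus two average slopes.
# Assumes the trough is neither the first nor the last point.
v_shape <- function(months, pct) {
  i <- which.min(pct)                      # position of the trough
  n <- length(pct)
  list(
    trough_month = months[i],
    trough_pct   = pct[i],
    decline  = (pct[i] - pct[1]) / (months[i] - months[1]),  # left arm slope
    recovery = (pct[n] - pct[i]) / (months[n] - months[i])   # right arm slope
  )
}

fit <- v_shape(months, pct)

# Smoothed values along the two arms, ready to plot as a line.
smooth <- ifelse(months <= fit$trough_month,
                 pct[1] + fit$decline * (months - months[1]),
                 fit$trough_pct + fit$recovery * (months - fit$trough_month))
print(fit)
```

With these numbers the left arm falls by half a point a month for four months, and the right arm climbs more slowly.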

I also chose a small-multiples layout to separate the curves into groups by decade. When you only have one color, you can't have ten lines plotted on top of one another.
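The grouping itself is a one-liner once each curve is labeled by the year its recession started. A minimal sketch, with placeholder start years chosen for illustration rather than taken from the chart's data:

```r
# Placeholder recession start years (illustrative labels only).
start_year <- c(1953, 1957, 1974, 1981, 1990, 2001, 2007)

# Decade label for each recession, e.g. 1957 -> "1950s".
decade <- paste0(floor(start_year / 10) * 10, "s")

# One group of curves per panel.
panels <- split(start_year, decade)
print(panels)
```

Each element of `panels` would become one small chart. In ggplot2, the same decade column could be passed to `facet_wrap(~ decade)`.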

One can extend the 2007 recession line to where it hits the 0% axis, which would really make the point that the jobs crisis is unprecedented and inexplicably not getting any kind of crisis management.
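Extending the line is simple arithmetic once the right arm is a straight line: start from the trough and divide the remaining gap by the monthly rate of recovery. The numbers below are placeholders, not estimates from the actual data.

```r
# Placeholder values for one recession (not estimated from real data).
trough_month <- 25     # months after the peak at which jobs bottomed out
trough_pct   <- -6     # percent job loss at the trough
recovery     <- 0.1    # percentage points regained per month (right arm slope)

# Month at which the straight recovery line crosses the 0% axis.
months_to_zero <- trough_month - trough_pct / recovery
print(months_to_zero)
```

The slower the recovery rate, the further to the right the line crosses zero, which is the point the extended line would make visually.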

(Meanwhile, New York City calls a crisis with every winter storm... It's baffling.)


It's not just at Calculated Risk. There are more of these recession trackers at the FRB-Minneapolis site.
http://www.minneapolisfed.org/publications_papers/studies/recession_perspective/index.cfm?

One of the nice things about their versions is that you can click to add/subtract various recessions.

I've been using these in my forecasting class to illustrate cyclicality.

Or, inexplicably, is receiving management and we find that central management of people's lives doesn't make sense? And find that giving money to politicians to give out just gets the money to favored groups and is quickly taken up in bureaucracy?

zbicyclist: Yes, that chart is everywhere nowadays. I'm annoyed that it is much harder to make in a statistical package than in Excel.

Excellent example of taking a good chart and making it great by clarifying and simplifying.

It does a great job of stating a very important message.

On the subject of New York though, I don't understand what's baffling. When you have large numbers of people displaced from their homes or without heat because of a hurricane, and you throw in major winter storms...that's a crisis.

Not only would that chart be easy to create in R, it'd be easier to reuse in the future, and we could easily inspect your code to see if you made any mistakes.

This chart would be quite easy to create in R actually -- especially using Hadley's plyr package (though built in functions make it pretty easy too).

Hadley and Dean: I'm sure you're better with R than most of us so I'd love to hear more. I have two separate issues with this task:

1) assuming I know exactly the chart to build, and have all the right data elements, it is still much easier to use Excel than any coding language. This is true even if I have to update the chart month after month like CR blog has to. I see this as a challenge to those creating graphing software. (PS. Here, I'm thinking about the original CR version - I don't think that one can easily make small multiples in Excel.)

2) I don't see a straightforward way to proceed in R (or other statistical languages) from grabbing the employment level data from the BLS website, and having the data formatted precisely for the chart I made. Perhaps one of you can give us some pseudo-code to walk through how you might do it. I think it's easier to think about it than to actually do it.

Here is a fast way to download the data from this website using R and plot it with ggplot2.

Every plot made in Flash has an XML file that contains its data. Firebug in Mozilla or Developer Tools in Chrome is very useful for finding it.

In this case, the direct link to the data for employment change is below:

http://www.minneapolisfed.org/publications_papers/studies/recession_perspective/parsedxml/employment.xml

To get it directly into R, we can use the XML package to parse this file.

library(XML) #to parse xml file
library(reshape2) # to transform data for ggplot2
library(ggplot2)

data<-'http://www.minneapolisfed.org/publications_papers/studies/recession_perspective/parsedxml/employment.xml'
data.xml<-xmlParse(data)

Now we have the data as XML, so we should transform it into a data.frame:
data.df<-xmlToDataFrame(getNodeSet(data.xml,'//series'))

then we add ID for months
data.df$ID<-1:nrow(data.df)

Transform it into data frame for ggplot2
data.long<-melt(data.df,id.vars='ID')

Then, if we want to, change names
names(data.long)<-c('Months','Crisis','Employment')

We should change the class of the Employment variable (it is character)
class(data.long$Employment)
# as.character() guards against the column arriving as a factor
data.long$Employment<-as.numeric(as.character(data.long$Employment))

Now we can plot it with ggplot2.
ggplot(data=data.long,aes(x=Months,y=Employment,colour=Crisis)) + geom_line()
