Unemployment and job losses being such a worrying social problem in the U.S., one can find many attempts to visualize the predicament. In this post, I will look at two widely circulated charts, and some design decisions behind these charts.
First up, Slate uses an interactive map. (Click on the link for interactivity.)
Here, county-level data is plotted: the size of each bubble indicates the number of jobs, red for jobs lost and blue for jobs gained, all computed year on year for a given month.
As you play with this display, think about the first question of the Trifecta checkup: what is the practical issue being addressed by this chart? What is the message the designer wants to convey?
Most likely, the answer will be something like the progress of job losses between 2007 and 2009, or which parts of the country are most affected by job losses.
Is this display the best at illuminating these issues? The designer has chosen the map to illustrate geography, and interactivity to illustrate time. These are not controversial -- but they should be controversial.
Maps are over-used objects. Whichever month is selected, the biggest circles almost always appear in California, along the Eastern seaboard and in the Great Lakes region. What we are seeing is the distribution of population across the U.S. What we are not seeing is how job losses affect different regions on the right scale. The bubbles in California are almost always larger than those in the Midwest simply because there are more people in California.
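To make the scale problem concrete, here is a minimal sketch of how ranking by raw counts differs from ranking by rates. The two counties and all of their numbers are invented for illustration, and dividing by last year's employment is just one assumed way to normalize; it is not how the Slate map was built.

```python
# Hypothetical counties with made-up numbers, for illustration only.
counties = [
    {"name": "Big County", "employed_last_year": 4_000_000, "job_change": -120_000},
    {"name": "Small County", "employed_last_year": 50_000, "job_change": -5_000},
]


def percent_change(county):
    """Year-on-year job change as a percentage of last year's employment."""
    return 100.0 * county["job_change"] / county["employed_last_year"]


# Most negative first: raw count (what bubble size encodes) versus rate.
by_count = sorted(counties, key=lambda c: c["job_change"])
by_rate = sorted(counties, key=percent_change)

for c in counties:
    print(f'{c["name"]}: {c["job_change"]:+,} jobs, {percent_change(c):+.1f}%')
```

On raw counts the populous county dominates, while on rates the small county is hit much harder. A bubble sized by count hides that second story.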
On the time dimension, the designer has chosen to use monthly data, but only for the three years 2007 to 2009. However, when this is multiplied hundreds of times by the county dimension, it is simply impossible for readers to grasp any trends from the interactive chart. We can learn the aggregate trajectory (when job losses start to pile up, when the recession deepens, etc.), but since you are living through this recession, you don't need this map to tell you that.
It is in fact alright for the designer to collapse the time dimension! Look at the following chart used by the Calculated Risk blog, which displays a similar data set (unemployment rate rather than jobs gained/lost).
Notice that this designer collapsed both the time and geography dimensions. Time is partially present inside the boxes, as the maximum, minimum and current unemployment levels being plotted correspond to certain years in the past. The max and min are picked from data stretching back to 1976, a much longer period than the Slate chart covers. Geography is at the state level, rather than the county level (even though county-level data is available). The states are sorted by the current level (July 2010) of unemployment.
The purpose of this designer is much easier to identify. For states like Nevada and California, the current situation is the worst on record, while the Dakotas have seen much worse before.
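Collapsing the time dimension in this way amounts to reducing each state's series to three numbers. A rough sketch follows, using invented rates for three hypothetical states (not BLS data). Sorting from highest to lowest current rate is an assumption about the chart's ordering.

```python
# Hypothetical unemployment rates (percent), oldest first.
# The numbers are invented for illustration; they are not BLS data.
rates_by_state = {
    "State A": [4.1, 5.0, 6.2, 9.8, 14.3],
    "State B": [3.0, 8.9, 4.2, 3.5, 3.6],
    "State C": [5.5, 7.1, 10.2, 8.0, 9.4],
}


def collapse(series):
    """Reduce a time series to its minimum, maximum and most recent value."""
    return {"min": min(series), "max": max(series), "current": series[-1]}


summary = {state: collapse(series) for state, series in rates_by_state.items()}

# Order states by current level, highest first (assumed direction).
ordered = sorted(summary, key=lambda s: summary[s]["current"], reverse=True)

for state in ordered:
    s = summary[state]
    note = " (at historical worst)" if s["current"] == s["max"] else ""
    print(f'{state}: min {s["min"]}, max {s["max"]}, current {s["current"]}{note}')
```

Here the hypothetical State A plays the role of Nevada or California, with the current value sitting at the top of its box, while State B resembles the Dakotas, well below its past peak.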
If, for example, we want to know if different regions in the U.S. show discernible patterns, all we need to do is to use different box colors for different regions.
A problem with using the range (maximum and minimum) is that either extreme could be an outlier. Put differently, the blue boxes shown above, while containing all unemployment rates going back to 1976, may not tell us much about the typical unemployment rate. What we might want to know is what the unemployment rate is like for most years.
For this, we can convert the max-min boxes into Tukey's boxplots.
In a boxplot, the box (gray area) contains the middle half of the historical data. So if you look at DC (third from the bottom), unemployment in a typical year is narrowly confined to about 6 to 8 percent, although the max-min range runs from under 5 to above 12.
For this chart, I sorted the states by median unemployment (black line inside the box) and the blue asterisks indicate the current level of unemployment (June 2010). Data comes from the BLS website.
Again, if regional differences need to be exposed, the boxes can be colored differently.
The outliers are plotted as dots on these boxplots; that too is data that may be considered extraneous to our purpose for this chart.
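For readers who want to see what goes into such a boxplot, here is a small sketch of the quantities involved: the quartiles that bound the box, the median line, and the points flagged as outliers under Tukey's 1.5 × IQR rule. The rates are invented, and the "inclusive" quartile method is an assumption; plotting tools use slightly different quartile conventions, so the chart above may not match this exactly.

```python
import statistics


def tukey_summary(values):
    """Box edges, median, whisker ends and outliers (Tukey's 1.5 * IQR rule)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value that is not an outlier
        "whisker_high": max(inside),   # largest value that is not an outlier
        "outliers": outliers,
    }


# Hypothetical annual unemployment rates (percent); invented, not BLS data.
rates = [5.0, 6.0, 6.5, 7.0, 7.0, 7.5, 8.0, 8.0, 12.5]
summary = tukey_summary(rates)
print(summary)
```

In this made-up series the full range runs from 5.0 to 12.5, but the box spans only 6.5 to 8.0, and the single 12.5 reading is drawn as an outlier dot rather than stretching the box.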
Is it a horrible thing for the designer to collapse dimensions like this? The data is available, so shouldn't all of it be used?
The truth is one can never cram all the data into a single chart. Even the Slate chart has collapsed some dimensions, namely the unemployment rates by demographics (age, gender, race, etc.) and by industry sector. Arguably those dimensions are as interesting as time and geography.
The bottom line: don't try to use every piece of data. You can't anyway. You will be making choices as to which dimensions to expose and which to hide, so choose wisely.
Thanks to Aleks for pointing to the Visualizing Economics blog, which collects graphs about the economy and is where I found these charts.