
Create your own fine print


Note: The winners of Book Quiz Round 2 were announced on my book blog. Congratulations to the winners! You can get your own copy of Numbersense here.


A common piece of advice for anyone living in the U.S. is "read the fine print." If you receive a notice or see an ad with an asterisk, or some copy in almost invisible type at the bottom of the page, you had better pull out your magnifying glass.

(Clip art licensed from the Clip Art Gallery on DiscoverySchool.com.)

If you are a data analyst, you had better keep a magnifying glass in your pocket at all times. One of the recurring themes in Numbersense is that details matter... a lot. This is particularly relevant to Chapters 6 and 7, which cover economic data.


Last week, on the first Friday of the month, the jobs report came out. For the best reporting on the data itself, with succinct commentary and no hand-waving, I go to the Calculated Risk blog.

One of the charts highlighted (in this post) is the unemployment rate by educational attainment. This is the chart behind horribly misleading claims that the solution to the unemployment crisis is more education. I have ranted about this before--see here and here.


Taking this chart at face value, you'd say that the unemployment rate is lower, the more education one has. One can also say that the unemployment rate is less volatile, the more education one has.

Bill, who writes Calculated Risk, makes two succinct comments, essentially telling his readers that this chart is next to worthless.

1. Although education matters for the unemployment rate, it doesn't appear to matter as far as finding new employment - and the unemployment rate is moving sideways for those with a college degree!

The issue behind this is the "cohort effect." The chart above aggregates everyone aged 25 and over, which means it treats people who graduated from college last year the same as people who got their degrees thirty years ago. Why does this matter? A jobs recession hits certain types of people harder than others, and one important determinant is work experience (another is the industry one works in). The low unemployment rate for all college graduates masks the challenging job market for recent college graduates. Misinterpreting this chart leads to wrongheaded policies such as producing more college graduates.
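To see how pooling can hide a struggling cohort, here is a minimal sketch with made-up numbers (they are NOT BLS figures, and the cohort split is a hypothetical choice). The pooled rate is a labor-force-weighted average of the cohort rates, so a small cohort with a high rate barely moves it.

```python
# Hypothetical illustration of the cohort effect.
# All counts are invented for illustration; they are NOT BLS data.

def unemployment_rate(unemployed: int, labor_force: int) -> float:
    """Unemployed as a share of the labor force."""
    return unemployed / labor_force

# Hypothetical cohorts of college graduates aged 25+:
# (label, unemployed, labor_force)
cohorts = [
    ("graduated within 5 years", 90_000, 1_000_000),
    ("graduated 5+ years ago", 360_000, 9_000_000),
]

for label, unemployed, labor_force in cohorts:
    print(f"{label}: {unemployment_rate(unemployed, labor_force):.1%}")

# The aggregate pools everyone, so it is a labor-force-weighted average
# of the cohort rates and is dominated by the much larger older cohort.
total_unemployed = sum(u for _, u, _ in cohorts)
total_labor_force = sum(lf for _, _, lf in cohorts)
aggregate = unemployment_rate(total_unemployed, total_labor_force)
print(f"all college graduates: {aggregate:.1%}")
```

With these invented counts, recent graduates face a 9% rate while the pooled figure for all college graduates is only 4.5%, barely above the older cohort's 4%.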

2. This says nothing about the quality of jobs - as an example, a college graduate working at minimum wage would be considered "employed".

This is where the magnifying glass is critical. You should not assume that your idea of "employed" is the same as the official definition of "employed". Bill raised the issue of minimum wage. Elsewhere, other commentators noted the issue of "part-timers". Part-time employment is not distinguished from full-time employment in the official aggregate statistics.
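A small sketch makes the definitional point concrete. The counts are hypothetical, and the "strict" definition is an illustrative alternative invented here, not an official statistic. The same survey tallies produce different rates depending on whether part-timers count as employed.

```python
# Hypothetical sketch: the same counts yield different "rates"
# depending on how "employed" is defined. All numbers are made up.

from dataclasses import dataclass


@dataclass
class LaborCounts:
    full_time: int              # working full time
    part_time_voluntary: int    # part time by choice
    part_time_involuntary: int  # part time but wanting full-time work
    unemployed: int             # jobless and looking for work

    @property
    def labor_force(self) -> int:
        return (self.full_time + self.part_time_voluntary
                + self.part_time_involuntary + self.unemployed)


def rate_any_work_counts(c: LaborCounts) -> float:
    """Any paid work counts as employed, part time or full time,
    mirroring the aggregate treatment described above."""
    return c.unemployed / c.labor_force


def rate_strict(c: LaborCounts) -> float:
    """Hypothetical stricter definition: involuntary part-timers are
    grouped with the unemployed as not adequately employed."""
    return (c.unemployed + c.part_time_involuntary) / c.labor_force


counts = LaborCounts(full_time=800, part_time_voluntary=100,
                     part_time_involuntary=50, unemployed=50)

print(f"any work counts: {rate_any_work_counts(counts):.1%}")
print(f"strict:          {rate_strict(counts):.1%}")
```

Here the rate doubles from 5% to 10% without a single person changing status; only the definition moved.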

Taking this further, isn't it plausible that unemployment "trickles down"? As college graduates grab whatever jobs they can find, including minimum-wage ones, they push high-school graduates out of work.


In data, there is often no fine print to be found. In Big Data, this problem is magnified a thousandfold. Unfortunately, magnifying a blank still yields a blank. So having the magnifying glass is not enough.

The solution, then, is to create your own fine print. Spend inordinate amounts of time understanding how the data is collected. Dig deeply into how the data is defined.
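One modest way to start writing that fine print is to profile every field before trusting it. This is a minimal sketch using only the Python standard library; the `profile` function and the sample records are hypothetical, not taken from any real jobs dataset.

```python
# A minimal sketch of "writing your own fine print": profile each field
# of a dataset before trusting it. The records below are hypothetical.

from collections import Counter
from typing import Any


def profile(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """For every field seen in the records, report how often it is
    missing (absent or None), which Python types appear, and its most
    common values."""
    fields = sorted({key for rec in records for key in rec})
    report = {}
    for field in fields:
        values = [rec.get(field) for rec in records]
        present = [v for v in values if v is not None]
        report[field] = {
            "missing": len(values) - len(present),
            "types": sorted({type(v).__name__ for v in present}),
            "top_values": Counter(present).most_common(3),
        }
    return report


records = [
    {"age": 34, "status": "employed", "hours": 40},
    {"age": 29, "status": "employed", "hours": 12},
    {"age": 51, "status": "unemployed"},
    {"age": None, "status": "Employed", "hours": "40"},
]

for field, stats in profile(records).items():
    print(field, stats)
```

Even on four rows, the profile surfaces questions a chart would hide: "Employed" and "employed" are counted separately, `hours` mixes numbers with text, and someone working 12 hours a week is labeled "employed."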

No, this work is not sexy. (PS. If you can't stand it, you really shouldn't be in data science.)

In Chapter 6 of Numbersense, I did this work for you on jobs data. What I show there is that there is no "right" way to measure employment--it's not as clear-cut as you'd like to think. If you were to put forth your own definition of "employed" for comment, it would absolutely get criticized, in the same way you're criticizing the government's definition.



PS. Larry at Good Stats, Bad Stats pulled out his magnifying glass and wrote a series of posts about education, employment and income. He mildly disagrees with me.

Do we need more college grads?

Education and unemployment

Education and income





Arresting visualization

Reader Steph G. didn't like WRAL's (North Carolina) effort to visualize the demographics of protestors arrested in Raleigh. It sounds like the citizens of NC are making their voices heard. Maybe my friends in Raleigh can give us some background.

There are definitely problems with the choice of charts, but I rate this effort a solid B. In terms of the Trifecta Checkup, they did a good job of describing the central question and compiled an appropriate dataset. I love it when people go out and collect the right data rather than use whatever they can grab. The issue was the execution of the charts.


The first was a map showing where the arrested protestors came from.


Maps are typically used to show geographical distribution. The chosen color scheme (two levels of green plus gray) compresses the data so much that we learn almost nothing about the distribution. I clicked on Wake County to learn that there were 178 arrests there. Nearby Randolph County had only 1 arrest, but you can't tell that from the colors.
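The sketch below shows how the number of color classes limits what a map can say. Only the Wake (178) and Randolph (1) counts come from the post; the other counties and every bin edge are hypothetical and are not WRAL's actual scheme. The "coarse" cut simply assumes two greens split at an arbitrary threshold.

```python
# Hypothetical sketch: how the number of color classes on a choropleth
# limits what a reader can learn. Only the Wake and Randolph counts come
# from the post; the other counties and all bin edges are made up.

from bisect import bisect_right

arrests = {"Wake": 178, "Randolph": 1, "County A": 6,
           "County B": 25, "County C": 0}


def color_level(count: int, edges: list[int]) -> int:
    """Return the color class for a count: 0 means no arrests (gray);
    k >= 1 means the count falls in the k-th bin, where `edges` holds
    the ascending lower edge of each green bin."""
    if count == 0:
        return 0
    return bisect_right(edges, count)


coarse = [1, 200]        # hypothetical: two greens split at 200
finer = [1, 5, 20, 100]  # hypothetical: roughly logarithmic bins

for county, n in arrests.items():
    print(f"{county:9} {n:4}  coarse={color_level(n, coarse)}  "
          f"finer={color_level(n, finer)}")
```

Under the coarse cut, 1 arrest and 178 arrests get the same shade; the roughly logarithmic bins separate them, which suits counts spanning two orders of magnitude.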

The next chart shows the trend in arrests over time. I like the general appearance (except for the shadows). The problem is that the columns are evenly spaced even though the intervals between arrest dates are uneven.
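The fix amounts to placing each column at its date rather than at its rank. Here is a minimal sketch; the dates are hypothetical and do not reproduce WRAL's data.

```python
# Hypothetical sketch: place columns by their actual dates rather than
# by their order. The dates below are made up for illustration.

from datetime import date

arrest_dates = [date(2013, 4, 29), date(2013, 5, 6), date(2013, 5, 20),
                date(2013, 6, 17), date(2013, 7, 29)]


def even_positions(dates: list[date]) -> list[int]:
    """What an evenly spaced column chart implicitly does: x = rank."""
    return list(range(len(dates)))


def time_positions(dates: list[date]) -> list[int]:
    """x = days elapsed since the first date, so gaps between columns
    are proportional to the gaps between events."""
    start = min(dates)
    return [(d - start).days for d in dates]


print(even_positions(arrest_dates))
print(time_positions(arrest_dates))
```

The rank-based positions are 0 through 4, while the date-based offsets are 0, 7, 21, 49 and 91 days. Only the second set shows the long lulls between events. In a plotting library, these offsets become the x-coordinates of the bars.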


Here's a quick redo, with proper spacing:



The final set of charts is inspired. They compare the demographics of those arrested protestors against the average North Carolina resident. For example:


For categories like Age with quite a few levels, the pie chart isn't a good choice. It's also hard to compare across pie charts. A column or dot chart works better.
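A paired column or dot chart works because it lines up both distributions category by category. This sketch does the same alignment in plain text. The age bands and all shares are hypothetical and are not WRAL's figures; the `compare` helper is also invented for illustration.

```python
# Hypothetical sketch: instead of two pie charts, line up both
# distributions category by category, as a paired column or dot chart
# would. The age bands and shares are made up, NOT WRAL's figures.

age_groups = ["18-29", "30-44", "45-64", "65+"]
arrested = {"18-29": 0.15, "30-44": 0.15, "45-64": 0.40, "65+": 0.30}
nc_adults = {"18-29": 0.20, "30-44": 0.26, "45-64": 0.35, "65+": 0.19}


def compare(groups, a, b):
    """Return (group, share_a, share_b, gap in percentage points)
    in the natural order of the groups."""
    return [(g, a[g], b[g], round((a[g] - b[g]) * 100, 1)) for g in groups]


for g, pa, pb, gap in compare(age_groups, arrested, nc_adults):
    # A crude text dot chart: one '*' per 5 percentage points.
    stars_a = "*" * round(pa * 20)
    stars_b = "*" * round(pb * 20)
    print(f"{g:>6}  arrested {pa:4.0%} {stars_a:<8}  "
          f"NC {pb:4.0%} {stars_b:<8}  gap {gap:+.1f} pts")
```

Keeping the age bands in their natural order and printing the gap next to each pair makes the over- and under-representation readable at a glance, which is hard to do across two pies.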