I'd like to start 2015 on a happy note. I enjoyed reading the piece by Steven Rattner in the New York Times called "The Year in Charts". (link)
I particularly like the crisp headers and unfussy language, which place the charts at the center. The components of the story flow nicely.
Here are my notes on some of the charts:
This chart is missing context: performance relative to population growth or to potential. Changing the context also changes the implicit yardstick. The implied metric here is more-than-zero growth, or continued growth.
It took me a while to find the titles to know what each section depicts. I'd prefer to put the titles back to the top or the top left corner. The "information in my head" is making me look at the "wrong" places. But otherwise, this is Tufte goodness.
This innocent-looking chart prompts a host of questions. First, how could a "median" take so many values within one population? It appears that this is an exercise in isolating each quintile (each decile in the case of the top 20%) and computing the median within each segment. In other words, the data represent these income percentiles: 95th, 85th, 70th, 50th, 30th and 10th. Given that the income data have already been grouped, computing group averages makes more sense than calculating group medians. This is especially so when comparing changes over time, since the robust median suppresses changes.
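To make the point about within-segment medians concrete, here is a minimal sketch in Python. The incomes are simulated from a lognormal distribution (my assumption, not the Fed's data), and the grouping into four lower quintiles plus two top deciles is my reading of the chart. It checks that the median inside each segment is just the overall percentile at that segment's midpoint.

```python
import random
import statistics

random.seed(0)

# Hypothetical incomes: simulated lognormal draws, NOT real survey data.
incomes = sorted(random.lognormvariate(10.8, 0.9) for _ in range(100_000))
n = len(incomes)

def overall_percentile(p):
    """Value at the p-th percentile of the whole population (p is an integer 0..100)."""
    return incomes[min(n - 1, n * p // 100)]

# Assumed grouping: four lower quintiles, then the top 20% split into two deciles.
groups = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 90), (90, 100)]

results = []
for lo, hi in groups:
    segment = incomes[n * lo // 100 : n * hi // 100]
    group_median = statistics.median(segment)
    midpoint = (lo + hi) // 2  # 10, 30, 50, 70, 85, 95
    results.append((midpoint, group_median, overall_percentile(midpoint)))

for midpoint, group_median, pct_value in results:
    print(f"{midpoint}th percentile: {pct_value:12.0f}   group median: {group_median:12.0f}")
```

The two columns agree up to the spacing between neighbouring observations, so a chart of "group medians" is really a chart of six percentiles of one distribution.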
The bucketing of income presents another challenge. All buckets except the very top one are essentially bounded. The central buckets all have minimum and maximum values, and the bottom bucket is bounded below by zero. The top bucket, however, is basically unbounded, so important features of the data could be lost by summarizing it by its median.
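Here is a small illustration of what can get lost, again using simulated lognormal incomes (an assumption for illustration only). Suppose the highest tenth of the top decile sees incomes rise by 50% (a made-up shock). The bucket's median does not move at all, while its mean does.

```python
import random
import statistics

random.seed(1)

# Hypothetical incomes: simulated lognormal draws, NOT real survey data.
incomes = sorted(random.lognormvariate(10.8, 0.9) for _ in range(50_000))
top_decile = incomes[len(incomes) * 9 // 10:]

# Made-up shock: the highest tenth of the top decile (the top 1% overall)
# gains 50%; everyone else in the bucket is unchanged.
cutoff = len(top_decile) * 9 // 10
boosted = top_decile[:cutoff] + [x * 1.5 for x in top_decile[cutoff:]]

median_before = statistics.median(top_decile)
median_after = statistics.median(boosted)
mean_before = statistics.fmean(top_decile)
mean_after = statistics.fmean(boosted)

print(f"median: {median_before:12.0f} -> {median_after:12.0f}")
print(f"mean:   {mean_before:12.0f} -> {mean_after:12.0f}")
```

Because every changed value sits above the bucket's median both before and after the shock, the median is blind to it. The mean picks it up, which is the argument for group averages when the top bucket has no upper bound.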
A third problem surfaces if one inquires how the survey collects its data. According to the Federal Reserve's description, the data concern "usual income" as opposed to "actual income". Respondents are told to ignore "temporary" conditions in describing their "usual incomes". It is likely that people think of income increases as permanent and of getting laid off as temporary. So while usual income solves one problem (the long-term planner's problem), it creates a different one (short-term bias). In particular, I don't think it is a good metric for assessing changes around a recession or recovery.
I also wonder about the imputation of missing data. I'd guess that missing values are disproportionately common among unemployed people. If the imputation cannot predict the employment status of those people, then it would surely inflate incomes.
I wonder if any of my readers knows details about some of these potential problems. Would love to hear how the Fed's statisticians deal with these issues.
On this chart, the author has found an excellent story, and the graphic is effective. I would prefer to see the horizontal axis labelled "More Unequal" as opposed to "Less Equal" because of the convention that "more" is usually placed to the right of "less" on the horizontal axis. Here is a scatter plot version of the data:
It shows the U.S. is a bit more extreme than all others.
This is another great chart. I like the imagery of the emptying middle. I find the labels a bit too long, and they require too much interpretation. I prefer this: