A great start to the year

I'd like to start 2015 on a happy note. I enjoyed reading the piece by Steven Rattner in the New York Times called "The Year in Charts". (link)

I particularly like the crisp headers and unfussy language, which place the charts at the center. The components of the story flow nicely.

 ***

Here are my notes on some of the charts:

Nyt_2014_jobgains

This chart is missing context, namely performance against population growth or potential. Changing the context also changes the implicit yardstick. As drawn, the implied yardstick is more-than-zero growth or continued growth.

Nyt_2014_gasprice

It took me a while to find the titles to know what each section depicts. I'd prefer to put the titles back at the top or in the top left corner. The "information in my head" is making me look at the "wrong" places. But otherwise, this is Tufte goodness.

Nyt_2014_inequalitygetsworse

This innocent thing prompts a host of questions. First, how could a "median" be found to have so many values within one population? It would appear that this is an exercise in isolating each quintile (decile in the case of the top 20%) and computing the median within each segment. In other words, the data represent these income percentiles: 95th, 85th, 70th, 50th, 30th and 10th. Given that the income data have already been grouped, computing group averages makes more sense than calculating group medians. This is especially so when comparing changes over time. The median is robust, which means it suppresses changes.
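To see why a within-bucket median is just a percentile of the whole population, here is a minimal sketch on a toy population. The function name `bucket_medians`, the cut points, and the incomes are all made up for illustration; this is not the Fed's procedure.

```python
from statistics import median

def bucket_medians(incomes, cut_points=(0.0, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0)):
    """Median income within each rank-based bucket (hypothetical helper).

    The default cut points assume quintiles, with the top quintile
    split into two deciles.
    """
    ordered = sorted(incomes)
    n = len(ordered)
    medians = []
    for lo, hi in zip(cut_points, cut_points[1:]):
        segment = ordered[round(lo * n):round(hi * n)]
        medians.append(median(segment))
    return medians

# Toy population: 100 households with incomes 1, 2, ..., 100,
# so each income is roughly its own percentile.
toy_incomes = list(range(1, 101))
print(bucket_medians(toy_incomes))
```

On this toy population, each bucket median sits at the midpoint of the bucket's percentile range, so the six "medians" are really six percentiles of one distribution.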

The bucketing of income presents another challenge. All buckets except the very top one are essentially bounded. All the central buckets have minimum and maximum values. The bottom bucket is bounded below by zero. The top bucket, however, is basically unbounded, so important features of the data could be lost by summarizing the top bucket by its median.
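A small sketch of what the median can hide in an unbounded top bucket. The incomes below are invented (in thousands of dollars), and `summarize` is a hypothetical helper. Only the three highest earners gain between the two years.

```python
from statistics import mean, median

# Hypothetical top-bucket incomes (in $ thousands) in an earlier year.
top_bucket_before = [150.0, 160.0, 175.0, 190.0, 210.0,
                     240.0, 300.0, 450.0, 900.0, 2000.0]

# Later year: only the three highest earners gain; everyone else is flat.
top_bucket_after = top_bucket_before[:7] + [600.0, 1500.0, 5000.0]

def summarize(incomes):
    """Return both summaries so they can be compared side by side."""
    return {"median": median(incomes), "mean": mean(incomes)}

before = summarize(top_bucket_before)
after = summarize(top_bucket_after)
print("before:", before)
print("after: ", after)
```

In this made-up example the bucket median does not move at all, while the bucket mean rises sharply. That is the kind of feature a median-only summary would miss.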

A third problem surfaces if one inquires how the survey collects its data. According to the Federal Reserve description, the data concern "usual income" as opposed to "actual income". Respondents are told to ignore "temporary" conditions in describing their "usual incomes". It is likely that people think of income increases as permanent and of getting laid off as temporary. So while usual income solves one problem (the long-term planner's problem), it creates a different problem (short-term bias). In particular, I don't think it is a good metric for assessing changes around a recession/recovery.

I also wonder about the imputation of missing data. I'd guess that missing values are disproportionately common among unemployed people. If the imputation cannot predict the employment status of those people, then it would surely have inflated their incomes.
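Here is a minimal sketch of that worry, using invented respondents and naive mean imputation. Both are assumptions for illustration; I do not know what imputation method the Fed actually uses.

```python
from statistics import mean

# Hypothetical survey rows: (employed, true income in $ thousands, responded).
# Assumption: non-response is concentrated among the unemployed.
respondents = [
    (True, 60.0, True), (True, 45.0, True), (True, 80.0, True), (True, 52.0, True),
    (False, 10.0, True), (False, 8.0, False), (False, 12.0, False), (False, 9.0, False),
]

# What we would report if everyone had answered.
true_mean = mean(income for _, income, _ in respondents)

# Naive imputation that ignores employment status:
# fill every missing income with the mean of the observed incomes.
observed = [income for _, income, responded in respondents if responded]
fill_value = mean(observed)
imputed = [income if responded else fill_value
           for _, income, responded in respondents]
imputed_mean = mean(imputed)

print(f"true mean: {true_mean:.1f}, mean after imputation: {imputed_mean:.1f}")
```

Because the fill value is dominated by employed respondents, the imputed average comes out well above the true one in this toy setup.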

I wonder if any of my readers knows details about some of these potential problems. Would love to hear how the Fed's statisticians deal with these issues.

Nyt_2014_gini

On this chart, the author has found an excellent story, and the graphic is effective. I'd prefer to see the horizontal axis labelled "More Unequal" as opposed to "Less Equal" because of the convention that "more" is usually placed to the right of "less" on the horizontal axis. Here is a scatter plot version of the data:

Screen shot 2015-01-01 at 10.52.13 AM

It shows the U.S. is a bit more extreme than all others.

Nyt_2014_polarize_full

This is another great chart. I like the imagery of the emptying middle. I find the labels a bit too long, and they require too much interpretation. I prefer this:

Redo_nyt2014_polarize

 

Comments

Nate

Why is there a shaded area and regression line on the "Gini with vs without Adjustments" graph? Is there a set of countries that lie on that regression line? What's it supposed to represent? Wouldn't showing the countries be enough? And the scale goes from .25 to .4 and .45 to .7? Casual readers have a hard time as it is saying what this number represents. How much of a difference of .15 or .25 is meaningful? The chart obfuscates this. Personally, the first one makes more sense to me since the data is ordered and it's easier to compare two multiples than to read a scatter plot.

Also, there's no such thing as a "typical" Republican or Democrat. And "right" and "left" are so nebulous, I question the use of quantitative values to describe them at all. Does one's pro-choice view cancel out an anti-gay-marriage view?