From Bernard L., another exemplary effort by the Times. This one really got me excited.
The set of line graphs shows how demographics of students in American schools have evolved in the last two decades. Here, I selected New York City schools, and the tool sensibly decided to compare those with New York State schools (gray line).
There is so much to learn from one simple chart:
The blue and gray lines are almost parallel everywhere, which tells us that in terms of the change in demographic composition, New York City pretty much resembled New York State during this entire period.
However, in terms of demographic composition, rather than the change in composition, New York City schools are very different from the rest of the state: the proportion of white students is lower by a third, while the proportions of minorities, especially black and Hispanic students, are much higher.
State-wide (as well as city-wide), the proportions of black and white students have been declining while those of Hispanic and Asian students have increased.
The extent of the change is immediately visible: the share of Asian students, for example, doubled from 7% to 14%.
From a graph design perspective, the execution is very clean. Data labels are limited to the first and last values. A small-multiples design places the ethnic groups side by side. The chart also shows a keen awareness of foreground and background. Imagine how much data has been visualized here, and be impressed: you can look at any county in the country.
Here's one where the county change does not exactly mirror the state change (Napa in California):
Reference: "Diversity in the classroom", New York Times, March 12 2009.
The chart on the right, which compares the unemployment picture across past recessions, has many good features.
It uses a very sensible metric: the percentage change from peak employment. I have previously argued that this type of presentation is superior to plotting a time series of unemployment rates. The following is an example of the standard graphic, with gray bands indicating recessions. The difference is obvious.
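To make the metric concrete, here is a minimal Python sketch of the calculation. It assumes each recession comes as a list of monthly employment levels whose maximum is the pre-recession peak; the function names and the numbers are hypothetical, not the data behind the chart. Every path starts at 0 in the peak month, so recessions of different sizes and eras can share one horizontal axis (months since peak).

```python
def pct_change_from_peak(employment):
    """Percent change in employment relative to the peak, from the peak month onward.

    Assumes the series covers a single episode in which the pre-recession
    peak is the maximum value (max() returns the first maximum if tied).
    """
    peak_idx = max(range(len(employment)), key=lambda i: employment[i])
    peak = employment[peak_idx]
    return [100.0 * (e - peak) / peak for e in employment[peak_idx:]]


def align_recessions(episodes):
    """Map each recession label to its percent-change path, indexed by months since peak."""
    return {label: pct_change_from_peak(series) for label, series in episodes.items()}


# Hypothetical payroll levels (millions), not real employment data
episodes = {
    "episode A": [100.0, 101.0, 100.5, 99.0, 98.0],
    "episode B": [130.0, 129.0, 128.5, 129.5],
}
for label, path in align_recessions(episodes).items():
    print(label, [round(x, 2) for x in path])
```

Plotting each path against its index then gives the "percentage change from peak" layout, with no need to shade recession periods.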
Here is the previous post on the drop in the market capitalization of banks since their peak, a data set that screamed out for this type of treatment.
It handles the foreground-background issue very well. A number of similar charts are circulating in which every single line has a different color. Here, the designer clearly tells us that the current recession is the foreground and all the past recessions form the background. The 1981-3 recession appears to be slightly highlighted with a darker orange, drawing attention to the fact that it is the most similar to the current situation. I find this unnecessary because the association is clear even without the darker hue; however, the designer applies it with a very light touch, so this is just a question of taste.
Bernard L. pointed us to this income distribution chart printed in the Economist.
The accompanying paragraph points to the range of the bars, that is, the gap between the top decile average and the bottom decile average, as evidence of income disparity, concluding that the US and Britain are among the worst.
Bernard likes the use of vertical sections to represent the average incomes by decile and dislikes the USA-Today style background image. Agreed. But why plot the middle deciles at all when the only worthy data involve the endpoints of the bars?
A close examination of the spacing of the middle deciles leads to more befuddlement. There does not appear to be much difference between the countries.
The answer to this is that decile statistics are not appropriate for data as skewed as incomes. At the high end, the 10% intervals are too coarse.
One clue is that the top 10% in the US earn only $90,000 on average, yet we have all heard of billion-dollar hedge fund managers, Wall Street bankers, and $30-million-a-movie celebrities. The problem is that within the top decile, the income distribution is also tremendously skewed.
The neat idea of plotting the vertical sections indicates an awareness that the red dots (average income) are insufficient because of the skew. Alas, a lot of skew remains within the top decile, and the designer inadvertently falls back into the same trap by using the average income of the top 10%. Thus, the amount of disparity on the right side of the chart is grossly underestimated. Roughly speaking, we are looking at 10 samples of the distribution, nine of which are at the low end of the range and only one at the top end (the long tail). Here is the idea:
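A quick simulation illustrates why ten decile averages are too coarse for skewed incomes. This sketch assumes incomes follow a Pareto distribution with shape 1.5 and a $20,000 minimum; those parameters are made up for illustration and are not the Economist's data. It compares the top-decile average with the averages of the top 1% and top 0.1%.

```python
import random


def decile_means(values):
    """Split the sorted values into 10 equal-count groups and return each group's mean."""
    xs = sorted(values)
    n = len(xs)
    means = []
    for d in range(10):
        group = xs[d * n // 10:(d + 1) * n // 10]
        means.append(sum(group) / len(group))
    return means


def top_share_mean(values, share):
    """Mean of the largest `share` fraction of values (e.g. 0.01 for the top 1%)."""
    xs = sorted(values, reverse=True)
    k = max(1, int(len(xs) * share))
    return sum(xs[:k]) / k


random.seed(1)
# Hypothetical incomes: Pareto(shape=1.5) scaled to a $20,000 minimum (assumption)
incomes = [20_000 * random.paretovariate(1.5) for _ in range(100_000)]
means = decile_means(incomes)

print("decile means:", [round(m) for m in means])
print("top 10% mean:", round(means[-1]))
print("top 1% mean:", round(top_share_mean(incomes, 0.01)))
print("top 0.1% mean:", round(top_share_mean(incomes, 0.001)))
print("largest income:", round(max(incomes)))
```

Under these assumptions, the top 1% and top 0.1% averages come out well above the top-decile average, which is exactly the part of the distribution that a ten-point summary hides.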
Reference: "Spreading the wealth", Economist, Oct 21 2008.
Jens, a long-time reader, tried to re-make the boring data tables used to report poll data. Here is an example from USA Election Polls (left) and his enhanced version (right).
Like Jens, I find most tabular presentations of poll data underwhelming. There is too much data, and it hides the useful information. For example, the pollster and polling date data provide context for super-serious poll watchers; however, they are not presented in a way that actually helps readers. Read further for versions that bring out this data much better.
Meanwhile, Jens' revision uses color and ordering to bring out the current state of affairs. The added electoral votes let us understand the relative weight of each row, countering a weakness of the tabular format: every row has the same height, which wrongly implies that every row carries the same importance.
There are a number of good web-sites where this type of data is presented in attractive ways.
I have been a fan of Political Arithmetik, which makes great use of the pollster and polling date data mentioned above. The poll results are averaged to show the overall trend, while the individual results are plotted as dots in the background, with the polling date encoded in each dot's horizontal position. Even more impressively, the margins of error are shown. Remarkably, this race has been a statistical tie for all these months: the 95% lower limit never quite rises above zero.
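For readers wondering what "statistical tie" means in numbers, the sketch below computes an approximate 95% interval for one candidate's lead. It is a deliberately simple stand-in, not Political Arithmetik's method: it pools polls by sample-size weighting and ignores house effects and trends over time. The poll numbers are hypothetical. The variance term (p1 + p2 - (p1 - p2)^2)/n is the standard multinomial variance for the difference between two shares measured on the same sample.

```python
import math


def lead_interval(p1, p2, n, z=1.96):
    """Approximate 95% interval for the lead p1 - p2 in one poll of n respondents.

    Uses the multinomial variance of a difference between two shares
    measured on the same sample: (p1 + p2 - (p1 - p2)**2) / n.
    """
    lead = p1 - p2
    se = math.sqrt((p1 + p2 - lead ** 2) / n)
    return lead - z * se, lead + z * se


def pooled(polls):
    """Combine (p1, p2, n) polls into one pseudo-poll by sample-size weighting.

    Simplification: ignores house effects, timing, and differing methods.
    """
    total = sum(n for _, _, n in polls)
    p1 = sum(a * n for a, _, n in polls) / total
    p2 = sum(b * n for _, b, n in polls) / total
    return p1, p2, total


# Hypothetical polls: (candidate 1 share, candidate 2 share, sample size)
polls = [(0.47, 0.44, 1000), (0.45, 0.46, 800), (0.48, 0.45, 600)]
p1, p2, n = pooled(polls)
lo, hi = lead_interval(p1, p2, n)
print(f"pooled lead {p1 - p2:+.3f}, 95% interval ({lo:+.3f}, {hi:+.3f})")
print("statistical tie" if lo <= 0 <= hi else "clear leader")
```

With these made-up polls the interval straddles zero: the lower limit stays below the zero line, which is what a statistical tie looks like on such a chart.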
Another great site is fivethirtyeight.com. Below, they essentially turned Jens' enhanced table into a map. Perhaps the legend on the right represents what they call "East Coast bias"? All of Nathan's graphs are very attractively produced; I just wish he'd put more labels on them (such as the differentials corresponding to the shades of red and blue).
In his column on automated polls versus traditional telephone polls, the Numbers Guy at the Wall Street Journal gave us a few entertaining quotes.
"The dog could be answering the questions," Ann Selzer, a traditional pollster, said of automated polling, in which automated voice messages ask voters to record their responses. The WSJ also cited a prominent textbook that labelled them "Computerized Response Automated Polls -- insulting acronym intended."
Reader Mark A. brought this to our attention because of the following chart. He wondered what the point of the vertical axis was.
Aside from that cosmetic problem, the biggest issue is the lack of explanation. Predictive power, pollster-introduced error, methodological error: what are these? The article itself gives no clues. To make sense of the chart, readers need to consult Nathan Silver's (excellent) site, fivethirtyeight.com. (The gory details are here.)
By the way, Nathan's site has a variety of nicely produced charts. (As with this chart, readers will need to dig around for background information to interpret some of them.)
Another improvement is to provide some sense of the variance in the data, either by showing more than the top five pollsters or by showing the range of errors. Since the average pollster sits on the right edge, it is as if the right half of the chart were clipped. In the version below, most polls hover around the average, with two egregiously bad ones.
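One way to convey that spread, sketched in Python: summarize the full set of pollster errors rather than only the top five, and flag the extreme ones. The error values are hypothetical, and the median-absolute-deviation rule with a cutoff of 3 is an assumption chosen for illustration, not anything the WSJ or fivethirtyeight.com used.

```python
import statistics


def error_summary(errors):
    """Summarize the spread of pollster errors instead of showing only the top five."""
    vals = sorted(errors.values())
    return {"min": vals[0], "median": statistics.median(vals), "max": vals[-1]}


def flag_outliers(errors, k=3.0):
    """Flag pollsters whose error lies more than k median absolute deviations from the median.

    The MAD rule and k=3 are illustrative assumptions, not a published method.
    """
    med = statistics.median(errors.values())
    mad = statistics.median(abs(e - med) for e in errors.values())
    return sorted(name for name, e in errors.items() if abs(e - med) > k * mad)


# Hypothetical average errors (percentage points) for eight unnamed pollsters
errors = {"A": 2.1, "B": 2.4, "C": 2.3, "D": 2.6, "E": 2.2, "F": 2.5, "G": 5.9, "H": 6.4}
print(error_summary(errors))
print("outliers:", flag_outliers(errors))
```

The median and MAD are used instead of the mean and standard deviation so that the two bad pollsters do not drag the yardstick toward themselves.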
If we know which polls are automated and which are not, we can color the dots accordingly.
There are bench players on every chart: these are the titles, axes, labels, text and so on. They provide background information required to interpret the chart. They may sit in the margins but their value is not to be underestimated.
Don't let the dog eat the marginal information.
Reference: "Press 1 for Obama, 2 for McCain", Wall Street Journal, Aug 1 2008.