
Round up

Here is some interesting reading from other places:

Tag clouds have caught on since we approved them a while ago. One interesting use was at the Life Vicarious blog, which used them to compare the inclinations of three New York-based restaurant reviewers. What they should have done is remove irrelevant words like "one", "also", "many", "make"/"made", etc. In statistics, this is called removing "noise", which helps bring out the "signal".
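For the curious, here is a minimal Python sketch of that clean-up step: count word frequencies for a tag cloud after dropping a stop list. The stop list is a small hypothetical one built from the filler words named above (real tools use much longer lists), and `tag_counts` and the sample review are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list: the filler words named above plus a few common ones.
STOP_WORDS = {"one", "also", "many", "make", "made",
              "the", "a", "and", "of", "to", "is"}

def tag_counts(text, stop_words=STOP_WORDS, top_n=10):
    """Return the top_n (word, count) pairs after removing stop words."""
    words = re.findall(r"[a-z']+", text.lower())      # crude tokenizer: runs of letters
    kept = [w for w in words if w not in stop_words]  # drop the "noise"
    return Counter(kept).most_common(top_n)

# Made-up review text, not taken from the Life Vicarious comparison.
review = ("One dish also made the meal. Many diners made the trip "
          "for the pork, and the pork delivered.")
print(tag_counts(review, top_n=3))
```

Without the stop list, "the" and "made" would crowd the top of the cloud; with it, the words that say something about the food come through.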

Andrew Gelman discussed the NYT article that reported an unexpected male bias among the children of Asian American families. He can be counted on to make useful comments on any accompanying graphics. He rightly pointed out that this is a case where the axis should not start at zero: the relevant baseline is 100, since the metric essentially measures the excess of males relative to females. I also agree that a line chart with a longer time series, plotting percentages rather than the excess, would work better.
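To make the two scales concrete, here is a small Python sketch. It assumes the metric is the number of males per 100 females, so 100 means parity; the function names and the ratios in the loop are illustrative, not figures from the study.

```python
def percent_male(males_per_100_females):
    """Share of males, in percent, implied by a ratio of males per 100 females."""
    return 100 * males_per_100_females / (100 + males_per_100_females)

def excess_over_parity(males_per_100_females):
    """How far the ratio sits above the natural baseline of 100 (equal numbers)."""
    return males_per_100_females - 100

# Illustrative ratios only, not figures from the study.
for ratio in (100, 105, 110):
    print(ratio, excess_over_parity(ratio), f"{percent_male(ratio):.1f}%")
```

On the percentage scale the natural reference point becomes 50%, which a line chart can show directly as a horizontal guide.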

The racetrack chart made an appearance at Flowing Data. This one is even busier and just as impossible to decipher.

Reading comprehension

Note: I am in the middle of a holiday and so posting will be limited.

Andrew posted a pretty chart that caught my attention.  This is the sort of sophisticated chart that rewards careful reading. 


Below is a guide to reading the chart:

  • It is a small multiples chart with the panels arranged in two dimensions (income level, and a race-religion hybrid category). The top row summarizes voters of all race-religion groups, broken down by income. Note that there is no corresponding summary column for voters of all incomes, broken down by race-religion.
  • Source of data: a 2000 poll, applied to 2008 demographic patterns. In other words, there is an underlying assumption that opinions have stayed stable within each demographic group.
  • The chart is in fact three-dimensional, because each map also gives the geographical (state-by-state) breakdown.
  • It is useful to figure out the smallest unit of data: in this case, it is the percentage support for federal school vouchers among voters in a given race-religion-income-state category.
  • The color scheme is such that red represents the highest support and blue the lowest, with pink and purple in the middle.
  • It's almost always better to start from the aggregate (that is, the average) and then study variations along different dimensions; this is how the chart is arranged, from top to bottom.
  • On the top row, the higher-income groups tended to favor vouchers more than the lower-income groups, with a break point around $75k; even here, the regional differences are significant, with the Northeast and Southwest hotter for vouchers at all income levels.
  • As we move from row to row, we realize that the aggregate data hides many disparities.  For example, white Catholics (second row) are more likely to support vouchers regardless of income level while white non-evangelical Protestants (fourth row) are much less likely than average to support vouchers at all income levels.
  • Notice that the statistician (Andrew) has carefully defined the race-religion categories to strike a balance between lumping together subgroups that are distinct and showing so many subgroups that the patterns are obscured. That is why many race-religion subgroups are not shown; the ones shown are of special interest. Consider white Protestants, evangelical vs. non-evangelical (third and fourth rows). Even if one fixes the race, geography and income dimensions, and half of the religion dimension, the two subgroups still sit at opposite ends of the spectrum on vouchers. This is why the evangelical/non-evangelical dimension has been included.
  • The white space is interesting. The statistician faces sparse data once the subgroups are cut along several dimensions at once. Andrew chose to ignore the data in these thin cells, which is the wise thing to do. With so few samples, it is particularly easy to draw bad conclusions.
  • Because of the white space, we get additional information on the spatial distribution of the demographic subgroups. The black population (at least among voters) is predominantly found in the Southeast, while Hispanics are concentrated in the Southwest. The subgroup with income above $150k is essentially all white. Admittedly, this is a very crude read because we only have two levels (below 2% of the state population and above). Among the colored states, we cannot tell whether a subgroup makes up a large or a small share of the population.
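The white-space rule can be sketched in a few lines of Python. This assumes, as the two-level legend suggests, that a cell is left blank when the subgroup makes up less than 2% of the state's population; `mask_sparse`, the state labels and the numbers are all hypothetical, and this is not Andrew's actual code.

```python
# Threshold read from the chart's two-level legend (below 2% vs. above).
MIN_SHARE = 2.0

def mask_sparse(support, share, min_share=MIN_SHARE):
    """Return voucher support per state, with None (drawn as white space)
    wherever the subgroup's share of the state population is below min_share."""
    return {state: (support[state] if share.get(state, 0.0) >= min_share else None)
            for state in support}

# Made-up values for one subgroup: support (%) and population share (%).
support = {"A": 41.0, "B": 44.0, "C": 55.0}
share = {"A": 26.0, "B": 37.0, "C": 0.9}
print(mask_sparse(support, share))
```

Blanking the cell, rather than drawing a noisy estimate, is what lets the empty areas double as a rough map of where each subgroup lives.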


Such rich graphics deserve careful reading.  Enjoy!

Spinning the climate

Mike L. pointed us to this pair of "climate change model pie charts", with the brief comment "Yuck".

They use the spinning-wheel analogy to present probabilities (odds). This is not a good use of pies either. A histogram does the job with minimal fuss.


I merged the 2-2.5 and 2.5-3 degree sectors, since every other sector covers a one-degree interval. We see immediately that the policy shifts the probability distribution toward smaller temperature changes.
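A rough Python sketch of that re-binning, followed by a plain-text histogram. The degree ranges and probabilities are made up for illustration and are not the values from the original wheels; `merge_bins` and `text_histogram` are hypothetical helpers.

```python
# Illustrative probabilities only; these are NOT the values from the original wheels.
bins = [((1, 2), 0.05), ((2, 2.5), 0.05), ((2.5, 3), 0.10),
        ((3, 4), 0.30), ((4, 5), 0.30), ((5, 6), 0.20)]

def merge_bins(bins, lo, hi):
    """Replace every bin lying inside [lo, hi] with one bin (lo, hi)
    holding their total probability, keeping the bins in order."""
    merged, total, slot = [], 0.0, None
    for (left, right), p in bins:
        if lo <= left and right <= hi:
            if slot is None:          # first piece: reserve its position
                slot = len(merged)
                merged.append(None)
            total += p
        else:
            merged.append(((left, right), p))
    if slot is not None:
        merged[slot] = ((lo, hi), total)
    return merged

def text_histogram(bins, width=40):
    """Print one bar per bin; bar length is proportional to probability."""
    for (left, right), p in bins:
        print(f"{left:>4}-{right:<4} {'#' * round(p * width)} {p:.0%}")

text_histogram(merge_bins(bins, 2, 3))
```

With equal-width bins, bar lengths can be compared directly, which is exactly what the wheel's sectors make hard.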

Reference: "Climate change odds much worse than thought", Science Daily, May 20, 2009.

The case of the shrinking mall

I had a similar reaction when reading this chart, but I will let our reader take center stage.

Top portion

Bottom portion

The back of today's New York Times Week in Review section devotes half of its space to a lame infographic that wastes space and contains a major error (which has been lazily corrected in the version now online at http://www.nytimes.com/interactive/2008/10/14/opinion/20090531_OPCHART.html). The chart shows the recent decline (or in some cases, rise) in retail sales at 27 common mall chains.

My main objection is that the top half of the chart is useless. Yes, it provides a baseline against which the shrinkage in area in the bottom half (the visual metaphor being income = floor space) can represent the relative declines in sales, but color already encodes the same information, and does it better. The only part of the chart that I got any information from was the bottom half, and it took me a while to figure out why the top half was even there.

Meanwhile, the error in the printed version is that the +5-10% stores were colored light green while the +0-5% store (Burger King) was colored dark green. It should have been the other way around. The online version simply swapped the colors in the legend rather than on the map itself, which works logically but raises the question: why do you have dark red at one end of the spectrum and light green at the other, with dark green in the middle?

Thank you for listening; I'm just blowing off a little steam here. It's a lot of wasted space, and I'll bet the New York Times paid a lot for it.
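The reader's complaint about the legend boils down to a rule for diverging color scales: intensity should grow with the size of the change on both sides of zero, so the lightest shades sit next to zero. A minimal Python sketch, with hypothetical bin edges and color names (only the +0-5% and +5-10% classes come from the chart described above; `color_for` is made up for illustration):

```python
INF = float("inf")

# Hypothetical diverging scale for percent change in sales:
# shades get darker as the change moves away from zero in either direction.
SCALE = [
    (-INF, -10, "dark red"),
    (-10, 0, "light red"),
    (0, 5, "light green"),   # e.g. the +0-5% class (Burger King)
    (5, INF, "dark green"),  # e.g. the +5-10% class
]

def color_for(change):
    """Return the color for a percent change; each bin includes its lower edge."""
    for low, high, color in SCALE:
        if low <= change < high:
            return color
    raise ValueError(f"no color bin for {change!r}")  # e.g. NaN

print(color_for(3), "|", color_for(7))
```

Under this ordering, the printed chart's coloring (dark green for +0-5%, light green for +5-10%) would be flagged as a mistake, and the online fix of relabeling the legend would still leave the scale out of order.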

Reference: "Op-Chart: The Fall of the Mall", New York Times. (Ed: I am not sure why the date is given as October 2008.)