
Colorful maps

Bernard L. loved the recent NYT take on immigration in America.

The very pretty maps are found here.


An amazing amount of data is being visualized here.  Mousing over the map will pick up the specific data for each county.  There is a bar up top for discovering the evolution over time.  It would be great if there were an animation button so the map could be played out without clicking.  An animated GIF would also do (similar to the disease map we featured some time ago).

The colors on the first map represent the origin of the top ethnic group in each county.  Within each group, the tint of the color further displays the percentage of the population that group accounts for.  The subgroups appear to be 0-2%, 2-5%, over 5%.  The last subgroup is very wide.
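To see just how wide that last subgroup is, here is a minimal sketch of the binning in Python. The cut points are the ones I read off the legend; the function name, the tint labels and the handling of values that fall exactly on a cut point are my own assumptions, not anything published by the Times.

```python
from bisect import bisect_right

# Cut points as read off the legend: 0-2%, 2-5%, over 5%.
CUTS = [2.0, 5.0]
# Hypothetical tint labels, not the actual palette used on the map.
TINTS = ["light", "medium", "dark"]


def tint_for_share(share_pct):
    """Return the tint bucket for a group's share of county population (percent)."""
    if not 0.0 <= share_pct <= 100.0:
        raise ValueError("share must be between 0 and 100")
    # bisect_right counts how many cut points are <= share_pct,
    # which is exactly the index of the bucket.
    return TINTS[bisect_right(CUTS, share_pct)]


for share in (1.5, 3.0, 6.0, 40.0):
    print(share, tint_for_share(share))
```

A county where the group makes up 6% of the population gets the same tint as one where it makes up 40%, which is the problem with leaving the top bucket open-ended.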

Not so keen on the second map with all those bubbles.  They show the number of people from each country by county.  The bubble size is proportional to population.  Every version of this map looks the same because the population is concentrated in the cities and the interior is sparsely populated, no matter what ethnic group.

Regardless, this is another laudable effort by the crew at the Times.

Reference: "Immigration Explorer", New York Times, March 10 2009.

Disseminating junk

Tuesday was an up day in the stock markets.  What did the headline writers think?

Wall Street resumes rally following housing report  (Associated Press; link)

What did this great housing report say?  Here was a version from Marketwatch, with my comments.

Housing starts surge 22% on apartment building

It was the largest percentage gain in 19 years and was the first increase in eight months in the sector that was at ground zero in the global economic recession.

At the minimum, they claimed to see a local maximum, perhaps even a global maximum.

The housing data in winter months are especially volatile because of the weather.

Was someone hedging?  It was unclear at this point how important this piece of information was.  Some professionals expressed skepticism that the news was really positive:

"We're inclined to write this off as a weather-related fluke for now," wrote economists for Wrightson ICAP.

Notice they hedged too, using the words "inclined to".  Perhaps they had a reason to doubt the news.  The article now cited less rosy numbers:

But despite February's gain, housing starts are down 47% from a year ago, and are down 74% from the peak in early 2006. Permits are down 44% in the past year.

And even more bad news:

The National Association of Home Builders reported Monday that its sentiment index was stuck at 9 on a scale of 1 to 100 in March.

Did that really say 9 out of 100?  Sounded really bad for a nation of optimists and grade inflators.  One had to start wondering about the headline.  Was that 22% growth a fluke?  Did it mean anything?

Now, two-thirds of the way down, the article revealed its essence:

The government cautions that its monthly housing data are volatile and subject to large sampling and other statistical errors. In most months, the government can't be sure whether starts increased or decreased. In February for instance, the standard error for starts was plus or minus 13.8%. Large revisions are common.

The key news about the sampling variability was a throw-away line ("for instance").

Here, the author was confused about standard error and margin of error.  Standard error, being a standard deviation, cannot be a negative number.  The range of 22% plus or minus 13.8% is a margin of error.  This is a gigantic range, from 8.2% to 35.8%.  While we conclude that the growth was statistically larger than zero, we have to wonder whether this number has historical significance (best in 19 years?).
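The arithmetic is simple enough to do by hand, but a small Python sketch makes the logic explicit. The function names are mine, and I am treating the 13.8% as the half-width of the interval around the reported growth rate, as argued above.

```python
def error_range(estimate_pct, margin_pct):
    """Return (low, high) for estimate plus or minus margin, in percentage points."""
    if margin_pct < 0:
        raise ValueError("margin must be non-negative")
    return estimate_pct - margin_pct, estimate_pct + margin_pct


def differs_from_zero(estimate_pct, margin_pct):
    """True when the whole range lies on one side of zero."""
    low, high = error_range(estimate_pct, margin_pct)
    return low > 0 or high < 0


low, high = error_range(22.0, 13.8)
print(f"{low:.1f}% to {high:.1f}%")
print("22% growth differs from zero:", differs_from_zero(22.0, 13.8))
print("10% growth differs from zero:", differs_from_zero(10.0, 13.8))
```

The reported 22% clears the bar, but a hypothetical month with 10% growth would not, and neither would any month with a change smaller than the margin.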

Worse than that, in any month when the growth is less than 13.8%, this level of sampling error means the growth rate is not statistically different from zero.  Thus, the most newsworthy sentence of the entire piece was:

In most months, the government can't be sure whether starts increased or decreased.

In other words, this statistic of growth in housing starts is junk.  If we really want to examine this statistic, the survey needs to use a much larger sample.
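How much larger?  Here is a rough sketch in Python, under the textbook assumption that the margin of error shrinks with the square root of the sample size.  The government's survey design is more complicated than simple random sampling, so treat the numbers as an order-of-magnitude guide; the function name and the target margins are mine.

```python
def sample_multiplier(current_margin, target_margin):
    """Factor by which the sample must grow to reach the target margin,
    assuming the margin is proportional to 1 / sqrt(sample size)."""
    if current_margin <= 0 or target_margin <= 0:
        raise ValueError("margins must be positive")
    return (current_margin / target_margin) ** 2


# Illustrative targets, starting from the 13.8% reported for February.
for target in (10.0, 5.0, 2.0):
    print(f"margin {target}%: sample x {sample_multiplier(13.8, target):.1f}")
```

Under that assumption, getting the margin down to 5% would take a sample roughly 7.6 times the current size.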

MarketWatch is to be commended for noticing the volatility issue.  Of the dozens of articles that came up in a Google search about the housing starts data, no others mentioned this problem.

The original government press release is here (pdf), with all the fine print.

The trouble with maps

Todd B. pointed us here.  These are maps that supposedly show the distribution of respondents for each answer choice in a survey exploring accents in different parts of the country.  The full set of maps for every question can be found here.  


Amusingly, the researchers also provided a map of "all respondents".  (I won't ask how the proportions of respondents were reduced to binary output to produce the above maps.)

Here is Todd to lead off the discussion:

Just because you put data on a map doesn't make it effective. Check out these mapped responses that either tell us that there is no difference in dialects or fails to illustrate differences effectively.

Reference: "Dialect Survey", University of Wisconsin-Madison.


From Bernard L., another exemplary effort by the Times.  This one really got me excited.  


The set of line graphs shows how demographics of students in American schools have evolved in the last two decades.  Here, I selected New York City schools, and the tool sensibly decided to compare those with New York State schools (gray line).

There is so much to learn from one simple chart:
  • The blue and gray lines are almost parallel everywhere, which tells us that in terms of the change in demographic composition, New York City pretty much resembled New York State during this entire period.
  • However, in terms of demographic composition, rather than the change in composition, New York City schools are very different from the rest of the state, in that the proportion of white students is lower by a third while that of minorities is much higher, especially black and Hispanic students.
  • State-wide (as well as city-wide), black and white students have been declining as a proportion while Hispanics and Asians have increased. 
  • The extent of the change is immediately visible: Asians have jumped from 7% to 14%, for example.
From a graph design perspective, the execution is very clean.  Data labels are limited to the first and last values.  A small multiples concept is used with the ethnic groups placed side by side.  A great awareness of foreground and background as well.  And imagine how much data has been visualized here, and be impressed.  You can look at any county in the country.

Here's one where the county change does not exactly mirror the state change (Napa in California):


Reference: "Diversity in the classroom", New York Times, March 12 2009.

Deep dive

The chart on the right, which compares the unemployment picture across past recessions, has many good features.


It uses a very sensible metric, counting the percentage change from peak employment.  I have mentioned the superiority of this type of presentation compared to plotting a time series of unemployment rates.  The following is an example of a standard graphic using gray bands to indicate recessions.  The difference is obvious.


Here is the previous post that dealt with the drop in market capitalization of banks since the peak, which screamed out for this type of treatment.
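For readers who want to try the peak-based metric on their own data, here is a minimal sketch in Python.  The function name is mine and the employment numbers are made up for illustration; they are not actual figures from the chart.

```python
def pct_change_from_peak(employment):
    """Percent change of each value relative to the series peak.

    The returned list starts at the peak, so position 0 is the peak month
    and position k is k months after the peak.
    """
    if not employment:
        raise ValueError("series is empty")
    # Index of the first occurrence of the maximum value.
    peak_index = max(range(len(employment)), key=employment.__getitem__)
    peak = employment[peak_index]
    return [100.0 * (x - peak) / peak for x in employment[peak_index:]]


# Made-up monthly employment levels (in thousands).
series = [990, 1000, 995, 985, 970, 960]
print([round(v, 1) for v in pct_change_from_peak(series)])
```

Because every recession is re-indexed to its own peak, the lines all start at zero and can be overlaid directly, which a plain time series of unemployment rates does not allow.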

It handles the foreground-background issue very well.  A number of similar charts are circulating in which every single line has a different color.  Here, the designer clearly tells us the current recession is the foreground and all the past recessions form the background.  It looks as if the 1981-3 recession is slightly highlighted with a darker orange to draw attention to the fact that it is most similar to the current situation.  I find this unnecessary because the association is clear even without the darker hue; however, the designer does this with a very light touch so this is just a question of taste.

I would label the horizontal axis starting at 0. 

Reference: "Job Losses Hint at Vast Remaking of Economy", New York Times, Mar 8 2009.