
Digging deeper

Two items from other places caught my eye this week as they directly relate to some things we discussed on this blog.

First, I second Andrew's suggestion of a recent NYT article for teaching the concept of margin of error, or how to read political poll coverage intelligently.  Towards the end of this piece is a small gem:

Some pundits began by saying the horse race numbers were close but then tried to marshal evidence that they were not. On ABC's own Web site, Chris Cillizza, wrote: "Among women in the Post poll, Obama actually leads Clinton 32 percent to 31 percent among women. Voters 45 years of age or older are similarly divided, choosing Clinton by a 27 percent to 26 percent margin over Obama. Ditto for those who earn $50,000 or less a year; 29 percent for Clinton, 29 percent for Obama."

Mr. Cillizza failed to mention that if the margin of sampling error is plus or minus five percentage points for all of the likely Democratic caucus goers, then it is even higher for subgroups like women.

In a recent post, I called this the "oft-used device of subgroup support of a hypothesis".  This example illustrates the fallacy more clearly.  It's the "let's dig deeper since we haven't found the gold yet" phenomenon.  Such analysis suffers from two serious statistical problems.  The article deals with the sample size problem: the margin of error at the subgroup level is by definition larger, which means the bar for statistical significance has been raised, and rare is the case where such analysis could lead to any further insights.  (Of course, I am assuming the original poll was not designed to be analyzed at the subgroup level.)
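To put rough numbers on the sample size problem, here is a minimal sketch in Python.  It assumes a simple random sample, the textbook 95% formula with the worst case of a 50/50 split, and a subgroup (say, women) that makes up half of the sample.  None of these figures come from the poll itself; only the five-point margin is taken from the quotation.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of sampling error for a proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for(moe, p=0.5, z=1.96):
    """Sample size implied by a stated margin of error, under the same assumptions."""
    return (z / moe) ** 2 * p * (1 - p)

# From the quotation: plus or minus five points for all likely caucus goers.
n_full = sample_size_for(0.05)
# Assumption: the subgroup is half of the full sample.
n_sub = n_full / 2

moe_full = margin_of_error(n_full)
moe_sub = margin_of_error(n_sub)
print(f"full sample: n ~ {n_full:.0f}, margin ~ {moe_full:.1%}")
print(f"subgroup:    n ~ {n_sub:.0f}, margin ~ {moe_sub:.1%}")
```

Halving the sample inflates the margin by a factor of the square root of two, so roughly seven points instead of five.  A 32-to-31 "lead" sits well inside that.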

The other issue -- more difficult to explain and omitted in the article -- is the multiple hypothesis problem.  It is well known that if we dig around long enough, we may get so dizzy that anything that glitters will look like gold.  In other words, false positives.  Like the sample size problem, the remedy is to raise the bar for statistical significance even higher.  In practice, this frequently wipes out the rationale for such analysis.
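The multiple hypothesis problem can also be sketched in a few lines.  The assumptions here are mine: the subgroup tests are independent, each is run at the 5% level, and there is no real effect anywhere.  The Bonferroni correction is shown as one common way to "raise the bar"; the article does not name a specific remedy.

```python
def prob_any_false_positive(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests when nothing is going on."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test significance level that keeps the overall false positive rate at or below alpha."""
    return alpha / num_tests

for k in (1, 5, 10, 20):
    print(k, round(prob_any_false_positive(k), 3), bonferroni_alpha(k))
```

Under these assumptions, digging through 20 subgroups gives roughly a 64% chance of finding at least one "significant" result by luck alone.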

I will address the other interesting item in a new post.


The punch line

Mike K submitted this great entry months ago.  It's a map depicting stock market correction across the globe during the summer.  You have to click on the link to the WSJ website in order to see the interactive element.

Wsj_correction

Here are Mike's comments and mine:
Why it's bad:
First, to see the detail you have to click on the countries one by one. Hard to do a comparison of two countries. This makes it close to FlashJunk.

The color scheme is supposed to help but:

Second, the colors are too close together to allow easy comparisons of, say, Canada and Australia.

In addition, the binning of the colors is uneven and oddly chosen.  In the middle of the scale, each color shift represents 1% but at the edge, it is 5%, or more.

Third, area of these countries, or their geographic location, isn't really that relevant. Market cap might be. Then tiny-but-richly-capitalistic Netherlands wouldn't have to be shown in the middle of the Atlantic, as if the dikes had all burst and Amsterdam had floated out to sea.
Indeed, it begs the question: what were the gold dots supposed to signify? (Hint: it's not location.)
Fourth, why the selectivity? There's stock markets in Turkey, and in Russia, and in Ireland and in Thailand. (Oh, wait, they show the one in Thailand -- except they put it in Myanmar instead.)

Finally, the chart lacks a punch line. 

In the junkart version, I want to test the hypothesis of a global contagion so I plot the data in order of closing times of individual stock markets.  (I just guessed the closing times based on the map.)  Not much here though.

Redo_correction

Source: "Global Correction", Wall Street Journal, August 2007.


A dangerous equation

Graduation rates at 47 new small public high schools that have opened since 2002 are substantially higher than the citywide average, an indication that the Bloomberg administration’s decision to break up many large failing high schools has achieved some early success.

Most of the schools have made considerable advances over the low-performing large high schools they replaced. Eight schools out of the 47 small schools graduated more than 90 percent of their students.

This graphic, included in the NYT article, lent support to the "small schools movement".  In particular, note the last sentence of the above quotation: it incorporates the oft-used device of subgroup support of a hypothesis, in this case, the subgroup of eight top-performing schools.

Such analysis is "dangerous", according to Howard Wainer, who discusses this and other examples of misapplication in a recent article in American Scientist, entitled "The Most Dangerous Equation".  He alleged that billions have been wasted in the pursuit of small schools.

The issue concerns sample size.  Dr. Wainer and associates analyzed math scores from Pennsylvania public schools.  Average scores for smaller schools are based on smaller numbers of students, and are therefore less stable (more variable).  More variability means more extremes.  Thus, by chance alone, we expect to find more smaller schools among the top performers.  Similarly, by chance alone, we also expect to find more smaller schools among the worst performers.

The scatter plot lays out their argument. Focusing only on the top performers (blue dots), one might conclude that smaller schools do better.  However, when the bottom performers (green) are also considered, the story no longer holds.  Indeed, the regression line is essentially flat, indicating that scores are not correlated with school size.

This is all nicely explained via the standard error formula (De Moivre's equation) in Dr. Wainer's article.  Here is a NYT article from the mid 1990s describing this same phenomenon.
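De Moivre's equation says the standard error of an average is the standard deviation of the individual scores divided by the square root of the sample size.  The following simulation is a rough sketch of the consequence, not a reproduction of Dr. Wainer's analysis.  The enrollments (25 and 400), the score distribution and the seed are all made up; the key assumption is that school size has no effect whatsoever on any student's score.

```python
import random
import statistics

rng = random.Random(2007)  # fixed seed so the sketch is repeatable

SMALL, LARGE = 25, 400  # hypothetical enrollments
schools = []
for i in range(400):
    size = SMALL if i % 2 == 0 else LARGE
    # Every student, in every school, is drawn from the same distribution.
    scores = [rng.gauss(100, 15) for _ in range(size)]
    schools.append((statistics.fmean(scores), size))

schools.sort()  # ascending by average score
k = 40  # top and bottom 10% of schools
bottom, top = schools[:k], schools[-k:]

def small_share(group):
    """Fraction of schools in the group that are small."""
    return sum(1 for _, size in group if size == SMALL) / len(group)

print("small schools overall:      ", small_share(schools))
print("small schools in top 10%:   ", small_share(top))
print("small schools in bottom 10%:", small_share(bottom))
```

Small schools are half of all schools here, yet they should dominate both the top and the bottom of the ranking, purely because their averages bounce around more.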

File this as another comparability problem.  Because estimates based on smaller samples are less reliable, one must take extra care when comparing small samples to large samples.

Dr. Wainer is publishing a new book next year, called "The Second Watch: navigating the uncertain world".  I'm eagerly looking forward to it.  Two of his previous books, Graphic Discovery and Visual Revelations, are part of the Junk Charts collection.

Sources: "The Most Dangerous Equation", American Scientist, November 2007; "Small Schools Are Ahead in Graduation", New York Times, June 30 2007.


P.S. Referring back to the NYT chart above, one might wonder at the impossible feat of raising graduation rates across the board simply by breaking up large schools into smaller ones.  This topic was taken up here, here and here.  When evaluating the "small schools" policy, it is a mistake to discuss only the performance of small schools; any responsible analysis must look at improvement over all schools.  Otherwise, it's a simple matter of letting small schools skim off the cream from larger schools.
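A toy calculation shows how skimming works.  The numbers below are entirely hypothetical: one large school with a 70% graduation rate is split in two, and the new small school enrolls mostly strong students.

```python
# Hypothetical numbers: (graduates, students) before and after a split.
before = {"large school": (700, 1000)}
after = {
    "new small school": (180, 200),         # enrolls mostly strong students
    "remaining large school": (520, 800),
}

def rate(graduates, students):
    return graduates / students

def overall_rate(schools):
    """Graduation rate across all students, regardless of which school they attend."""
    total_grads = sum(g for g, _ in schools.values())
    total_students = sum(s for _, s in schools.values())
    return total_grads / total_students

for name, (g, s) in after.items():
    print(f"{name}: {rate(g, s):.0%}")
print(f"overall before: {overall_rate(before):.0%}")
print(f"overall after:  {overall_rate(after):.0%}")
```

The small school posts 90% while the system as a whole has not moved from 70%, which is why the comparison has to be made over all schools.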

 


Social networking

I've been meaning to write about our "larger blog family", or social network, for a while.  It's taken time since this requires a bit of digging around.  These sites share significant traffic with us, which means readers like you also like these sites:

Statistical Modeling, Causal Inference, and Social Science
aka, the Gelman blog.  Active community of statisticians, with regular commentary on visualization and graphics, and occasional diversions into unexpected topics (spam! art!).  Required reading.

Information Aesthetics
Lovely blog focusing on graphics that are pleasing to the eye.  They like entertaining; we like informing.  Nice complement or counterpoint to our point of view.

Jason Kottke
"kottke.org is a weblog about the liberal arts 2.0 edited by Jason Kottke".  Eclectic. 

Process Trends
D Kelly O'Day's collection of chart tips and links to on-line resources.

Juice Analytics
Website/blog run by a data analysis consulting firm.  Very active posting.  Good on tools.  They even awarded us the Juicy Award for "charts and graphs": a belated thank you!

Edward Tufte
The meeting point of Tufte fans.  Tufte himself also joins in the forums.  For the very serious.

Mahalanobis
Before this blog shut down due to employer interference, two hedge fund guys shared their wit here.

EagerEyes
Robert Kosara's blog on all things visual.  We featured his scribble maps here.

Statistical Graphics and Data Visualization
This blog suffered twice.  First, a certain StatGraphics company forced them to abandon their original URL.  Then, the blog activity withered.

Malaprensa
Josu's Spanish blog ("Bad Press") picking out factual errors in the Spanish press.  (Thanks to Josu and Jorge for correcting my misattribution.)

The next batch includes:

Social Science Statistics
Data Mining: Text Mining, Visualization and Social Media
Science Magazine
Design*Notes
L'economie sans tabou
R Project Wiki

The problem is it's hard to keep up because a lot of other sites are showing up in the recent history.  But then most of you come directly to the site, or through Google, or through an RSS reader, or del.icio.us and so on.


The absolutely meaningless pie chart

Simon J., from New Zealand, sent this in during the recent Rugby Cup but I didn't notice it till now.  As he stated, "they do a good job confirming our views of pie charts!"  Dropkicks is a site about rugby, and other sports popular in the South Pacific.

So here is our light entertainment for Thanksgiving week:
Dropkicks_pie_chart


This chart accompanied a very serious statistical analysis to address the monumental question of whether some countries were borrowing strength from foreign players.  If this is your cup of tea, follow this link.

P.S. Today I started the Junk Charts Core Collection, which includes books I recommend on graphics, statistics, data mining and related topics (top right).  Some categories are sparse right now as I build out the collection.  If you have favorites, let me know and I will include them.  (I am using the Amazon interface to organize the list; if you buy books, you are buying from them.  I am not becoming a bookstore.)

11/19: Amazon seems to be having problems serving up the images.  I have turned off the image for now.  You can follow the text link above to see the book collection.

11/20: the image is up again


Wordsmiths

I know there are more than a few wordsmiths among our readers and this entry is for you.

Data is a troublesome word.  It is the plural of datum.  And yet, I find it unnatural to say "here are my data" instead of "here is my data".  Similarly, "datum point" is more grammatical than "data point" -- and perhaps both are redundant words -- because we should use a singular noun to modify another noun (e.g. company ranking, not companies ranking; potato chip, not potatoes chip; etc.)


In a recent piece on RSS News, Ian Schagen convinces me to treat data as singular, without remorse.  Among his many arguments, he points out that agenda is the plural of agendum and yet we have no qualms using agenda as singular.  So from now on, data is singular.

Ian has the last word:

In fact an amusing pastime is to read papers and articles by people, trying to use 'data' as a plural because they believe this to be 'correct', who slip into the proper English usage when their attention wavers.  I've often seen both usages in a single sentence!

Source: "Why 'data' is singular", RSS News.  (Unfortunately, this seems to be only available to members in paper copies.)


Large tables

Richard J. asked how we might make sense of this table.  Large tables present lots of challenges.  The trick is to enhance the table with colors and shapes; and as usual, remove any data that doesn't help make your argument.

This table compares countries across different measures of privacy.  Each measure is rated on a scale of 1 to 5, with some blanks.  These ratings are averaged to obtain an overall rating, listed on the right.

In the junkart version, the ratings are presented as slots inside a box.   The overall rating is placed right below the name of the country since this is the most important measure, and how the countries were ordered.  The rows and columns are reversed so as to explain how the overall rating can be decomposed into individual metrics for each country.  I have only shown the top five countries but obviously the chart can be extended to cover all the data. 

Redo_privacy

If desired, the top 5 countries in each measure can be given a different color: this would increase the data-ink ratio on the chart.  One weakness of this type of chart is that the rows and columns do not have equal status: comparing across rows is more difficult than comparing up and down columns.

Richard also wonders about their treatment of the blanks.  It appears that they omit blanks so each country's overall rating is the average of its non-blank measures.  Omitting blanks may seem innocuous but in fact, this is equivalent to assigning each blank measure a rating equal to the country's average non-blank rating.  Richard wonders if this is the best way to treat these blanks.
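The equivalence is easy to check with a small sketch.  The ratings below are invented for one hypothetical country, and filling blanks with the scale midpoint is shown only as one alternative for contrast, not as what Privacy International does.

```python
ratings = [4, 2, None, 5, None]  # hypothetical ratings; None marks a blank

observed = [r for r in ratings if r is not None]
mean_omitting_blanks = sum(observed) / len(observed)

# Fill each blank with the country's own average of non-blank ratings.
filled = [mean_omitting_blanks if r is None else r for r in ratings]
mean_after_filling = sum(filled) / len(filled)

# An alternative assumption: treat a blank as the scale midpoint (3 on a 1-to-5 scale).
midpoint_filled = [3 if r is None else r for r in ratings]
mean_midpoint = sum(midpoint_filled) / len(midpoint_filled)

print(mean_omitting_blanks, mean_after_filling, mean_midpoint)
```

The first two averages agree, while the midpoint version gives a different overall rating.  So omitting blanks is itself a choice about what the missing ratings would have been.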

 

Source: "Leading surveillance societies", Privacy International.

(Thanks to Richard for sending me the data.)


New York Times: a tribute

As many of you realize, this blog owes a lot to the New York Times.  The Times is unique in its willingness to print interesting, sophisticated graphics.  Via the Social Science Statistics blog, I found out that Matthew Ericson is a deputy graphics editor, and he recently gave a gigantic presentation at the IEEE InfoVis conference. (You can download the entire document from his website.)

As the SSS blog pointed out, the section on how they decided to visualize the shift in party margins by House districts, specifically to declare scatter plots as too "difficult for the masses", is fascinating.  It illustrates the idea of sketching that I have advocated here in the past. (The PDF of the complete graphic can be downloaded from here.)

From my point of view, the issue is less the type of chart than the level of aggregation.  The chart has a very appealing data-to-ink ratio (a la Tufte) but could less be more?  One of the secrets of making a good chart, and any data analysis for that matter, is to reduce complexity.  For example, is it crucial for every single district to receive equal treatment?  (Similarly, if scatter plots were chosen, is it crucial to include every district?)

*********************************************************

Several examples of great charts can be found in Matt's presentation.  On slide 83, I admire the Bonds/Aaron/Ruth chart.  The inset showing the acceleration of Bonds from age 35 to 39, as compared to the decline of Aaron and Ruth during the same age span, is powerful.  Similarly, the effective use of foreground (blue) and background (gray) in comparing ARod, Pujols and Griffey against the big 3 is masterly (see right).

There is also a sequence on mapping the San Diego wildfires (slides 2-10), showing how they gathered population data to complement fire data, thus adding context to the threat to highly populated regions.

******************************************

In a different vein, the SSS blog, written by the people at Harvard's Institute for Quantitative Social Science, has published a number of engaging posts on data graphics recently.  Take a look at Visualizing Electoral Data, which coincidentally addresses an issue similar to that of the NYT party vote share graphic discussed above.

This graphic plots the degree of party swings by UK parliamentary constituency.  The darker the color, the tighter the stranglehold by one party.  Going from top to bottom, the authors show party swings over successive elections.  The swing constituencies are therefore near the middle of the chart.

 


Red-lining by marriage

Tom W., a reader, noticed this map featured on a BBC News page about the UK family.

One can roughly make out the shape of Great Britain so this is some kind of cartogram.
The title announces that this cartogram concerns the "distribution of population". 

In a typical map like this, the redder reds would indicate higher densities of people.  Yet, the article tells us that the population is divided evenly into 85 squares, each containing "roughly half a million people over 18 years old".

Instead, we seem to have 500K widowed people next to 500K re-married people (most of whom prefer the coasts, by the way), etc.  Apparently, the Brits practise a form of red-lining based on marital status!

The S/M/W/D/R labels are also redundant and very distracting; and the white gridlines interfere with our ability to read the grey boundaries.

Source: "The UK family", BBC News.


The eyeball test

This set of graphs was used by the NYT to discuss changes in U.S.  spending patterns over time.  For this post, I am focusing on the bottom left and bottom right graphs.  One shows spending on energy as a percent of GDP; the other, on "nonresidential structures" (aka, commercial buildings).

Nyt_spending

At first glance, spending on energy and that on commercial buildings look very similar in shape (see above or below left).  Alas, this "eyeball test" doesn't work very well with time series data.  Let's investigate further.

Redospend1_2

"Standardizing" the data (above right) tells us whether the swings are unusual or not in the history of the data.  So in the 1980s, commercial building spend spiked to more than three standard deviations above the historical average.  Generally speaking, a standardized value of 3 is taken to mean highly unusual.

Notice that the peaks of the left graph had equal heights but on the right graph, energy spending peaked only above two while commercial building spend rose above three.  This is because energy spending has been more volatile historically so it takes larger jumps (or plunges) to count as "unusual" movements.  This information is hidden in the unstandardized version.
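For readers who want to try this, standardizing amounts to subtracting the series' own average and dividing by its own standard deviation.  The sketch below uses two invented series, not the NYT data, that end at the same peak; I use the population standard deviation, which is an assumption since either version could be used.

```python
import statistics

def standardize(series):
    """Convert a series to z-scores: distance from its own mean, in standard deviations."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(x - mean) / sd for x in series]

# Hypothetical shares of GDP (percent); both series peak at 4.0 in the last year.
calm = [3.0, 3.1, 2.9, 3.0, 3.1, 2.9, 3.0, 4.0]
volatile = [3.0, 3.6, 2.4, 3.5, 2.5, 3.6, 2.4, 4.0]

print(round(standardize(calm)[-1], 2))
print(round(standardize(volatile)[-1], 2))
```

The same raw peak earns a higher standardized value in the calm series than in the volatile one, which is the effect seen with commercial buildings versus energy.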

Further, since we are concerned with long-term trends, let's take a look at five-year moving averages (below right): in other words, each time point is the average of the preceding five years' worth of data.
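A trailing moving average can be sketched as follows.  One assumption to flag: I count the current year as part of the five-year window; a version that uses only the five prior years would shift everything by one year.  The input values are made up.

```python
def trailing_moving_average(series, window=5):
    """Average each point with the points before it; early points without enough history get None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

values = [1, 2, 3, 4, 5, 6, 7]  # hypothetical yearly values
print(trailing_moving_average(values))
# prints [None, None, None, None, 3.0, 4.0, 5.0]
```

Note that the first four years drop out, so a smoothed chart always starts later than the raw one.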

Redospend2

The fluctuations have been smoothed out and the peaks are no longer as high.  Glancing at this chart, we may still conclude that the spending patterns are quite similar -- especially in the period prior to 1995.

But is that really the case?  Zooming in on the 1980s, we may mistakenly think the two lines are "close together" if our eyes read the horizontal distance and/or area between the curves, rather than focusing on the vertical distance.  The arrows on the bottom left chart depict this difference.  To make things clearer, the bottom right chart plots the vertical distances between the two lines.

Redospend3

Observe that the difference expanded to above 1 unit in the late 1980s.  A difference of one unit is very large in the standardized scale (of "unusualness") since 0 is business as usual and 3 is "highly unusual".
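The trap of reading horizontal rather than vertical distance can be shown with two invented standardized series.  In this sketch, the second series is simply the first delayed by one year, so the curves look close together on a chart, yet the vertical gap is large wherever the curves are steep.

```python
# Two hypothetical standardized series: the same rise, one year apart.
years = list(range(1984, 1990))
a = [0.0, 0.5, 2.0, 3.0, 3.2, 3.3]
b = [0.0, 0.0, 0.5, 2.0, 3.0, 3.2]  # series a, delayed by one year

# Vertical distance: the difference between the two series in the same year.
vertical_gap = [x - y for x, y in zip(a, b)]
for year, gap in zip(years, vertical_gap):
    print(year, round(gap, 1))

largest = max(vertical_gap)
print("largest vertical gap:", largest)
```

Here a one-year lag, which the eye barely registers, produces a gap of 1.5 standardized units at the steepest point.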

Eyeballing the two time series would lead us to believe that the two series are similar but we run the risk of underestimating the differences as illustrated here.


Source: "Auto Sector's role Dwindles, and Spending Suffers", New York Times, Nov 3 2007.