
May 14, 2006

The nature of variation 1

[Chart: Births by month]

I refer readers to Andrew's comments on a graph purporting to demonstrate the existence of a month-of-year selection bias in the NHL, cited on the Freakonomics blog as an example of "overwhelming" evidence of such effects in sports.  (The original graph may have come from here.)

In particular, note the Professor's point #4.  It is always necessary to ask oneself whether perceived "trends" are real before attempting to explain them.  What Andrew computed can be interpreted to mean that, for any given month, we expect to see a percentage larger than 9% or smaller than 7% approximately 30% of the time.  Thus, out of 12 months, we'd expect to see about 3.6 months (12 × 0.3) with those "extreme" values, even if players were picked at random so that their birthdays were spread evenly across the year.  The NHL line contains 4 such values, so while there is some evidence of bias, it is certainly not "overwhelming" as Freakonomics suggested.
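The arithmetic behind the 30% figure can be checked with a short Python sketch. It assumes each player's birth month is equally likely (probability 1/12, ignoring differences in month length) and computes the chance that one month's share falls outside 7% to 9%, both with a normal approximation and with the exact binomial distribution. This is a reconstruction under those assumptions, not necessarily Andrew's exact calculation; centering on 1/12 rather than a rounded 8% pushes the probability somewhat above 30%, but the expected number of "extreme" months stays close to the 4 seen on the NHL line.

```python
import math

N_PLAYERS = 761   # number of NHL players in the chart
P_MONTH = 1 / 12  # each month's share if birthdays were spread evenly (assumption)

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_outside(low: float, high: float, n: int = N_PLAYERS, p: float = P_MONTH) -> float:
    """Normal approximation to the chance that one month's share is below low or above high."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of a sample proportion
    return normal_cdf((low - p) / se) + (1.0 - normal_cdf((high - p) / se))

def prob_outside_exact(low: float, high: float, n: int = N_PLAYERS, p: float = P_MONTH) -> float:
    """Same probability from the exact binomial distribution of the monthly count."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k / n < low or k / n > high
    )

approx = prob_outside(0.07, 0.09)
exact = prob_outside_exact(0.07, 0.09)
print(f"P(one month outside 7%-9%): normal {approx:.1%}, exact binomial {exact:.1%}")
print(f"expected 'extreme' months out of 12: {12 * exact:.1f}")
```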

The chart itself is, sadly, misleading by its very choice of comparing NHL players to the populations of Canada and the USA.  To quote the original website, the key message of this chart was:

The 761 NHL players show a distinctly different pattern than that for Canada or the United States with the highest percentage of births in January and February and the lowest in September and November.

This "pattern" is the larger observed dispersion of the NHL monthly percentages around the mean of 1/12 (about 8.3%), compared with Canada or the USA.  In other words, the NHL line fluctuates more wildly.

Too bad there is a statistical law that guarantees this "pattern": when we look at sample averages, the larger the sample size, the smaller the dispersion (the standard error shrinks in proportion to the square root of the sample size).  (This is why Andrew used the sample size 761/12 in his calculation.)  Because the Canada and USA lines represent averages over millions of people while the NHL line represents only 761 people, it is no surprise at all to find the NHL line fluctuating more wildly!
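The square-root law is easy to see numerically. This sketch compares the standard error of a monthly share for 761 players with that for a sample of 4,000,000 people; the 4,000,000 is a hypothetical round number standing in for "millions", not an actual birth count for Canada or the USA. It also shows a Poisson shortcut based on the expected monthly count 761/12, which is one plausible reading of the calculation mentioned in the paragraph above.

```python
import math

def share_std_error(n: int, p: float = 1 / 12) -> float:
    """Standard error of a monthly share estimated from n people (binomial model)."""
    return math.sqrt(p * (1 - p) / n)

def poisson_shortcut(n: int, months: int = 12) -> float:
    """Poisson shortcut: SD of a monthly count is about sqrt(n / months); divide by n for a share."""
    return math.sqrt(n / months) / n

nhl_se = share_std_error(761)
big_se = share_std_error(4_000_000)  # hypothetical stand-in for "millions of people"

print(f"NHL (n=761): binomial SE {nhl_se:.2%}, Poisson shortcut {poisson_shortcut(761):.2%}")
print(f"n=4,000,000: binomial SE {big_se:.3%}")
print(f"the NHL line is roughly {nhl_se / big_se:.0f} times as noisy")
```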

Thus, the comparison is not valid.  It'd have been more useful to have drawn the NHL line for various historical periods.  If all the lines show a downward slope, then it would be time to examine why this is occurring.

To further fix ideas, look at the following set of lines.  Each line represents an alternative universe in which 761 people were randomly selected to be NHL players from the US and Canadian populations.  While in theory the line connecting monthly percentages should be flat (at 1/12, or about 8.3%, i.e. the green lines below), in reality, because of random selection, the lines fluctuate quite a bit.

[Chart: simulated birth-month lines, 761 randomly selected players per universe]
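Lines like those in the chart can be generated with a small simulation. This is a sketch, assuming every month is equally likely for each of the 761 players (the flat 1/12 line); it is not the code used to draw the chart. It prints a few simulated universes and the average number of months that fall outside 7% to 9% across 1,000 universes, which can be compared with the 4 such months on the NHL line.

```python
import random

def simulate_universe(n_players: int, rng: random.Random) -> list[float]:
    """Give each player a birth month uniformly at random; return the 12 monthly shares."""
    counts = [0] * 12
    for _ in range(n_players):
        counts[rng.randrange(12)] += 1
    return [c / n_players for c in counts]

def count_extreme(shares: list[float], low: float = 0.07, high: float = 0.09) -> int:
    """Number of months whose share falls outside [low, high]."""
    return sum(1 for s in shares if s < low or s > high)

rng = random.Random(2006)  # fixed seed so the run is reproducible
universes = [simulate_universe(761, rng) for _ in range(1000)]

for shares in universes[:5]:
    print(" ".join(f"{s:5.1%}" for s in shares), "| extreme months:", count_extreme(shares))

avg_extreme = sum(count_extreme(s) for s in universes) / len(universes)
print(f"average number of months outside 7%-9%: {avg_extreme:.2f}")
```

Changing the seed gives a different set of wiggly lines each time, which is exactly the point: none of them reflects any real birth-month effect.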

While the amount of dispersion is not "overwhelming", perhaps the observed trend of decreasing percentage with increasing month is unusual enough to warrant further study.  I'll take a closer look next time.
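One possible way to take that closer look, chosen here for illustration and not necessarily the approach of the follow-up post, is a Monte Carlo test on the slope: fit a least-squares line to the twelve monthly shares and ask how often random universes of 761 equally likely birth months produce a slope at least as steeply negative. The function names are hypothetical, and the null model ignores unequal month lengths and real seasonal birth patterns.

```python
import random

def ls_slope(shares: list[float]) -> float:
    """Ordinary least-squares slope of monthly share against month number 1..12."""
    n = len(shares)
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(shares) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, shares))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def null_slopes(n_players: int = 761, universes: int = 2000, seed: int = 0) -> list[float]:
    """Slopes from simulated universes where every birth month is equally likely (assumption)."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(universes):
        counts = [0] * 12
        for _ in range(n_players):
            counts[rng.randrange(12)] += 1
        slopes.append(ls_slope([c / n_players for c in counts]))
    return slopes

def downward_trend_p_value(observed_shares: list[float], **kwargs) -> float:
    """One-sided Monte Carlo p-value: fraction of random universes with a slope at least as negative."""
    observed = ls_slope(observed_shares)
    sims = null_slopes(**kwargs)
    return sum(s <= observed for s in sims) / len(sims)
```

To use it, pass the twelve NHL monthly percentages as fractions in January-to-December order (the values are not reproduced in this post), e.g. `downward_trend_p_value(nhl_shares)` with a hypothetical list `nhl_shares`; a small result would suggest the downward drift is hard to explain by chance alone.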

References: Andrew Gelman's blog, Freakonomics blog, Freakonomics NYT column
