In looking at the details of Chapter 1, I neglected to discuss its theme. The part of Freakonomics that appeals to me concerns how data is harnessed to answer interesting questions. Beneath the stories, Chapter 1 is primarily concerned with the collection of data rather than its analysis. Indeed, what counts as analysis consists of a few sample averages (e.g., how much does the "typical prostitute" earn?) and a few subgroup comparisons (e.g., the relative costs of different sex acts).
***
Turning now to Chapter 2 (the "terrorism" chapter), I find the material much richer for the statistically minded reader and well worth my time.
The chapter has a tripartite structure: the first section deals with a dazzling assortment of statistical factoids, presented in a way that will either infuriate or engage the statistician, as I explain below; the second looks at how ER doctors can be compared even though patients are not randomly assigned to them; and the third describes how one mysterious Briton uses bank data to identify suspected terrorists.
***
As I indicated, Chapter 2 Part 1 (pp. 57-62) will either infuriate or engage you. It examines a wide variety of statistical factoids; here are three representative ones:
- Pregnant Muslim women who fasted during Ramadan had babies who grew up to have a higher incidence of visual, hearing, or learning disabilities. As a result, certain birth cohorts of Muslims have a disproportionately high incidence of such disabilities.
- Coaches of European youth soccer leagues pick the oldest children within each age group, where groups are delimited by a December 31 cutoff birthday. This skews the birth month distribution of players toward January, February and March (as opposed to October, November and December).
- The practice of listing co-authors in alphabetical order on economics journal articles means economists with last names starting with "A" have a greater chance of winning the Nobel prize.
If this were a statistics book, the author would use these examples to illustrate the notion of "spurious correlation". Within the Muslim community, there is a correlation between certain birthdays and a higher incidence of disabilities. However, this correlation is spurious because the date of birth does not cause disabilities; rather, those birthdays are correlated with fasting mothers, and fasting causes some babies to grow up with disabilities.
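To see how a third variable can manufacture such a correlation, here is a minimal Python simulation. All numbers in it (25% of births in "exposed" cohorts, 60% of those mothers fasting, disability risk of 2% rising to 4% with fasting) are hypothetical, chosen only for illustration; they are not taken from the book or the underlying study. In this toy model birth cohort affects disability only through fasting, yet the raw comparison shows a gap that vanishes once we compare only babies whose mothers did not fast.

```python
import random

def simulate_births(n=200_000, seed=1):
    """Toy model (hypothetical numbers): birth cohort -> maternal fasting -> disability.

    Cohort affects disability ONLY through fasting; there is no direct effect.
    Each birth is recorded as (exposed_cohort, fasted, disabled).
    """
    rng = random.Random(seed)
    births = []
    for _ in range(n):
        # Assumption: 25% of births fall in cohorts whose gestation overlapped Ramadan.
        exposed_cohort = rng.random() < 0.25
        # Assumption: 60% of mothers in exposed cohorts fast; nobody else does.
        fasted = exposed_cohort and rng.random() < 0.60
        # Assumption: disability risk is 2% at baseline and 4% if the mother fasted.
        disabled = rng.random() < (0.04 if fasted else 0.02)
        births.append((exposed_cohort, fasted, disabled))
    return births

def disability_rate(births, keep):
    """Share of disabled babies among the births for which keep(birth) is True."""
    subset = [b for b in births if keep(b)]
    return sum(b[2] for b in subset) / len(subset)

births = simulate_births()

# Raw comparison: the exposed cohort looks worse (about 3.2% vs 2% in expectation).
exposed = disability_rate(births, lambda b: b[0])
unexposed = disability_rate(births, lambda b: not b[0])

# Stratified comparison: among mothers who did not fast, cohort makes no difference.
exposed_no_fast = disability_rate(births, lambda b: b[0] and not b[1])

print(f"exposed cohort {exposed:.4f} vs other cohorts {unexposed:.4f}")
print(f"exposed cohort, mother did not fast {exposed_no_fast:.4f}")
```

Once fasting is held fixed, the birth cohort carries no extra information about disability, which is what "spurious" means here.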
L&D take a different approach; they play up the correlations for effect. They say things like "it is no exaggeration to say that a person's entire life can be greatly influenced by the fluke of his or her birth." (p.58) In the case of soccer leagues, they say "birth timing may push a marginal child over the edge." (p.62).
In this case, I would stress that birthday is a useful indicator of a child's likelihood of making the league, but it is not a cause. The birth month distribution is skewed because kids born in January, February or March are older and stronger than those born in October, November or December, and are therefore more likely to earn the coach's favor.
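A small simulation makes this mechanism concrete. It is a toy model with made-up parameters (uniform birth months, a strength boost of 0.1 per month of age advantage, talent drawn from a standard normal, the strongest 10% selected); it is not fitted to any real league data. The coach in this sketch never looks at a birth certificate and selects purely on current strength.

```python
import random
from collections import Counter

def select_players(n=50_000, squad_share=0.10, seed=7):
    """Toy relative-age model with a December 31 cutoff (hypothetical parameters).

    Selection depends only on current strength; birth month enters only
    through how many months older a child is than the youngest in the group.
    Returns a Counter of birth months among the selected children.
    """
    rng = random.Random(seed)
    kids = []
    for _ in range(n):
        month = rng.randint(1, 12)       # 1 = January ... 12 = December
        months_older = 12 - month        # January babies are ~11 months older than December babies
        # Assumption: strength = small age advantage + talent, talent ~ Normal(0, 1).
        strength = 0.1 * months_older + rng.gauss(0, 1)
        kids.append((strength, month))
    kids.sort(reverse=True)              # the coach picks the strongest children
    squad = kids[: int(n * squad_share)]
    return Counter(month for _, month in squad)

counts = select_players()
jan_mar = sum(counts[m] for m in (1, 2, 3))
oct_dec = sum(counts[m] for m in (10, 11, 12))
print(f"selected players born Jan-Mar: {jan_mar}, Oct-Dec: {oct_dec}")
```

The squad ends up heavily skewed toward early-year birthdays even though selection is based only on strength: birth month predicts selection because it is tied to age, and age is what the coach is indirectly choosing on.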
The discussion of economics Nobelists is stranger still. L&D cite the researchers' conclusion that "one of us is currently contemplating dropping the first letter of her surname", adding that the "offending" name was Yariv. Why would any economist want to change his or her name to begin with "A"? The only reason I can think of is the belief that having a last name beginning with "A" raises one's chance of winning a Nobel.
It is clear that L&D know the difference between causation and correlation, so I think this is an attempt to make the material interesting. The presentation forces me to work out what is a cause and what is not, so I find it engaging. Others may find it infuriating.
***
Other thoughts on Part 1:
p.59 -- If the women who survived the Spanish flu pandemic then suffered "terrible luck" "over their whole lives", are L&D saying it would have been better for them to have died from the flu?
p.61 -- I'm not sure how this passage escaped Levitt's attention; it contains an egregious error:
Most youth [baseball] leagues in the U.S. have a July 31 cutoff date. A U.S.-born boy is roughly 50 percent more likely to make the majors if he is born in August instead of July. Unless you are a big, big believer in astrology, it is hard to argue that someone is 50 percent better at hitting a big-league curveball simply because he is a Leo rather than a Cancer.
The likelihood of making the majors is not the same as the ability to hit a big-league curveball! Indeed, in such a competitive field, the difference in batting average between a kid who makes the majors and one who narrowly misses out is likely to be a matter of hundredths or even thousandths. So while the August class may, on average, be 50 percent more likely to make the majors, its batting average is extremely unlikely to be 50 percent higher than that of the July class.
(The last sentence of the quote also shows that they realize date of birth is not a cause. That's why I think the presentation style is deliberate.)
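A quick calculation shows why a large gap in the chance of selection can coexist with a tiny gap in skill. The sketch below assumes, purely for illustration, that skill is normally distributed and measured in standard-deviation (SD) units, that the August class is shifted up by a hypothetical 0.13 SD relative to the July class, and that making the majors requires skill above 3 SD. None of these numbers comes from the book, and skill here is not converted into batting average.

```python
from math import erfc, exp, pi, sqrt

def prob_above(threshold, mean, sd=1.0):
    """P(X > threshold) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * erfc((threshold - mean) / (sd * sqrt(2)))

def mean_above(threshold, mean, sd=1.0):
    """E[X | X > threshold] for X ~ Normal(mean, sd), using the inverse Mills ratio."""
    z = (threshold - mean) / sd
    density = exp(-z * z / 2) / sqrt(2 * pi)   # standard normal density at z
    return mean + sd * density / prob_above(threshold, mean, sd)

# Hypothetical setup: July class skill ~ N(0, 1); August class shifted up by 0.13 SD;
# making the majors requires skill above 3 SD.
cutoff = 3.0
july = prob_above(cutoff, mean=0.0)
august = prob_above(cutoff, mean=0.13)

print(f"chance of making the majors: July {july:.5f}, August {august:.5f}")
print(f"ratio August/July: {august / july:.2f}")
print(f"mean skill of those who make it: July {mean_above(cutoff, 0.0):.3f}, "
      f"August {mean_above(cutoff, 0.13):.3f}")
```

Under these assumptions the August class is roughly 1.5 times as likely to clear the bar, even though the two classes differ by only 0.13 SD on average, and the players who do make it are almost equally skilled whichever month they were born in. Tail probabilities magnify small shifts in the mean, which is why a ratio of selection rates says little about a ratio of abilities.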
p.62 -- Referring back to the baseball example, L&D remark that other factors may be "infinitely more important than timing an August delivery date" in determining a boy's chance of making the majors. Are they thinking of the birthday as a cause or as a correlate? I can't decide. (Trying to time the delivery would amount to believing that being born a Leo rather than a Cancer helps, which seems to contradict the passage on p.61.)
p.61-2 -- On p.61, they talk enthusiastically about Anders Ericsson who argues that stars are made, not born. L&D even wrote an article called "A Star is Made". On p.62, they disclose two almighty factors that are much more important than "birth effects" for being able to play in the majors: being born a male, and having a father who played in MLB. But aren't both those factors born, not made?
p.62 -- They end with this assertion: "So if your son doesn't make the majors, you have no one to blame but yourself; you should have practiced harder when you were a kid." I learn a couple of things from this: (1) their readers are men; (2) training harder causes me to have a higher chance of making the majors, which causes my son to have a higher chance of making the majors.
***
Will write about the rest of Chapter 2 in a future post.
The link may be tenuous, but I think if:
birth month -> age/size
and
age/size -> success in sports
then
birth month -> age/size -> success in sports
and thus,
birth month -> success in sports
can still be considered causal.
The link between birth month, maternal fasting and child disability is perhaps less well defined, but the strong correlation at least makes birth month a good indicator of disability.
Posted by: Jon Peltier | 02/27/2010 at 10:40 AM
Perhaps the lack of flexibility in football somewhat hinders the development and progress of young players, but in soccer, for example, you can see young players, even 15- and 16-year-olds, come up and play for the first team.
It's all a function of ability, maturity and investment, not really of age and birth year.
On the other hand, this is the real risk: taking a chance on the "hot prospect", who may well fail.
Posted by: Messi | 05/22/2010 at 02:25 PM