Many readers will read, or have read, SuperFreakonomics. I'm making my way through the book and keeping a log of my thoughts. Here is how one statistician takes in Chapter 1 (the "sex" chapter).
p.20 -- was surprised to learn that women used to have shorter life expectancy than men. I have always thought women live longer. This factoid is used to show that throughout history, "women have had it rougher than men" but "women have finally overtaken men in life expectancy". I'm immediately intrigued by when this overtaking occurred. L&D do not give a date so I googled "female longevity": the first hit said "it appears that women have out survived men at least since the 1500s, when the first reliable mortality data were kept."; the most recent hit cited CDC data showing that U.S. females have outlived males since 1900, the first year of reporting. In the Notes, L&D cite a 1980 article in the journal Speculum, published by the Medieval Academy. In any case, the cross-over probably occurred prior to any systematic collection of data so I find this minor section less than convincing.
p.20 -- L&D tell us "In China,... females are still far more likely than males to be abandoned after birth, to be illiterate, and to commit suicide." How should one interpret such statistics? My hunch is that among countries with literacy rates similar to China's, it is probably true that females are more likely to be illiterate than males. If so, is the gap in China significantly larger than in other countries? The UN data is easy to find: overall, male, female adult literacy -- China: 91, 95, 87; Singapore: 93, 97, 87; Malaysia: 89, 92, 85; Philippines: 93, 93, 93; Thailand: 93, 95, 91; Mexico: 91, 92, 90; Indonesia: 90, 94, 87, etc. In no way is the inequity in adult literacy in China special. The comment on suicides makes more sense: in most countries, men are more likely to kill themselves, but it's the reverse in China.
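The comparison is easier to see once the male-minus-female gap is computed for each country. A minimal sketch using only the figures quoted above (the variable names are mine):

```python
# Adult literacy rates (percent) as quoted above: (overall, male, female).
literacy = {
    "China": (91, 95, 87),
    "Singapore": (93, 97, 87),
    "Malaysia": (89, 92, 85),
    "Philippines": (93, 93, 93),
    "Thailand": (93, 95, 91),
    "Mexico": (91, 92, 90),
    "Indonesia": (90, 94, 87),
}

# Gap, in percentage points, between male and female literacy.
gaps = {country: male - female for country, (_, male, female) in literacy.items()}

# Print the countries from the largest gap to the smallest.
for country, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country:12s} {gap:2d} points")
```

On these numbers China's gap (8 points) sits below Singapore's (10) and just above Malaysia's and Indonesia's (7 each).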
p.21 -- L&D cite "For American women twenty-five and older who hold at least a bachelor's degree and work full-time, the national median income is about $47,000. Similar men, meanwhile, make more than $66,000, a premium of 40 percent." I'm assuming $66,000 is a median income as well. A ratio of two median incomes is not very useful; it tells us nothing about the distributions of the male and female incomes (which are very skewed). A more useful statistic is the percentile of $47,000 in the male income distribution: in other words, the mid-rank female earns less than X% of male counterparts.
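To show what such a statistic would look like, here is a rough sketch. I don't have the underlying income data, so it rests on a purely hypothetical assumption: male incomes follow a lognormal distribution with median $66,000 and a log-scale spread (`sigma`) of 0.7. That spread is invented for illustration, and the answer moves with it.

```python
import math

def lognormal_cdf(x, median, sigma):
    """P(X <= x) for a lognormal variable with the given median and log-scale sigma."""
    z = (math.log(x) - math.log(median)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

female_median = 47_000   # from the quoted passage
male_median = 66_000     # from the quoted passage
sigma = 0.7              # HYPOTHETICAL spread; not from the book or any data

# The ratio L&D report, and the share of men earning less than the median woman.
premium = male_median / female_median - 1
below = lognormal_cdf(female_median, male_median, sigma)

print(f"premium of male over female median: {premium:.0%}")
print(f"share of men earning more than the median woman: {1 - below:.0%}")
```

With real data one would read the percentile straight off the empirical male income distribution rather than assume a shape for it.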
p.21 -- They are chatting about causes of the male-female wage gap. "Even within high-paying occupations like medicine and law, women tend to choose specialties that pay less (general practitioner, for instance, or in-house counsel). And there is likely still a good amount of discrimination. This may range from the overt -- denying a woman a promotion purely because she is not a man -- to the insidious." I wish they had made the duality of the cause--effect linkage clearer. The first factor claims women select low-paying jobs while the second factor says high-paying jobs (their hiring managers) select men. This is a common hiccup in causal inference research: which direction does the arrow of causality point?
p.22 -- They make the argument that Title IX boosted the appeal of coaching jobs for women's sports teams. To prove this, they say only 6 out of 13 WNBA teams had female head coaches as of 2009. For some reason, they next tell us that ten years earlier, only 3 of 14 WNBA teams had female head coaches. Are they saying the appeal of WNBA coaching jobs has declined over time? I'm confused.
pp.23-4 -- They cite several statistics of the weekly wages of prostitutes in Chicago, in historical dollars as well as in current dollars. First there was a girl who took in $25 a week in old dollars, and $25,000 a year in current dollars. This girl was described as "at the very low end of what Chicago prostitutes earned". So I'm expecting to learn the higher wages others make. The next sentence reads: "a woman working in a 'dollar house' (some brothels charged as little as 50 cents; others charged $5 or $10) took home an average weekly salary of $70, or the modern equivalent of about $76,000 annually." I just couldn't figure out how the words inside the parentheses relate to the rest of the sentence. A "dollar house" doesn't sound like a place where a lot of money is made.
p.23 -- A study estimated that "1 out of every 110 women in that age range [15-44] was a prostitute". This type of statistic is designed to make us think someone in this restaurant (or train, etc.) is a prostitute. But most often, it is misleading. The number is computed by dividing the number of prostitutes by the number of women. It assumes that every woman has the same chance of being a prostitute which is obviously not true. L&D realize this and add: "1 out of every 50 American women [in their twenties] was a prostitute." This doesn't go far enough. Later, on p.32, they inform us that "prostitution is more geographically concentrated than other criminal activity", which means that the chance that a twentysomething is a prostitute is highly dependent on where she lives.
pp.27-8 -- Has a very nice description of why survey research has many limitations, especially when it comes to asking questions about sensitive subjects, like sex, stealing, racism and so on. A cautionary tale for reading polling and market research data.
pp.28-9 -- Pondering how, and why, Venkatesh's method is better. Are former prostitutes more likely to elicit the truth about prostitution than others? If one wants to learn about male chauvinism, would male workers be more likely to get to the truth than female workers? (It's unclear if the former prostitutes were paid; they use the word "hired". The prostitutes being studied were paid.) This highlights the importance of understanding the motivations (and resulting biases) of data collectors. The bias introduced by paying participants is well known in the survey arena but tolerated in order to have an acceptable response rate.
p.29 -- They cite statistics about "the typical prostitute in Chicago." In what ways are the subjects of the study "typical" and in what ways are they not typical? The sample size was 160. They don't say much about the selection process of the subjects, except that they all came from three South Side neighborhoods. Would like to know more about the selection.
p.29 -- "At least 3 of the 160 prostitutes who participated died during the course of the study." Don't use the phrase "at least"! It sounds sloppy, and it is sloppy as "at least 3" includes "everyone". This is a documented study with a small sample; they should know exactly how many died.
p.30 -- After much buildup, we get to their surprise: "Why has the prostitute's wage fallen so far?" I'm looking for the data: what do they mean by "so far"? All we have is the assertion "the women's wage premium pales in comparison to the one enjoyed by even the low-rent prostitutes from a hundred years ago." On the previous page, we learn that modern "street prostitutes" earn $350 per week. On p.24, we learn that in the past, Chicago prostitutes took in $25 a week, "the modern equivalent of more than $25,000 a year". Unfortunately, neither of these two numbers is directly comparable to $350. Dividing $25,000 by 50 weeks (approx.) gives $500 per week. So the drop is $150 off $500, or 30%. But... this is a comparison of wages from prostitution, not of "wage premium". On p.29, the modern study found "prostitution paid about four times more than [non-prostitution] jobs." On p.23, they say "a tempted girl who receives only $6 per week working with her hands sells her body for $25 per week" so we can compute the historical ratio as $25/$6 = 4.17 times. The two premiums are nearly identical, so I must have gotten the wrong data.
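For anyone who wants to check my arithmetic, here it is as a short script. It uses only the figures quoted above; the 50-week year is my own rough approximation.

```python
# Figures quoted in the book, as cited above.
modern_weekly = 350            # modern street prostitute, dollars per week (p.29)
historical_annual = 25_000     # "modern equivalent" of $25 per week (p.24)
weeks_per_year = 50            # rough approximation, my assumption

# Put the historical wage on a weekly basis and compare it with the modern wage.
historical_weekly = historical_annual / weeks_per_year
drop = (historical_weekly - modern_weekly) / historical_weekly

# Wage premium: prostitution pay relative to the alternative job.
historical_premium = 25 / 6    # $25 vs $6 per week in old dollars (p.23)
modern_premium = 4             # "about four times more" (p.29)

print(f"historical weekly wage in modern dollars: ${historical_weekly:.0f}")
print(f"drop in wage: {drop:.0%}")
print(f"historical premium: {historical_premium:.2f}x, modern premium: about {modern_premium}x")
```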
pp.30-31 -- An interesting comparison: only 5 percent of men today lose their virginity to a prostitute, versus 20 percent of those born in the 30s. Just be reminded of their earlier warning about truthfulness in research studies involving sensitive topics.
p.32 -- They assert "prostitution is more geographically concentrated than other criminal activity: nearly half of all Chicago prostitution arrests occur in less than one-third of 1 percent of the city's blocks." I have several problems with this sentence. What is the concentration of other criminal activities? Arrests are not the same as prevalence. And, a few pages later (p. 41), they will make the startling claim that "a Chicago street prostitute is more likely to have sex with a cop than to be arrested by one."
p.33 -- A table of sex acts and their average prices. It's important to establish the sample sizes underlying the average prices. The researcher documented 2,200 sex acts, and the least frequent act accounted for 9% of those, so about 200 acts. To establish the margin of error around those averages, I'd also need the spread of the individual prices.
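To make the point concrete, here is a rough sketch of the margin-of-error calculation. The sample size comes from the figures above; the standard deviations are hypothetical placeholders, since the book does not report the spread of prices. The formula also treats the acts as independent observations, which is an assumption on my part.

```python
import math

total_acts = 2200        # sex acts documented in the study
least_share = 0.09       # share of the least frequent act
n = round(total_acts * least_share)   # 198, i.e. about 200 acts

def margin_of_error(sd, n, z=1.96):
    """Approximate 95% margin of error for a sample mean of n independent observations."""
    return z * sd / math.sqrt(n)

# HYPOTHETICAL standard deviations of price; the book does not report the spread.
for sd in (10, 25, 50):
    moe = margin_of_error(sd, n)
    print(f"sd = ${sd}: average price is good to about +/- ${moe:.2f}")
```

If many acts come from the same few women, the prices are correlated and the true margin would likely be wider than this formula suggests.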
p.40 -- They compare a real estate agent to a pimp. Some data is used to justify the claim that the Internet has reduced the power of real estate agents, while for prostitution the Internet "isn't very good -- not yet, at least -- at matching sellers to buyers". Therefore, the impact of a pimp is larger than that of a real estate agent. Would like to see a study of the Internet substituting for pimps. As it stands, this is an assertion without proof.
p.46 -- Some of the language is overdone. They say the men "blew away" the women in a version of an SAT-style math test with twenty questions. What does "blowing away" mean? Answering 2 more questions correctly out of 20.
pp.47-8 -- They tackle a study on the wage change of men and women who underwent sex change operations. As they point out, this study really doesn't answer the question of what might happen if men were randomly made into women, or vice versa. The problem is that this is not a random selection. The study found men who became women lost a third of their previous wages. This would imply they did not keep their prior jobs. But does this job change show women gravitate to poorer-paying jobs, or that higher-paying jobs select men? The direction of causation crops up again, and we are no closer to the answer.
The rest of the chapter -- They discuss Allie, a high-end prostitute. This section has little interest for a statistician since it is a sample of one.
Please do let me know if this sort of review is useful or not.
PS. Andrew has some thoughts here.