I'm continuing my close reading series with Charles Seife's Proofiness: the dark arts of mathematical deception. The previous series on the first 3 chapters of SuperFreakonomics can be found here, here, here and here. My overall review of Seife's book was posted earlier.
You should think of these pieces as reading guides. I record my thoughts as I make my way through the chapters. If I didn't find them worth reading, I would not have spent the time to read them with a magnifying glass. I am a stickler for precise language even for popular science books -- I understand the need to entertain but, just as I've been saying at Junk Charts for years, entertainment should not get in the way of clarity. That is a tall order, and sometimes, despite valiant efforts, entertaining prose is not as clear as it should be. That's where I come in.
As discussed in my review, the first three chapters of Seife's book are an update to the "damned lies and statistics" genre made famous by Huff's How to Lie with Statistics. Seife is a professor of journalism, and I can only hope journalists would read his book and learn from it. Statisticians will be mostly happy with what Seife says here.
Book jacket and p.4: Seife's definition of "proofiness" is "the art of using bogus mathematical arguments to prove something that you know in your heart is true -- even when it's not." Something about this sentence confuses me a lot. Who is the "you"? Is he referring to the politicians, journalists, scientists, etc. whom he would later skewer for committing proofiness, or is he referring to readers who are interpreting numbers fed to us? Or, is he saying proofiness is a condition that affects us all, including the sleazy politicians and you and me? Is he talking about self-deception or deceiving others? Is the math "bogus" because it's incorrect or because it's fake? I'd like to have a clearer sense of the headline concept of the book but at this stage, I keep my curiosity in check.
p. 7: Seife clears the air with the first sentence of Chapter 1: "If you want to get people to believe something really, really stupid, just stick a number on it." Now I'm not sure if this captures everything he's planning to write about, but for me, this sentence works as a definition of proofiness.
He doesn't make a distinction between journalists who don't fact-check the numbers they cite, scientists who publish false-positive results, and politicians who invent numbers to push their agendas.
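p. 9: "Two plus two is always four. It was always so, long before our species walked the earth, and it will be so long after the end of civilization.": This is just wrong. Arithmetic is a human invention. Besides, the written statement depends on the number system: "2+2=4" holds in base 10 (and in any base of five or higher), but in base 4 the same sum is written 2+2=10, and in base 3 it is 2+2=11, so the statement as written is not universal even today. (Read about systems of numbers here.)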
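To make the point about number systems concrete, here is a minimal Python sketch that writes the sum 2 + 2 in a few different bases. The helper name `to_base` is my own invention for illustration, and the sketch only handles bases 2 through 10.

```python
def to_base(n: int, base: int) -> str:
    """Write a non-negative integer n using the digits of the given base.

    Hypothetical helper for illustration; supports bases 2 through 10 only.
    """
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 through 10")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))  # least significant digit first
        n //= base
    return "".join(reversed(digits))


total = 2 + 2  # the quantity is the same whatever the notation
for base in (10, 5, 4, 3):  # bases in which the digit "2" exists
    print(f"base {base}: 2 + 2 = {to_base(total, base)}")
```

The quantity being added never changes; only the symbols used to write the answer do ("4" in bases 10 and 5, "10" in base 4, "11" in base 3).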
p.10: Seife raises a very important point: all measurements are inexact and carry margins of error. This leads to the following observation:
To mathematicians, numbers represent indisputable truths; to the rest of us, they come from inherently impure, imperfect measurements.
I'm thinking, on which side should statisticians put themselves? We identify as mathematicians but we also regard numbers as uncertain measurements.
p.17: He skewers marketers for making up crowd/listenership/circulation figures. Useful to think about the incentives behind this sort of bogus statistic. People who buy advertising are looking for "reach"; how many people will be exposed to the commercial is pretty much the only metric they have to put a value on the advertising campaign. Both the people who spend the money and the people who take the money have an incentive to claim the campaign has been a success! Also, think about the empty restaurant syndrome: many of us would turn away from a restaurant if we find it empty; there is a sort of self-fulfilling prophecy that crowds engender crowds. I'm not arguing for the morality of this; I hope he will address these behavioral issues eventually in the book.
p.17: In a section talking about estimates of the size of "tea party" crowds, Seife cites ABC, Sean Hannity, and the Washington Post. The Post's number is his pick. This anecdote raises the all-important question of who should be the judge of whether a statistic is bogus or not. (In a different context, which poll is credible?) On the previous page, Seife reports that the park service has taken itself out of the business of issuing "official" crowd sizes at public events. Not surprising at all because such a job is completely thankless: whatever number you provide, you will make enemies!
p.21: Starting to feel like I'm in a remedial class; he's still gnawing at the same point about numbers being approximations... but I soldier on.
pp.21-22: Seife makes an excellent point about precise numbers giving a false sense of certainty. However, I don't like the two examples he uses to explain this. The first example is someone who reports that his car costs $15,000 versus someone who says $15,323; the second example is estimating one's own age as 18 years as opposed to 18 years, 2 months and 3 days. Seife tells us we should believe the people who give us rounded numbers more than those who give us precise numbers. This is a fine point, but unfortunately, in both of these examples, the people who give precise numbers are plausible; they are talking about their own cars, and their own ages. If, on the other hand, they are asked to guess the cost of their neighbor's car, or the age of the neighbor's dog, then Seife's point is well taken.
p.23: His next set of examples I also find wanting. He complains about Kofi Annan and the UN making a big fuss about pinpointing the Earth's 6-billionth inhabitant, and the Chicago Sun-Times declaring the 300-millionth U.S. resident (obviously impossible to know, just as it is impossible to know who is the first new-year's baby). The problem: both are inconsequential PR stunts. There are plenty of examples in which ignoring error bars leads to bad decisions that have consequences.
p.26: Seife likes to name things: proofiness, Potemkin numbers, disestimation, now "fruit-packing". Fruit-packing is any of: cherry-picking, comparing apples and oranges, and apple-polishing. One can hardly complain about reiterating them since these mistakes keep cropping up.
p.33: He gives examples of deception practiced by both the Bush people and the Gore people. Various techniques of deception are involved; delving deeper, I think the common spirit of all such tactics is misdirection: creating a general sense of suspicion by channeling attention to a trivial aspect of the data that has no significance to the conclusion. Because random chance is always part of statistics, any analysis can be trivialized in this manner. Don't fall for it.
pp. 39-54: He's on the evergreen correlation-is-not-causation theme. Seife looks at the correlation (better described as a coincidence) between the advent of Nutrasweet and a rise in brain tumors, and points out that the rise in tumors also coincides with a rise in deficit spending. Good illustration of why arguments based on "changepoints" (timing of a change) are shaky at best.
Readers must be careful with this discussion! Realize that the existence of one spurious correlation does not prove that all correlations are spurious. In other words, while we may agree with the implausibility that deficit spending could have caused brain tumors, that fact is not valid evidence against the Nutrasweet-causes-brain-tumor theory.
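For readers who like to see this in numbers, here is a small Python sketch of how two series that merely rise over the same period end up strongly correlated. Everything in it is invented for illustration: `series_a` and `series_b` are synthetic stand-ins (not tumor or deficit data), and the helper names `pearson` and `diffs` are my own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def diffs(xs):
    """Period-over-period changes of a series."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


# Synthetic data (an assumption, not real measurements): two upward trends
# with unrelated, deterministic wiggles layered on top.
years = range(12)
wiggle_a = [1 if t % 2 == 0 else -1 for t in years]
wiggle_b = [((t * 7) % 5 - 2) * 3 for t in years]
series_a = [10 + 2 * t + wiggle_a[t] for t in years]
series_b = [100 + 5 * t + wiggle_b[t] for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(diffs(series_a), diffs(series_b))
print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

In this toy setup the levels correlate above 0.9 purely because both series trend upward, while the year-to-year changes are close to uncorrelated. That shows how cheaply a timing coincidence can be manufactured; it says nothing either way about whether any particular real-world correlation is causal.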
My readers will note that I disapprove of how most statistics textbooks discuss the correlation/causation issue: you are told what not to do, but left with no idea of what to do.
p.55: Seife describes Nature as the most prestigious science journal in the world but he proceeds in the rest of the chapter with a succession of examples of results, all published in Nature, that were later overturned on statistical grounds, which makes me wonder if its prestige is undeserved.
p.59: He turns to why extrapolation beyond the study population is not recommended. On safe ground here.
pp.79-80: Some comments about Enron and Madoff, no doubt to satiate the publishers. These are complex situations not easily covered in a few pages. Did people really have no clue about the risk of Enron? Or were they willing participants in a Ponzi scheme? Do individuals make their own investment decisions these days when most stocks are primarily owned by institutions (mutual funds, e.g.)? Do investment managers have an incentive to manage clients' risk exposure for the clients' benefit or for their own benefit?
p.81: I like how he explains the mortgage mess, but he strangely ignores the credit rating agencies.
p.86: The tragedy of the commons always makes good reading. I recently came across a variant of this phenomenon: a friend decided not to sell his house, which is "under water" since it was bought near the peak of the boom. He opted to hurt his own economic interest in order to benefit the commons and to avoid the wrath of his neighbors, because anyone who sells creates a "market price" for the entire neighborhood; if no one sells, there is no "mark to market". Here, the commons is the fictional book value of the homes, and the tragedy would be for individual homeowners to exit, which has the effect of reducing the value of the commons. My friend's behavior indicates that we cannot assume everyone acts only in his own self-interest.
The game-theoretic framework is perfect for such problems. I believe that these problems cannot be analyzed without introducing a "morality" dimension, and recognizing that there is individual variability in "morality". It is almost certain, I believe, that someone in my friend's neighborhood would eventually sell, forcing all others to mark to market. The first one to sell is likely to fetch the best price. The uncomfortable truth may be that being moral is idiotic, that good guys finish last.
pp.87-90: He considers moral hazard as a form of "risk mismanagement". I agree it's a crucial topic that is being swept under the carpet by the political class but it's an entirely different thing from incorrectly computing the odds of something, mentioned earlier in the chapter. Seife's decision to not distinguish between deliberate deception and incompetent mistakes hurts his point here: the bankers who created the mortgage mess did not mismanage risk; on the contrary, they understood how the risk could be moved around and ultimately socialized.
In these pages, Seife also parrots the government's version of the bailout story. This creates the awkward juxtaposition of the supposedly unavoidable socializing of private losses, and the clear warning about moral hazard. What is left unexamined is why only two extreme solutions were considered: socialize all losses, and socialize zero losses.