This is the second part of my review of Blink by Malcolm Gladwell. The first part is here.
Blink has an awkward relationship with statistics. In a book on prediction, statistics should be a protagonist, but it isn't. In certain parts where one expects to find statistical thinking, it's not there. Yet one can't say statistics is absent from Blink; it is there, but it hides in the nooks and crannies. Statistics in Blink is like a lone diner in a busy restaurant: present but unseen.
Statistical thinking shows up beautifully in a few spots.
In describing the work of psychologist John Gottman, who has made claims about predicting divorce (in the section starting on page 31), Gladwell masterfully explains the idea of "data reduction", which is central to statistics: much of the data, in this case recorded conversations between couples, contains "noise"; such noise interferes with the project of finding predictors of divorce; and the key to good analysis is the ability to fish out the "signal", leaving the noise behind. This is exactly what a statistical model does. (This is not, however, an endorsement of Gottman's analysis. As Andrew Gelman pointed out in a comment on my first post, statisticians have issues with Gottman's methodology, which I will get to in a moment.)
Then, in Chapter 5, Gladwell narrates numerous case studies of businesses using analytics, including music companies developing song ratings, Pepsi and Coca-Cola running various taste tests, focus groups that gave margarine its yellow color, panels ranking jams for Consumer Reports, Herman Miller designers gathering consumer feedback on new chairs, and so on. Gladwell brands many of these efforts, all of which involve deliberate, data-driven analyses, as failures, but readers should keep an open mind. For me, this collection is a feast of real-world applications of statistics.
In certain sections of the book, one expects a statistician or two to pop up, but no such luck. So in what follows, I will fill in some of the missing statistics. Because Gladwell describes the research in words rather than numbers, answering my questions would require going back to his sources, which I haven't done.
Let's start with Gelman's comment: "Gladwell in Blink is notorious for taking at face value the exaggerated post-hoc predictions of divorce guru John Gottman." The concern here is whether the wisdom lauded by Gladwell is foresight, or hindsight. This is a question that pops up regularly in my head as I read Blink.
I'll illustrate the hindsight problem using the malpractice lawsuit example (pp.40-43). The researchers found two groups of doctors: those who had been sued at least twice in the past, and those who had never been sued. They then recorded conversations between these doctors and their patients. These conversations were analyzed along many dimensions, and some factors emerged as statistically different between the sued doctors and the not-sued doctors. Chief among these differentiators was how much time the doctors spent with their patients.
This sounds like sensible stuff... until you realize that those recorded conversations took place after the doctors got sued. The methodology uses current measurements to explain past events, but what took place during these conversations could not possibly have caused the doctors to be sued in the past. The conclusion requires us to believe that these doctors also, say, spent less time with their patients back then, but that is only a guess, not a direct measurement.
One can imagine several competing scenarios. Perhaps it's a matter of luck who gets sued; then regression to the mean implies that, in the future, the sued group should on average get sued less often than before, and the not-sued group more often. Or perhaps doctors never learn, and those who make mistakes are doomed to repeat them. Or perhaps doctors prefer not to get sued again, and adapt their behavior. Gladwell's interpretation takes the second of these scenarios as true, but why?
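The luck scenario is easy to simulate. In this sketch every doctor has the same hypothetical 5 percent yearly chance of being sued (all numbers are invented), so differences in lawsuit counts are pure luck. Doctors sued at least twice in one ten-year period are, on average, sued no more often than the never-sued doctors in the next period, while the never-sued doctors start picking up lawsuits: regression to the mean at work.

```python
import random
from statistics import mean

rng = random.Random(2024)
n_doctors, years, p_suit = 10_000, 10, 0.05  # all hypothetical

def lawsuits(rng):
    """Number of suits over `years` years when every doctor has the same
    yearly chance p_suit of being sued: pure luck, no skill differences."""
    return sum(rng.random() < p_suit for _ in range(years))

past = [lawsuits(rng) for _ in range(n_doctors)]
future = [lawsuits(rng) for _ in range(n_doctors)]

sued = [i for i in range(n_doctors) if past[i] >= 2]   # "sued at least twice"
never = [i for i in range(n_doctors) if past[i] == 0]  # "never sued"

print("past suits, sued group:  ", round(mean(past[i] for i in sued), 2))
print("future suits, sued group:", round(mean(future[i] for i in sued), 2))
print("future suits, never-sued:", round(mean(future[i] for i in never), 2))
```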
One test of this methodology would be to go out and find two groups of doctors, those who spend more time with their patients and those who spend less. The two groups must be "matched" so that they are comparable; in particular, both groups should contain the same proportion of doctors who were sued in the past. Then wait a few years and compare the rates at which the two groups are served with malpractice lawsuits.
Apart from the hindsight problem, this analysis also suffers from myopia. The researchers decided from day one, without justification as far as I can tell, that the only useful predictors of malpractice lawsuits arise from doctor-patient conversations. Apparently no other factors, such as experience, age, height, dexterity or education, matter.
I can't say that the research is definitely wrong, but I won't take it at face value. Unfortunately, quite a few examples in Blink follow a similar pattern.
Another incarnation of the hindsight problem is best shown by the Tom Hanks example (p.45). Hollywood super-producer Brian Grazer tells us that he knew Hanks would be a megastar in "that first instant", when Hanks came in to read a script for him. But in statistics, you have to invert the situation and ask: how many other actors has Grazer loved at first sight, and what proportion of those have become stars? It is all too easy to pick a superstar and trace his success back to that first instant; the challenge is to find all those first instants and track each one to its natural conclusion (star or bust).
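Here is a sketch of the inverted question, assuming a hypothetical producer with no real talent-spotting skill: "love at first sight" strikes 20 percent of 5,000 auditioning actors at random, and 2 percent of all actors become stars regardless. Every star the producer happened to love still comes with a first-instant story, yet the hit rate among loved actors is no better than the base rate. Only the full table, busts included, can separate skill from luck.

```python
import random

def first_sight_table(n_actors, p_loved, p_star, seed=0):
    """Simulate a producer with NO real skill: whether an actor is 'loved at
    first sight' is independent of whether the actor later becomes a star.
    Returns counts keyed by (loved, star)."""
    rng = random.Random(seed)
    counts = {(loved, star): 0 for loved in (True, False) for star in (True, False)}
    for _ in range(n_actors):
        loved = rng.random() < p_loved
        star = rng.random() < p_star
        counts[(loved, star)] += 1
    return counts

n_actors = 5000  # hypothetical
table = first_sight_table(n_actors, p_loved=0.20, p_star=0.02)
loved_total = table[(True, True)] + table[(True, False)]
hit_rate = table[(True, True)] / loved_total
base_rate = (table[(True, True)] + table[(False, True)]) / n_actors

print("loved at first sight, became stars:", table[(True, True)])
print("loved at first sight, went bust:   ", table[(True, False)])
print(f"hit rate among loved actors: {hit_rate:.3f} vs base rate {base_rate:.3f}")
```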
We can classify Gladwell's examples into two groups. The first group, such as predicting superstars or malpractice lawsuits, is easier to comprehend because the predicted outcome is countable. In the second group, the quantities being predicted are ethereal things that have no objective values.
For example, psychologist Samuel Gosling (p. 36) showed that "strangers with clipboards came out on top... at measuring conscientiousness, ... emotional stability and their openness to new experiences" of other people, when compared with these people's friends. But traits like emotional stability do not have objective values. So, in fact, Gosling is comparing the strangers' and the friends' perceptions of the subject with the subject's self-perception (or the impression the subject wishes to leave on others).
This isn't a criticism of this type of study, just something to bear in mind when reading about such studies.
At various moments, I found myself scratching my head over the practical value of the predictions. For instance, the tennis coach Vic Braden claims to have a supernatural ability to predict when a player will double-fault on the next serve. So what?
There is a long section on face-reading (Chapter 6) in which Gladwell follows some researchers who have developed a system for describing people's facial expressions in words. They narrate Kato Kaelin's testimony during the O.J. Simpson trial: "It's almost totally A.U. nine. It's disgust, with anger there as well, and the clue to that is that when your eyebrows go down, typically your eyes are not as open as they are here. The raised upper eyelid is a component of anger, not disgust. It's very quick. You know, he looks like a snarling dog." (p.211). I must confess I don't get the point of this story, because none of these "predictions" can be checked.
Finally, here are some sentences that don't make much sense to me:
p.40 "Analyses of malpractice lawsuits show that there are highly skilled doctors who get sued a lot and doctors who make lots of mistakes and never get sued.": Presumably, doctors who never get sued do not appear in the malpractice lawsuits being analyzed, so how could such analyses tell us anything about them? This relates to a difficult issue in evaluating statistical predictions: if an action is taken based on a prediction, it is often tricky to measure the counterfactual (what would have happened had the action not been taken).
p.41 "Roughly half of the doctors had never been sued. The other half had been sued at least twice.": So none of the doctors had been sued exactly once? This unusual design should be explained. One would also like to know whether the two groups of doctors are comparable in other respects: if, say, young doctors are more likely to get sued, then the sued group will likely skew younger than the other group.
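The youth-bias concern can be sketched with made-up numbers. The simulation below assumes that younger doctors keep shorter visits and also get sued more often, while visit length itself has no effect on lawsuits. A naive comparison still finds that sued doctors spend less time with patients; comparing doctors within the same age group, in the spirit of the matched design described earlier, makes the gap disappear.

```python
import random
from statistics import mean

rng = random.Random(11)
doctors = []  # (young, visit_minutes, sued); all numbers are hypothetical
for _ in range(20_000):
    young = rng.random() < 0.5
    # Assumption: young doctors keep shorter visits AND get sued more often,
    # but visit length itself plays no role in who gets sued.
    minutes = rng.gauss(15 if young else 20, 3)
    sued = rng.random() < (0.3 if young else 0.1)
    doctors.append((young, minutes, sued))

def gap(rows):
    """Average visit length of never-sued doctors minus that of sued doctors."""
    return (mean(m for _, m, s in rows if not s)
            - mean(m for _, m, s in rows if s))

print(f"naive gap: {gap(doctors):.2f} minutes")
for label, flag in [("young", True), ("older", False)]:
    stratum = [d for d in doctors if d[0] == flag]
    print(f"gap among {label} doctors: {gap(stratum):.2f} minutes")
```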
p.56 "... a study in which they had two groups of students answer forty-two fairly demanding questions from the board game Trivial Pursuit.... the difference between 55.6 and 42.6 percent... is enormous. That can be the difference between passing and failing." It makes sense to explain why the 13-percentage-point gap between the two groups of students is large, but whether the gap straddles the passing threshold (presumably 50 percent) has nothing to do with it. If one group scores 49.5 percent and the other 50.5 percent, the difference is tiny but is still "the difference between passing and failing"! Statisticians typically use standard errors to put such a difference into context.
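As a rough illustration of the standard-error point, the sketch below treats every answer as an independent coin flip and assumes 20 students per group; the book passage quoted here gives no group sizes, so that figure is invented. Ignoring differences in ability between students likely understates the standard error, but the contrast still shows: a 13-point gap spans several standard errors, while a 1-point gap straddling the 50 percent line is well within the noise.

```python
from math import sqrt

def diff_and_se(p1, p2, n_students, n_questions=42):
    """Difference in proportion correct between two equal-sized groups, with a
    crude standard error that treats every answer as an independent coin flip."""
    n = n_students * n_questions  # answers per group
    se = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return p1 - p2, se

for p1, p2 in [(0.556, 0.426), (0.505, 0.495)]:
    d, se = diff_and_se(p1, p2, n_students=20)  # 20 per group is a made-up figure
    print(f"{p1:.3f} vs {p2:.3f}: difference {d:.3f}, SE {se:.3f}, ratio {d / se:.1f}")
```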
p.93 "all other things being absolutely equal, how does skin color or gender affect the price that a salesman in a car dealership [sic] offers?" Alarm bells ring loudly whenever we read the qualifier "all other things being equal" (here emphasized as "absolutely equal", which adds nothing), because in social science you cannot change only one factor while holding everything else fixed. You can't find a white guy who is identical in every respect to a black guy except for his skin color.
pp.92-3 I felt I had misread something in Gladwell's account of Ian Ayres's experiment. Ayres had 38 students participating, including five black men and seven white women. They "were instructed to go to a total of 242 car dealerships [sic]". At each dealer they bargained. The study reached conclusions such as "Even after forty minutes of bargaining, the black men could get the price, on average, down to only $1,551 above invoice." Since there is an average for black men alone, it would seem that each student had to visit all 242 dealers, because you can't send the white guys to some of the dealers and the black guys to others. But it would be a rather lengthy study if each student had to spend up to forty minutes at each of 242 dealers. I also noted that the average price for black men is an average over only five testers.
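A quick back-of-the-envelope calculation shows why the every-student-visits-every-dealer reading seems implausible. It uses only the numbers quoted above and treats forty minutes as an upper bound per visit; the two "readings" are my interpretations, not a description of how the study was actually run.

```python
students = 38
dealers = 242
max_minutes_per_visit = 40  # the upper end quoted in the book

# Reading 1: every student bargains at every one of the 242 dealers.
visits_if_all = students * dealers                               # 9,196 visits
hours_per_student_if_all = dealers * max_minutes_per_visit / 60  # about 161 hours each

# Reading 2: the 242 dealer visits are shared out among the students.
visits_per_student_if_shared = dealers / students                # about 6.4 visits each

print(visits_if_all, round(hours_per_student_if_all, 1), round(visits_per_student_if_shared, 1))
```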