Last week, Andrew Gelman (link) and I were kindred spirits: we each did a "numbersensing" exercise, on two different data analyses. I was reading the MailChimp study on the effect of Google siphoning off "marketing" emails into a separate tab, and an alarm went off in my head when I saw that the aggregate click-to-open ratio was reported at an inconceivable 85%. (See Part 1 of my reaction here.)
In the meantime, Andrew was investigating a tidbit that appeared in a chapter of the "Doing Data Science" book by Rachel Schutt and Cathy O'Neil, in which it was claimed that the slowness in matching certain data records led to "one or two patients per week" dying, which Gelman estimated to imply that up to one-quarter of the deaths were caused by poor record-keeping (!).
What Gelman called his "Spidey-sense" is what I've been calling "numbersense". In the age of Big Data, when everyone has data to make arguments, developing numbersense is really important to disentangle ambiguous or conflicting studies. Hey, I wrote an entire book about this.
***
In the same post, Gelman made a brilliant observation. We have come across this type of data story before. Just open a Gladwell book, or some of the Freakonomics stuff, and you'll find plenty of other examples. Gelman is calling these "parables", thus the rise of the "statistical parable".
The point is that the people who write these stories do not really care if the numbers are accurate or not. Put differently, they are invested in the direction of a relationship but not its magnitude. In the above story, the writer is interested in the fact that poor record-keeping can lead to some unnecessary deaths, but how many "some" amounts to is of no concern. The data is really a side show; the message is the main attraction!
This goes a long way, I think, in explaining the popularity of the genre as well as the aversion of many statisticians to this type of story.
The reason statisticians dislike statistical parables is that these stories are false unless we can verify two conditions: first, that the signal is strong enough; second, that there is not too much noise.
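To make the two conditions concrete, here is a minimal sketch of my own, not something from Gelman's post or the book. It assumes the "story" is an average of n independent observations with normally distributed noise, and asks how often the data would even get the direction of the relationship right. The function name and parameters are hypothetical, chosen only for illustration.

```python
import math


def prob_correct_direction(true_effect: float, noise_sd: float, n: int) -> float:
    """Chance that the average of n noisy observations has the same sign
    as the true effect.

    Assumptions (mine, for illustration): observations are independent,
    noise is normal with standard deviation noise_sd, and true_effect != 0.
    """
    if true_effect == 0:
        raise ValueError("direction is undefined when the true effect is zero")
    if noise_sd <= 0 or n <= 0:
        raise ValueError("noise_sd and n must be positive")
    # Standard error of the average shrinks with the square root of n.
    standard_error = noise_sd / math.sqrt(n)
    # Signal-to-noise ratio of the estimate.
    z = abs(true_effect) / standard_error
    # Standard normal CDF evaluated at z, written with math.erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Weak signal, lots of noise: the direction is close to a coin flip.
    print(prob_correct_direction(true_effect=0.1, noise_sd=1.0, n=1))
    # Strong signal relative to noise: the direction is almost always right.
    print(prob_correct_direction(true_effect=2.0, noise_sd=1.0, n=4))
```

Under these assumptions, a weak signal buried in noise means even the direction of the parable is shaky, never mind the magnitude.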
The problem is that humans love a story. This is pretty fundamental. There is no way around this, so the only possible solution is to try to tell better, more accurate stories, rather than decry stories.
Posted by: Franklin Chen | 01/27/2014 at 10:25 AM
Franklin:
Well, I don't know if this helps, but my post is called "Parables vs. stories" and in it I write, "I’m a statistician, and I like stories more than parables. I like that when I look into a story or a statistical example carefully, I can keep learning. I like the fractality of stories, the way that the deeper we look, the more we can learn." So, yes, I think stories are great and it's important to stress what they have to offer. Hence also this paper with Basboll, "When do stories work? Evidence and illustration in the social sciences."
Posted by: Andrew Gelman | 01/27/2014 at 01:04 PM