Last week, Andrew Gelman (link) and I were kindred spirits: we each did a "numbersensing" exercise on a different data analysis. I was reading the MailChimp study on the effect of Google siphoning off "marketing" emails into a separate tab, and an alarm went off in my head when I saw that the aggregate click-to-open ratio was reported at an inconceivable 85%. (See Part 1 of my reaction here.)
In the meantime, Andrew was investigating a tidbit that appeared in a chapter of the "Doing Data Science" book by Rachel Schutt and Cathy O'Neil. The chapter claimed that slowness in matching certain data records led to "one or two patients per week" dying, which by Gelman's estimate would imply that up to one-quarter of the deaths were caused by poor record-keeping (!).
What Gelman called his "Spidey-sense" is what I've been calling "numbersense". In the age of Big Data, when everyone has data to make arguments, developing numbersense is really important to disentangle ambiguous or conflicting studies. Hey, I wrote an entire book about this.
In the same post, Gelman made a brilliant observation. We have come across this type of data story before. Just open a Gladwell book, or some of the Freakonomics stuff, and you'll find plenty of other examples. Gelman is calling these "parables", thus the rise of the "statistical parable".
The point is that the people who write these stories do not really care whether the numbers are accurate. Put differently, they are invested in the direction of a relationship but not its magnitude. In the above story, the writer is interested in the fact that poor record-keeping can lead to some unnecessary deaths; just how many "some" amounts to is of no concern. The data is really a side show; the message is the main attraction!
This goes a long way, I think, in explaining the popularity of the genre as well as the aversion many statisticians feel toward this type of story.
Statisticians dislike statistical parables because these stories are false unless we can verify two conditions: first, that the signal is strong enough; second, that there is not too much noise.
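To see why both conditions matter even to someone who only cares about direction, here is a minimal simulation sketch. It is my own illustration, not anything from Gelman's post or the studies above: the function name, the effect sizes, the noise level and the sample size are all made-up assumptions. It repeats a simple two-group comparison many times and counts how often the estimated effect even has the right sign.

```python
import random
import statistics


def share_with_correct_sign(true_effect, noise_sd, n_per_group,
                            n_studies=2000, seed=0):
    """Fraction of simulated studies whose estimated effect is positive.

    Hypothetical setup: each study compares the mean of a treated group
    (centered at true_effect, assumed positive) against a control group
    (centered at 0), with Gaussian noise of standard deviation noise_sd.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    correct = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, noise_sd) for _ in range(n_per_group)]
        estimate = statistics.mean(treated) - statistics.mean(control)
        if estimate > 0:  # the estimate points in the true direction
            correct += 1
    return correct / n_studies


# Same noise and sample size; only the size of the signal changes.
strong = share_with_correct_sign(true_effect=1.0, noise_sd=1.0, n_per_group=20)
weak = share_with_correct_sign(true_effect=0.1, noise_sd=1.0, n_per_group=20)

print(f"strong signal: right direction in {strong:.0%} of studies")
print(f"weak signal:   right direction in {weak:.0%} of studies")
```

Under these assumed settings, the strong signal should get the direction right almost every time, while the weak signal should do only somewhat better than a coin flip. When the signal is small relative to the noise, even the direction of the relationship, the one thing the parable-teller cares about, is not safe.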