John Allen Paulos has a new piece at NYTimes.com called "Stories vs. Statistics".
It's thought-provoking, somewhat discursive, and I don't agree with everything there.
He argues that we subconsciously use statistical concepts in our everyday speech:
With regard to informal statistics we’re a bit like Moliere’s character who was shocked to find that he’d been speaking prose his whole life.
I don't buy this. For instance, Paulos thinks:
the idea of sampling is implicit in words like “instance,” “case,” “example,” “cross-section,” “specimen” and “swatch”
But sampling is a complicated concept. I think we can use words like example, instance, case and so on without understanding anything about the science of sampling: how to construct samples, random vs. nonrandom samples, how to generalize from the information contained in samples, bootstrapping, etc. Just ponder this question for a moment: why is random sampling a good idea if the act of introducing randomness into the procedure introduces errors? That's a question my students get to think about in class.
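One way to start chewing on that question is a small simulation. This is only a sketch under made-up assumptions: the population, the sample size of 200, and the "convenience" rule (take whoever comes first in a sorted list) are all hypothetical and are not from Paulos's piece. It compares the error a random sample makes, which bounces around but averages out near zero, with the error of a nonrandom sample, which is the same every time and does not go away.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 skewed values (think incomes), sorted so
# that the "easiest to reach" units at the front are not typical of the whole.
population = sorted(random.lognormvariate(10, 0.5) for _ in range(10_000))
true_mean = statistics.mean(population)


def random_sample_mean(n):
    """Mean of a simple random sample of size n (drawn without replacement)."""
    return statistics.mean(random.sample(population, n))


def convenience_sample_mean(n):
    """Mean of a nonrandom sample: just the first n units in the list."""
    return statistics.mean(population[:n])


# Repeat the random sampling many times to see how its error behaves.
random_means = [random_sample_mean(200) for _ in range(500)]

# Random sampling: each sample is off by some amount, but the misses
# scatter on both sides of the truth.
avg_random_error = statistics.mean(random_means) - true_mean
random_spread = statistics.stdev(random_means)

# Convenience sampling: no randomness, so no scatter, but a built-in bias.
convenience_error = convenience_sample_mean(200) - true_mean

print("average error of random samples:", round(avg_random_error, 1))
print("spread of random sample means:  ", round(random_spread, 1))
print("error of convenience sample:    ", round(convenience_error, 1))
```

The randomness does introduce error, but it is error of a kind we can measure and shrink by taking a bigger sample. The nonrandom sample has no such scatter, yet its error stays put no matter how many times we repeat it.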
I recommend Stephen Stigler's books on the history of statistics, in which he convincingly argues that the most important statistical concepts took a long time to develop, after many false starts. Warning: some of Stigler is hard to read without a mathematical background.
I love the part where Paulos points out the inevitable tradeoff between false positive and false negative errors. That's one of the key points in Chapter 4 of Numbers Rule Your World. Beware of those people who sell you technology that minimizes just one of the two types of errors.
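To see why the tradeoff is inevitable, here is a minimal sketch with invented numbers. It assumes a hypothetical detector that gives every case a score and flags anything at or above a threshold; the score distributions and the counts (1,000 true negatives, 100 true positives) are assumptions for illustration, not data from the book or the article.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical scores: true negatives center on 0, true positives on 2.
# The two groups overlap, so no threshold separates them perfectly.
negatives = [random.gauss(0.0, 1.0) for _ in range(1000)]
positives = [random.gauss(2.0, 1.0) for _ in range(100)]


def error_counts(threshold):
    """Return (false positives, false negatives) when flagging score >= threshold."""
    false_pos = sum(score >= threshold for score in negatives)
    false_neg = sum(score < threshold for score in positives)
    return false_pos, false_neg


thresholds = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
results = [error_counts(t) for t in thresholds]

for t, (fp, fn) in zip(thresholds, results):
    print(f"threshold {t:.1f}: false positives = {fp}, false negatives = {fn}")
```

Raising the threshold can only remove false positives and can only add false negatives, and lowering it does the reverse. A vendor who quotes only one of the two numbers has simply picked a threshold and is not telling you what it cost on the other side.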
Then, Paulos walks through several important examples of non-intuitive statistical thinking, first studied by Kahneman and Tversky. This part is fun, and makes you realize yet again how brilliant and important their contributions were.
This is a great sentence:
In listening to stories we tend to suspend disbelief in order to be entertained, whereas in evaluating statistics we generally have an opposite inclination to suspend belief in order not to be beguiled.
He likens statisticians to people who can't stand Type I errors (false positives, or seeing things that are not there), and storytellers to people who can't stand Type II errors (false negatives, or not seeing things that are there). In his view, there are two types of people.
Another great sentence:
The focus of stories is on individual people rather than averages, on motives rather than movements, on point of view rather than the view from nowhere, context rather than raw data.
I bolded the part that resonates with me. Much of the abuse of statistics stems from the fallacy that everyone is like the average. Statistics can predict or explain the average very well, but it cannot do the same for individuals. I'll leave it at that for now, and maybe I'll come back to this in the future.
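For the curious, the gap between averages and individuals is easy to see in a toy simulation. This is a sketch with assumed numbers only: the outcome distribution (mean 50, standard deviation 15) and the group size of 400 are hypothetical.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable


def draw_group(n):
    """Hypothetical outcomes for n individuals: mean 50, standard deviation 15."""
    return [random.gauss(50, 15) for _ in range(n)]


# 200 separate groups of 400 individuals each.
groups = [draw_group(400) for _ in range(200)]
group_means = [statistics.mean(g) for g in groups]

# How much do group averages differ from one group to the next?
spread_of_means = statistics.stdev(group_means)

# How much do individuals differ from one another within a single group?
spread_of_individuals = statistics.stdev(groups[0])

print("spread of group averages:", round(spread_of_means, 2))
print("spread of individuals:   ", round(spread_of_individuals, 2))
```

The group averages barely move from one group to the next, so they are easy to predict. Any one individual can land far from that average, which is why a good prediction of the average says little about a particular person.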