Clive Thompson tells Wired readers that we should all speak the language of data. (The online version is here. The article also appears in the May issue.) He argues that statistical illiteracy is the nation's political problem. "If you don't understand statistics, you don't know what's going on -- and you can't tell when you're being lied to."
In other words, everyone should think like a statistician. My pitch to the Baltimore Sun reporter is here.
***
As usually happens with these interviews, a great deal was said, and much of it vanished during the editing process.
The example I provided illustrates "survivorship bias". The most important metric for the retail sector is same-store sales growth, the average change in sales experienced by individual stores. Less well known is the fact that only stores that have been open for at least one year are eligible for the sample. This is sensible, since new stores may experience "growing pains" that would unfairly drag the metric down.
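A minimal Python sketch of the metric as described above, assuming it is the simple average of each eligible store's percentage change versus the same period a year earlier, and assuming "at least one year" means 12 or more months. The `Store` record, its field names, and the figures are hypothetical; actual retailers' definitions differ in the details.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Store:
    # Hypothetical record; field names are illustrative, not any retailer's schema.
    name: str
    months_open: int      # age of the store at the end of the reporting period
    sales_prior: float    # sales in the same period a year earlier
    sales_current: float  # sales in the current period

def same_store_growth(stores, min_months=12):
    """Average per-store percentage change, counting only stores open at least
    `min_months` months (assumed here to be 12, i.e. one year)."""
    eligible = [s for s in stores if s.months_open >= min_months]
    changes = [(s.sales_current - s.sales_prior) / s.sales_prior for s in eligible]
    return mean(changes)

stores = [
    Store("A", 36, 100.0, 104.0),  # +4%
    Store("B", 24, 200.0, 202.0),  # +1%
    Store("C", 5, 0.0, 50.0),      # new store with no prior-year sales: excluded
]
print(f"{same_store_growth(stores):.1%}")  # 2.5%
```

Note that the age rule also keeps brand-new stores, which have no prior-year sales to compare against, from breaking the calculation.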
Even less well known is the fact that stores closed during the reporting period are excluded. This is sensible in a stable economy: removing transient activity can be justified if one wants to measure the average trend. In a recessionary environment, however, when closures are common, such an exclusion biases the sample, and it can magically create sales growth when sales have merely migrated from one store to another.
Imagine there are four Starbucks in your neighborhood (quite likely in NYC). Starbucks decides to shut down one of the (low-performing) stores. Because this store is not reported as open at the end of the month, it is deemed ineligible for the sample. Meanwhile, all the customers of this store take their business to the other three Starbucks in the neighborhood, and those three stores see a jump in sales. That jump counts as same-store sales growth, when in fact the sales merely migrated from the closed store.
It's a form of "survivorship bias" because the sample used to estimate same-store sales selectively includes only "survivors". If the closed store were also included in the sample, its drop in sales (to zero) would cancel out the jump in sales at the other three stores, and the metric would reflect the economic reality.
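The Starbucks arithmetic can be sketched in a few lines of Python. The figures are hypothetical: each of the four stores is assumed to have had 100 units of sales a year earlier, and the closed store's 100 units are assumed to migrate to the other three in a 50 / 25 / 25 split, so total neighborhood sales are flat. The metric is taken to be the average of per-store percentage changes, as defined earlier in the post.

```python
from statistics import mean

def pct_change(prior, current):
    """Fractional change in sales for one store (0.25 means +25%)."""
    return (current - prior) / prior

# Hypothetical figures: four neighborhood stores, 100 units each last year.
prior = {"store1": 100.0, "store2": 100.0, "store3": 100.0, "store4": 100.0}

# store4 shuts down; its 100 units migrate to the others (assumed 50/25/25 split).
current = {"store1": 150.0, "store2": 125.0, "store3": 125.0, "store4": 0.0}

def same_store_growth(stores):
    """Average per-store percentage change over the given sample of stores."""
    return mean(pct_change(prior[s], current[s]) for s in stores)

# Survivors only: the closed store is not open at period end, so it is dropped.
survivors = [s for s in prior if current[s] > 0]
print(f"survivors only:         {same_store_growth(survivors):+.1%}")  # +33.3%

# Full sample: the closed store's drop to zero enters as a -100% change.
print(f"including closed store: {same_store_growth(prior):+.1%}")  # +0.0%
```

With equal-sized stores, the -100% at the closed store exactly offsets the gains elsewhere under this averaging. If store sizes differed, the offset would only be exact for an aggregate measure such as the total sales of the sample.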
So yes, statistics is tricky, but it can be learned.
Thanks, Clive, for devoting a column to making this important point.


Great explanation of survivorship bias; I'm quoting you in my intro stats class next semester!
Your post is much more informative than the Wired article; Thompson condensed his material just a bit too much. Better to tell one or two stories effectively than several poorly.
Posted by: Mike Anderson | 04/28/2010 at 06:39 AM
Mike: Thanks. I think the book is a great resource for an intro stats class: it gives a sense of what statistical practice is like, something I wish I had been exposed to in college.
I thought Thompson did a good job: enough to intrigue readers, but not so much as to scare them away. It's hard to strike a balance.
Posted by: Kaiser | 04/29/2010 at 01:47 AM