The current issue of Significance includes an article by me on the "pending marriage between statistics and Big Data". If you are a member of either the American Statistical Association or the Royal Statistical Society, you should be able to access the article via this link. If you don't belong to one of those, and you have a smartphone or tablet, you can download the Significance app -- because 2013 is the International Year of Statistics, this app is free for all right now.
If you don't belong to the ASA or RSS, and do not have a smartphone or tablet, I have printed an excerpt of my article below. The excerpt looks at online experimentation (aka A/B testing), pointing out where current practice runs into trouble, and how statisticians can play a role. In the article, I identify several other areas in which statisticians have the potential to help move the Big Data field forward. It won't be easy because, fundamentally, the way computer scientists approach data is at odds with the way statisticians approach data.
David Walker presents the other side of the debate in the same issue.
***
More effective experiments
In 2012, Wired magazine eulogised the “A/B test”, declaring it to be “the technology that’s changing the rules of business”. The A/B test is known to every introductory statistics student as the t-test of two means. Yes, the t-test traces back to Gosset, who developed it for the Guinness brewery in the 1900s. In the contemporary setting, a website delivers one of two pages at random to visitors, and measures whether one page performs better than the other, typically in terms of clickthroughs.
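To make the mechanics concrete, here is a minimal sketch of an A/B test run as a two-sample t-test in Python; the clickthrough rates and sample sizes are made up for illustration.

# A minimal sketch of an A/B test as a two-sample t-test.
# The clickthrough rates and sample sizes below are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
page_a = rng.binomial(1, 0.050, size=10_000)   # 1 = click, 0 = no click
page_b = rng.binomial(1, 0.053, size=10_000)

t_stat, p_value = stats.ttest_ind(page_a, page_b)
print(f"Page A: {page_a.mean():.2%}   Page B: {page_b.mean():.2%}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")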
Brian Christian, the author of the Wired article, asked: “Could the scientific rigor of Google’s A/B ethos start making waves outside the web? Is it possible to A/B the offline world?” Any statistician will answer that many industries implemented randomised, controlled experiments long before the web existed, and at a higher level of sophistication than at most web companies. For example, direct marketers routinely run statistical tests to optimise their marketing vehicles such as catalogues and direct mail.
One of Christian's talking points holds: the web is indeed a nice laboratory in which tests can be executed at scale, and relatively painlessly (though see the section on randomisation below) [Ed: not excerpted]. And yet, in the A/B testing universe, few people are aware of the huge literature on statistical testing, or of [Sir Ronald] Fisher’s monumental contributions. This field is ripe for collaboration between computer scientists and statisticians. A quick flip through the Wired article reveals numerous fallacies about t-tests: fallacies of certainty, of automation, and of false positives among them.
The fallacy of certainty. Again and again, Christian stresses the certainty of test results, using words such as “incontrovertible”. Data from tests end all subjective arguments, we are told. How is it possible to have such definitive results when, as these web businesses claim, they run thousands of tests per year? One expects that most tweaks, such as changing the width of a border on a web page, have inconclusive results. It turns out that most practitioners of A/B tests use point estimates. If the test fails to achieve significance, the variation with the best performance is declared the “directional” winner. Sometimes, a test is run for such a length of time that tiny effects display significance by virtue of sample size.
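To see how sample size can manufacture “significance” for a trivial effect, here is a sketch in the same vein as the one above; the 0.05-percentage-point lift and the four million visitors per arm are hypothetical numbers chosen for illustration.

# A sketch of the certainty fallacy: with a large enough sample, a tiny
# lift in clickthrough rate registers as "significant", while the
# confidence interval shows how small the effect really is.
# All rates and sample sizes are hypothetical.
import numpy as np
from scipy import stats

n = 4_000_000
rate_a, rate_b = 0.0500, 0.0505    # a 0.05 percentage-point (1% relative) lift

rng = np.random.default_rng(seed=2)
a = rng.binomial(1, rate_a, size=n)
b = rng.binomial(1, rate_b, size=n)

t_stat, p_value = stats.ttest_ind(a, b)
diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
print(f"p = {p_value:.4f}")
print(f"difference = {diff:.5f}, 95% CI = ({diff - 1.96*se:.5f}, {diff + 1.96*se:.5f})")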
The fallacy of automation. In Christian’s world, the summit of A/B testing is “automating the whole process of adjudicating the test, so that the software, when it finds statistical significance, simply diverts all traffic to the better-performing option – no human oversight necessary”. Twinned with this is the fallacy of real time. One of the deepest insights in statistics is the law of large numbers, which requires a sufficient sample size in order to detect a signal to a given precision. Real-time decisions imply undersized samples, and huge error bars. Furthermore, such decisions are biased, as Microsoft scientists explained in an important paper [PDF] on the “novelty effect” and the “primacy effect”, among other things. False positive results abound in small samples, turning statistical testing into witchcraft.
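A back-of-the-envelope sketch of the sample-size point, assuming a hypothetical 5% baseline clickthrough rate in both arms: the margin of error around the difference between two pages shrinks only with the square root of the sample size, so small, real-time samples carry error bars larger than the effects being chased.

# Margin of error for the difference between two clickthrough rates,
# assuming a hypothetical 5% baseline rate in both arms.
import math

baseline = 0.05
for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    se = math.sqrt(2 * baseline * (1 - baseline) / n)   # SE of the A-B difference
    print(f"n = {n:>9,} per arm -> 95% margin of error ≈ ±{1.96 * se:.2%}")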
The plague of multiple comparisons. In the new world of “choose everything”, that is to say, “see what sticks”, Wired reports that “the percentage of users getting some kind of tweak may well approach 100 percent”. Statisticians worry about false positive findings when so many tests are run at the same time. Given the complexity of correcting for multiple comparisons, it is not surprising that the software tools available to conduct A/B tests completely ignore this issue.
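A simulation sketch of the problem, with made-up sample sizes and rates: run many “A/A” tests in which the two arms are identical, and roughly 5% of them still come up “significant” at the conventional threshold.

# A sketch of the multiple-comparisons problem: 1,000 A/A tests in which
# the two arms are identical, so any "significant" result is a false positive.
# Sample sizes and the 5% clickthrough rate are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
n_tests, n_per_arm, rate = 1_000, 10_000, 0.05

false_positives = 0
for _ in range(n_tests):
    a = rng.binomial(1, rate, size=n_per_arm)
    b = rng.binomial(1, rate, size=n_per_arm)   # identical page, no true effect
    _, p = stats.ttest_ind(a, b)
    false_positives += p < 0.05

print(f"{false_positives} of {n_tests} identical-page tests came up 'significant'")
# Expect roughly 50. A Bonferroni correction would demand p < 0.05 / n_tests per test.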
We should be excited that randomised, controlled tests have been embraced by the web community. Regrettably, only a few practitioners, such as Ron Kohavi’s team at Microsoft, and Randall Lewis and Justin Rao (in their work [PDF] at Yahoo!), have reflected on the practical challenges of this enterprise. Statisticians are well equipped to make important contributions to how experiments are designed, executed and analysed.
***
Cathy O'Neil at MathBabe has some related thoughts. In one post, she makes a distinction between business analytics (statistics) and Big Data. In another post, it's not clear that she thinks Big Data, as defined in the first post, makes sense.
I've only skimmed the two articles so far, but I find it amazing that this is posed as some sort of debate. I suppose it is one, but if statisticians abandon big data to the computer scientists, the field is headed toward academic irrelevance.
Statistics is an applied field after all. An applied field that shrinks from the difficult questions involved in being applied stops being relevant to that area of application.
The analogy I'm using lately is that a statistic (like Stan Musial's lifetime batting average) is an object, like a bridge. A user of that statistic needs to know a few things in order to use it (like, don't drive off the side of the bridge). To design a bridge, you need a civil engineer (another type of applied mathematician) who can make the decisions about proper construction.
Even in the case of a simple statistic like a batting average, there are questions of proper construction: do we include walks? What if you reach on an error? Should doubles count more than singles? Proper construction of measurement is what applied statistics is all about -- not surprisingly, because a statistic is a measure.
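A toy sketch of those construction choices, using a made-up batting line: each answer to the questions above produces a different statistic from the same season.

# Three constructions of "how good a hitter is", from one hypothetical season.
hits, walks, at_bats = 180, 70, 550
singles, doubles, triples, home_runs = 120, 35, 5, 20
plate_appearances = at_bats + walks            # simplified; ignores HBP, sacrifices

batting_avg = hits / at_bats                                  # walks excluded
on_base_pct = (hits + walks) / plate_appearances              # walks included
slugging = (singles + 2*doubles + 3*triples + 4*home_runs) / at_bats  # doubles count more

print(f"AVG {batting_avg:.3f}   OBP {on_base_pct:.3f}   SLG {slugging:.3f}")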
Bridge technology has changed substantially over the past 300 years: there are new materials to make the bridges (no more iron trusses). There are new types of loads to be carried (the introduction of trains, the demands to carry heavy trucks, etc.).
While any individual engineer can (and should) decide that they are only going to deal with certain types of bridges, the field as a whole has had to broaden to deal with these challenges.
Similarly, individual statisticians can deal with big data or not, but the field as a whole shrinks its domain of relevance if the challenges of dealing with the new materials of big data are ignored. We will leave it to the computer scientists, the operations research practitioners, and the engineers.
Posted by: zbicyclist | 09/03/2013 at 10:27 AM
I've now read David Walker's piece (the other side of the argument) twice. His objection to statisticians becoming involved in big data is that they will be used to help companies: "Big data is principally about taking more money off customers by (let us put it pejoratively) more effective snooping on their habits."
That's a narrow definition of what marketing is about, but certainly marketing is at least partly about that.
Having spent my entire professional career in statistical applications in marketing research, I can tell Walker that this ship sailed long ago. Marketing research is by no means the valedictorian of the statistical school, but I'm not quite sure why Walker seems to be contending that we should be expelled, or at least wants to pretend we don't exist.
Walker does have some good points. I'm fully on board with his McKinsey bashing, and he does make the vital point that "big data" and "open data" aren't the same thing at all.
Posted by: zbicyclist | 09/03/2013 at 11:48 AM
I have not read the original article, so my comment is based on the excerpt.
Re automation: Reacting to every "significant" result as if it were a valid signal will lead to "tampering", as Deming would have put it. Given the number of tests which are or will be run, there will be many, many instances when a low-probability event is judged "significant" and action is taken. Often, the proper action is to take no action.
Posted by: Mathman54 | 09/04/2013 at 02:41 PM
Can you please explain the calculation that produced "10% progress that won the million-dollar prize is roughly worth one-tenth of one star on the five-star rating scale"?
Posted by: Dimitriy | 09/06/2013 at 08:26 PM
Say some more about why the marriage is pending.
Is it a change of definitions? Some of this is pretty old.
I was at Bell Labs in the 1970s and early 1980s, and Murray Hill Bldg 5 had statisticians who spent their time analyzing telephone records whose volumes certainly fit Big Data for the time, even if it wasn't called that. The Bell System tracked every trouble report, down to things like squirrel bites and gunshots, and later we built expert systems for rummaging through the data and looking for patterns.
In the 1990s, Silicon Graphics was selling supercomputers to telcos and others, both for marketing analytics and fraud detection, both of which needed much statistical analysis.
For some history, see:
http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/?_r=0
http://www.slideshare.net/amhey/big-data-yesterday-today-and-tomorrow-by-john-mashey-techviser?utm_source=slideshow03&utm_medium=ssemail&utm_campaign=share_slideshow
See especially slides 22-24.
Posted by: JohnMashey | 09/20/2013 at 03:44 PM