The American Association for Public Opinion Research (AAPOR) put out its Big Data report last month (link). This one is worth reading. It has some of the most current citations, and readers of this blog will be receptive to its core messages. The team that wrote the report is a mix of academics and practitioners.
To the people who talk about Big Data, many truths are self-evident. One of these is the idea that Big Data will make surveys obsolete. How could it not, when Big Data offers hundreds if not thousands of times more "respondents", the ability to track trends in "real time", and the ability to evolve your questions on the fly? This AAPOR report, I suppose, is a response to such claims.
Then there are those who say surveys were merely a suboptimal stopgap in the old days of "small data". They explain that surveys measure "stated preferences" while Big Data (i.e. found data, observational data) measures "revealed preferences", and that aiming for the latter is self-evidently better. Revealed preferences are closer to "the truth"; surveys merely approximate it; with Big Data, we no longer need to run surveys.
This topic ties in nicely with Chapter 1 of Numbers Rule Your World (link). In that chapter, I explain Disney's success in keeping customers happy despite making them wait two hours for rides that last two minutes. The "imagineers" realized that managing perception is even more important than optimizing reality. Customer happiness improves even though measured waiting times (i.e. revealed information) worsen or stay the same.
If Disney relied solely on revealed metrics, the data would say customers are waiting as long as, or longer than, before. But when Disney surveys customers about how they feel, they report shorter waits and greater happiness. Feelings are crucial data that no observation reveals. To the extent that feelings are "revealed", it requires an assumption on the part of the observer: since the measured waiting time is reduced, the customers must be feeling happier.
On a website, the web log contains measured data revealing the paths users take through the site. One can observe where traffic drops off. But that is not enough: to reduce attrition, the designer needs to understand why users exit. Surveys provide the answer here.
What if you learn that users exit by clicking on the home page icon? So you test a version of the design in which the home page icon is not clickable, and you observe that the exit rate falls significantly in the new version. The problem is that users now find a different way to exit. Relying only on revealed preferences frequently leads to superficial actions that cure symptoms but not root causes.
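To make the web-log point concrete, here is a minimal sketch of the kind of analysis described above: computing per-page exit rates from session logs and comparing two design variants. The session data, page names, and the `exit_rates` helper are all invented for illustration; real logs would need parsing and sessionization first.

```python
# Hypothetical sketch: per-page exit rates from session logs.
# A session is an ordered list of page names; the last page is the exit point.
from collections import Counter

def exit_rates(sessions):
    """Fraction of visits to each page that were the last page of a session."""
    exits = Counter(s[-1] for s in sessions if s)
    visits = Counter(page for s in sessions for page in s)
    return {page: exits[page] / visits[page] for page in visits}

# Variant A: the home page icon is clickable; some users exit through it.
variant_a = [
    ["landing", "product", "home"],
    ["landing", "cart", "checkout"],
    ["landing", "home"],
]
# Variant B: the icon is disabled; the same users leave by another route.
variant_b = [
    ["landing", "product", "search"],
    ["landing", "cart", "checkout"],
    ["landing", "search"],
]

print(exit_rates(variant_a)["home"])        # → 1.0 (every "home" visit is an exit)
print("home" in exit_rates(variant_b))      # → False (exits merely shifted to "search")
```

The log can tell you that exits via "home" vanished in variant B, but it cannot tell you that those users still wanted to leave; they just left another way. That "why" is what the survey supplies.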
Revealed preferences and stated preferences are two different dimensions of measurement, and each has strengths and weaknesses. Logs are bigger and faster, but researchers have no control over the composition of the responders. Neither is a substitute for the other. I am interested in seeing work on integrating the two approaches. The AAPOR report has a good discussion of this subject, plus a few references to new work on integration.