A revised version of my previous post has been picked up by Harvard Business Review (link). This post introduces the OCCAM framework for Big Data that I have been speaking about at my book talks (upcoming events are listed in the right column of the blog). The OCCAM framework identifies five elements of today's data sets that present challenges compared to traditional data sets. These challenges are not new, but they are amplified by the nature of Big Data. The Big Data industry has paid little attention to these challenges, but as we hear about more Big Data failures, we will surely hear about these elements.
Here is the link to the registration page.
I'm excited that Nate Silver has finally relaunched fivethirtyeight.com. (He announced his move from NYTimes to ESPN last year.) The site has a clean look and is easy to navigate. He has pieces on the NCAA bracket, an early discussion of the Senate race, the time-wasting statisticians' debate over data is/are, this nice piece about reading economic data, and some others.
Apparently a bunch of economists have already pronounced the site dead (Krugman's "Tarnished Silver" is an example of these pieces). I find these reactions premature and immature. The sample size of articles right now is very small - and they need time to find their audience.
The challenge for Nate is to figure out a balance between long articles based on thorough statistical analyses, and shorter pieces that get to the point quickly but are still fundamentally driven by quantitative thinking. For example, I thought the piece about economic data was useful for the average reader. That a rise in the median tenure on the job does not necessarily mean workers are keeping their employment, but can be a result of workers with less experience being selectively laid off, is not something an average reader thinks about on an average day. Tyler Cowen (Marginal Revolution) snarked: "What it says is fine, but it won't interest me." But how many of Nate's readers are professors of economics?
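To make the median tenure point concrete, here is a small sketch in Python. The tenure figures are made up for illustration, and the layoff rule (everyone with under three years on the job is let go) is my own simplifying assumption, not something taken from the 538 piece.

```python
from statistics import median

# Hypothetical tenures (in years) for a ten-person workforce; made-up numbers.
tenures_before = [1, 1, 2, 3, 4, 6, 8, 10, 12, 15]

# Assumed layoff rule: only workers with under three years on the job are cut.
tenures_after = [t for t in tenures_before if t >= 3]

median_before = median(tenures_before)
median_after = median(tenures_after)

print("Median tenure before layoffs:", median_before)
print("Median tenure after layoffs:", median_after)
```

In this toy example the median tenure rises from 5 to 8 years even though no one has held a job any longer than before. The composition of who remains employed has changed, and that alone moves the statistic.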
Writing data-driven stories is a challenge that I am familiar with. In fact, some critics think I have failed in this regard. The editor of Significance, in a generally favorable review, praises the front part of Numbersense, which is extremely restrained in the required mathematical background, but considers the last two chapters of the book a "sudden death ending". Those parts, on the other hand, have been received favorably by data analysts. It's hard to satisfy both constituencies.
I sense that Nate and I share a common mission, which is to show the general reader how to interpret data and data analyses. This means that what we show must be replicable by the reader. By this criterion, the analytical work would not be publishable in an academic journal. It might be a partial analysis, for example, based on a sample that is feasible for readers to collect, or a simplification of a model that is too complex to describe in 1000 words. We also assume that the general reader is not inclined to plod through academic journals looking for the supreme proof of something.
The big change to 538 is that it is now a team production. It was easy to maintain consistent quality and a niche when Nate was the only contributor. Other writers have their own styles and inclinations. It remains to be seen whether this effort will develop an identity.
I fail to understand the economists' complaints that the 538 pieces on economics are not informed by expert opinion. These same economists profess to be fans of Nate's work in election forecasting, which is a field that also has a history of academic work. It sounds like "not in my backyard".