A revised version of my previous post has been picked up by Harvard Business Review (link). This post introduces the OCCAM framework for Big Data that I have been speaking about at my book talks (upcoming events are listed in the right column of the blog). The OCCAM framework identifies five elements of today's data sets that present challenges compared to traditional data sets. These challenges are not new, but they are amplified by the nature of Big Data. The Big Data industry has paid little attention to these challenges, but as we hear more about Big Data failures, we will surely hear about these elements.
***
On Wednesday, I'm participating in a free Webinar organized by Agilone on data-driven marketing.
Here is the link to the registration page.
***
I'm excited that Nate Silver has finally relaunched fivethirtyeight.com. (He announced his move from NYTimes to ESPN last year.) The site has a clean look and is easy to navigate. He has pieces on the NCAA bracket, an early discussion of the Senate race, the time-wasting statisticians' debate over data is/are, this nice piece about reading economic data, and some others.
Apparently a bunch of economists have already pronounced the site dead (Krugman's "Tarnished Silver" is an example of these pieces). I find these reactions premature and immature. The sample size of articles right now is very small, and the writers need time to find their audience.
The challenge for Nate is to strike a balance between long articles based on thorough statistical analyses and shorter pieces that get to the point quickly but are still fundamentally driven by quantitative thinking. For example, I thought the piece about economic data was useful for the average reader. That a rise in median job tenure does not necessarily mean workers are holding on to their jobs, and can instead result from less experienced workers being selectively laid off, is not something an average reader thinks about on an average day. Tyler Cowen (Marginal Revolution) snarked: "What it says is fine, but it won't interest me." But how many of Nate's readers are professors of economics?
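To make the tenure point concrete, here is a minimal sketch in Python. The ten tenures and the function name `lay_off_least_experienced` are assumptions made up for illustration; they are not data or methods from the 538 piece. The sketch lays off the four least experienced workers and compares the median tenure before and after.

```python
from statistics import median

# Hypothetical tenures (in years) for a ten-person workforce.
# These numbers are invented for illustration only.
tenures_before = [1, 1, 2, 2, 3, 5, 8, 10, 12, 15]


def lay_off_least_experienced(tenures, n_laid_off):
    """Return the tenures of the workers who remain after the
    n_laid_off shortest-tenured workers are let go."""
    return sorted(tenures)[n_laid_off:]


# Assume the four least experienced workers are selectively laid off.
tenures_after = lay_off_least_experienced(tenures_before, 4)

median_before = median(tenures_before)
median_after = median(tenures_after)

print("median tenure before layoffs:", median_before)
print("median tenure after layoffs: ", median_after)
```

In this toy example the median rises even though no one's individual tenure changed and four people lost their jobs, so the higher median reflects who is left in the sample rather than better job security.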
Writing data-driven stories is a challenge that I am familiar with. In fact, some critics think I have failed in this regard too. The editor of Significance, in a generally favorable review, praises the front part of Numbersense, which is extremely restrained in the mathematical background it requires, but considers the last two chapters of the book a "sudden death ending". Those parts, on the other hand, have been received favorably by data analysts. It's hard to satisfy both constituencies.
I sense that Nate and I share a common mission, which is to show the general reader how to interpret data and data analyses. This means that what we show must be replicable by the reader. By this criterion, the analytical work would not be publishable in an academic journal. It might be a partial analysis, for example, based on a sample that is feasible for readers to collect, or a simplification of a model that is too complex to describe in 1000 words. We also assume that the general reader is not inclined to plod through academic journals looking for the supreme proof of something.
The big change to 538 is that it is now a team production. It was easy to maintain consistent quality and a niche when Nate was the only contributor. Other writers have their own styles and inclinations. It remains to be seen whether this effort will develop an identity.
I fail to understand the economists' complaints that the 538 pieces on economics are not informed by expert opinion, when these same economists profess to be fans of Nate's work in election forecasting, a field that also has a history of academic work. It sounds like "not in my backyard".
The Agilone link seems to be down.
Posted by: Dvmaster | 03/25/2014 at 04:26 PM
Dvmaster: Thanks for leaving the note. It's not down but I didn't type http:// and Typepad attached the wrong prefix to the link. I have fixed it now.
Posted by: Kaiser | 03/25/2014 at 04:29 PM
Kaiser:
Your last paragraph is interesting. One thought I have is that forecasting, in economics or politics, is (a) extremely important for practitioners and (b) of great interest to the general public, but is not so respected by academics. We (academics) can do forecasting, but the effort required to get the details of the forecast is more like gruntwork, and academics are generally more interested in formulating and testing research hypotheses. So a data journalist like Nate can make a real contribution. Even if the forecasts he does are things that academics could do and sometimes actually do do, we in academia are not always so good at publicizing the results.
Posted by: Andrew Gelman | 04/10/2014 at 05:10 PM