Comments

zbicyclist

I've only skimmed the two articles so far, but I find it amazing that this is posed as some sort of debate. I suppose it is, but if statisticians abandon big data to the computer scientists the field is headed toward academic irrelevance.

Statistics is an applied field after all. An applied field that shrinks from the difficult questions involved in being applied stops being relevant to that area of application.

The analogy I'm using lately is that a statistic (like Stan Musial's lifetime batting average) is an object, like a bridge. A user of that statistic needs to know a few things in order to use it (like, don't drive off the side of the bridge). To design a bridge, you need a civil engineer (another type of applied mathematician) who can make the decisions about proper construction.

Even for a simple statistic like a batting average, there are questions of proper construction: do we include walks? What if you reach on an error? Should doubles count more than singles? Proper construction of measurement is what applied statistics is all about -- not surprisingly, because a statistic is a measure.

Bridge technology has changed substantially over the past 300 years: there are new materials to make the bridges (no more iron trusses). There are new types of loads to be carried (the introduction of trains, the demands to carry heavy trucks, etc.).

While any individual engineer can (and should) decide that they are only going to deal with certain types of bridges, the field as a whole has had to broaden to deal with these challenges.

Similarly, individual statisticians can deal with big data or not, but the field as a whole shrinks its domain of relevance if the challenge of dealing with the new materials of big data is ignored. We will leave it to the computer scientists, the operations research practitioners, and the engineers.

zbicyclist

I've now read David Walker's piece (the other side of the argument) twice. His objection to statisticians becoming involved in big data is that they will be used to help companies: "Big data is principally about taking more money off customers by (let us put it pejoratively) more effective snooping on their habits."

That's a narrow definition of what marketing is about, but certainly marketing is at least partly about that.

Having spent my entire professional career in statistical applications in marketing research, I can tell Walker that this ship sailed long ago. Marketing research is by no means the valedictorian of the statistical school, but I'm not quite sure why Walker seems to be contending that we should be expelled, or at least wants to pretend we don't exist.

Walker does have some good points. I'm fully on board with his McKinsey bashing, and he does make the vital point that "big data" and "open data" aren't the same thing at all.

Mathman54

I have not read the original article, so my comment is based on the excerpt.

Re automation: Reacting to every "significant" result as if it were a valid signal will lead to "tampering", as Deming would have put it. Given the number of tests that are or will be run, there will be many, many instances when a low-probability event is judged "significant" and action is taken. Often, the proper action is to take no action.
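To put rough numbers on that point, here is a minimal sketch. It assumes (my assumptions, not anything stated in the article) that every test is run at the conventional 0.05 level, that the tests are independent, and that no real effect exists in any of them. The function names are just for illustration.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results when every null hypothesis is true."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests: int, alpha: float) -> float:
    """Chance that at least one of num_tests independent tests comes up
    'significant' purely by chance (assumes independence and all nulls true)."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    alpha = 0.05  # assumed significance threshold
    for num_tests in (1, 20, 100, 1000):
        expected = expected_false_positives(num_tests, alpha)
        at_least_one = prob_at_least_one_false_positive(num_tests, alpha)
        print(f"{num_tests:>5} tests: expect {expected:6.1f} false alarms, "
              f"P(at least one) = {at_least_one:.3f}")
```

Under these assumptions, an automated system that runs tests by the hundreds and acts on every "significant" result will be acting on noise much of the time, which is the tampering described above.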

Dimitriy

Can you please explain the calculation that produced "10% progress that won the million-dollar prize is roughly worth one-tenth of one star on the five-star rating scale"?

JohnMashey

Say some more about why the marriage is pending. Is that a change of definitions? Some of this is pretty old.

I was at Bell Labs in the 1970s/early 1980s, and Murray Hill Bldg 5 had statisticians who spent their time analyzing telephone records whose volumes certainly fit Big Data for the time, even if it wasn't called that. The Bell System tracked every trouble report, down to things like squirrel bites and gunshots, and later we did expert systems for rummaging through the data and looking for patterns.

In the 1990s, Silicon Graphics was selling supercomputers to telcos and others, both for marketing analytics and fraud detection, both of which needed much statistical analysis.

For some history, see:
http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/?_r=0

http://www.slideshare.net/amhey/big-data-yesterday-today-and-tomorrow-by-john-mashey-techviser?utm_source=slideshow03&utm_medium=ssemail&utm_campaign=share_slideshow
See especially slides 22-24.
