Comments

Dale Lehman

I agree with your characterization and the examples, but I think the examples illustrate a different problem than the one you describe. They illustrate a problem of "asymmetric" information. The data being collected is sufficient to distinguish duplicate from single transactions, but that data is not provided to the end user. I fear this problem will get worse as we leave credit cards behind and move to smartphone-based payments. Already, when I tap my credit card rather than inserting it, I find I am always worried that I might tap it twice. That information is being collected, but it is not as easy for me to access as it is for the merchant.

The title of your post suggests something different to me - as we have more and more data, we still find that the data we have is often not the data we want. This goes beyond asymmetric information. I think it concerns the divergence between what can easily be collected and the questions that are meaningful to ask. The important questions usually involve causality and variability, but the data that is collected often doesn't measure what we really want. I think this is a function of the ease of collection. We are inundated by what is easy to collect - but what we really need is often hard to measure and/or collect. The problem is that we often substitute the former for the latter, losing sight of the fact that data volume does not substitute for data quality.

Kaiser

DL: Great points. There are many paradoxes, which is why I used the indefinite article. And the paradox I focused on is explained by "asymmetric information", which confers advantages on the side holding the data, as you said. What's unusual here is that the side that doesn't have the data actually produced the data (and should have been the rightful owner of the data!).

To add to your other point above, another reason why more data is worse is the normalization of "fake" data - by which I'm referring to things like the supposed measures of blood oxygen levels and other unlikely health metrics on smartwatches. The actual data are what the watch can directly measure (not much); everything else is a projection based on formulas, which should be treated not as actual data but as modelled data.

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.
Graphics design by Amanda Lee
