One of the challenges in the age of big data is having too much information. Information paralysis. Not enough time to look at all the data. Too many contradictions in the data. And so on.
***
But there is something else that's been bothering me. The more information is collected, the less data we have at our disposal. This is a paradox of the big data age. What many of us face is not actually too much information but not enough data, and not the right data.
I have many examples of this paradox. Let me focus on one recent experience.
Recently, I tried using the OMNY card to ride the New York subway. This is a new card technology that will eventually replace the iconic swipe cards. It's a chip card that you place on a screen; if the transaction succeeds, the screen's border lights up green and you walk through the turnstile. The first time I used it, the green light didn't come on, so I tried again, and then the green light appeared.
At that moment, I realized that I wasn't sure if I had been charged once or twice. Did I know for sure that the first transaction hadn't succeeded? Other than the green signal, the new technology gave me zero information.
Contrast this with the existing swipe cards: for every successful swipe, the display indicates how much was charged, and how much stored value still remains in the card. The former confirms the transaction while the latter helps me plan when to refill the card's value store.
With the OMNY tech, I just have to "trust" the system.
The next day, I transferred from the train to the bus. According to the rules, the second leg should be free, since the transfer happened within a few minutes. With existing swipe cards, the display confirms immediately that I have used a free transfer.
With OMNY, I had no idea: the screen sent the same signal (a green light) as for a paid trip.
I later checked the stored value on the OMNY card. I was indeed charged twice on that first trip, while the transfer was not charged, as expected.
***
What's the paradox? A lot more data are being collected by new tech, and yet users are provided less information than before.
This isn't hard to explain, though. Businesses are aware that information has competitive value, and they are hoarding the data, even hiding them from the data's owners (you and me).
I agree with your characterization and the examples, but I think the examples illustrate a different problem than the one you describe. The examples illustrate a problem of "asymmetric" information. The data being collected are sufficient to distinguish duplicate from single transactions, but those data are not provided to the end user. I fear this problem will get worse as we leave credit cards behind and move to smartphone-based payments. Already, when I tap my credit card rather than inserting it, I am always worried that I might tap it twice. That information is being collected, but it is not as easy for me to access as it is for the merchant.
The title of your post suggests something different to me: as we have more and more data, we still find that the data we have are often not the data we want. This goes beyond asymmetric information. I think it concerns the divergence between what can easily be collected and the questions that are meaningful to ask. The important questions usually involve causality and variability, but the data that are collected often don't measure what we really want. I think this is a function of the ease of collection. We are inundated by what is easy to collect, but what we really need is often hard to measure and/or collect. The problem is that we often substitute the former for the latter, losing sight of the fact that data volume does not substitute for data quality.
Posted by: Dale Lehman | 07/07/2024 at 09:45 AM
DL: Great points. There are many paradoxes, which is why I used the indefinite article. And the paradox I focused on is explainable by "asymmetric information" that confers advantages to the side holding the data, as you said. What's unusual here is that the side that doesn't have the data actually produced the data (and should have been the rightful owner of the data!)
To add to your other point above, another reason why more data can be worse is the normalization of "fake" data, by which I mean things like the supposed measures of blood oxygen levels and other unlikely health metrics on smartwatches. The actual data are what the watch can directly measure (not much); everything else is a projection based on formulas, which should be treated not as actual data but as modelled data.
Posted by: Kaiser | 07/09/2024 at 02:35 PM