Big data is not about the amount of data. Big data is about the availability of data. And more and more, it is about people preying on consumers' desire for data. I hate to say it, but data vendors run the risk of becoming new-age used-car salesmen.
The Wall Street Journal hyped a research article about mobile apps that supposedly "detect skin cancer" (WSJ link, article link). While the tone of the article is quite balanced, I cringed when the reporter wrote: "the best-performing app accurately identified cancerous moles 98.1% of the time."
First, let's define accuracy. Screening tests (whether for cancer, steroids, or anything else) have many different definitions of accuracy. Is it that of 100 images of malign moles, this app correctly identified 98 of them? Or is it that of 100 images of benign moles, the app correctly identified 98 of them? Those are two very different rates. Or is the accuracy an aggregate over both malign and benign moles?
You won't find that information in the WSJ article. In the research piece, you will find that 98% is the "sensitivity", meaning that the app correctly flagged 98% of the malign moles. This number tells you nothing about how the app does on the benign moles.
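To make the three candidate definitions concrete, here is a minimal Python sketch. Only the split of 60 malign and 128 benign images matches the study's sample; the four cell counts are invented for illustration and are not the study's results. The names ScreeningCounts, sensitivity, specificity and overall_accuracy are my own.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    true_pos: int   # malign moles the app flagged as malign
    false_neg: int  # malign moles the app called benign
    false_pos: int  # benign moles the app flagged as malign
    true_neg: int   # benign moles the app called benign


def sensitivity(c):
    """Share of malign moles that were correctly flagged."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c):
    """Share of benign moles that were correctly cleared."""
    return c.true_neg / (c.true_neg + c.false_pos)


def overall_accuracy(c):
    """Share of all moles, malign and benign together, called correctly."""
    total = c.true_pos + c.false_neg + c.false_pos + c.true_neg
    return (c.true_pos + c.true_neg) / total


# Hypothetical counts (NOT from the study): 60 malign and 128 benign images.
example = ScreeningCounts(true_pos=59, false_neg=1, false_pos=88, true_neg=40)

print(f"sensitivity:      {sensitivity(example):.3f}")
print(f"specificity:      {specificity(example):.3f}")
print(f"overall accuracy: {overall_accuracy(example):.3f}")
```

With these made-up counts the three "accuracy" numbers land far apart, which is why a bare percentage without its definition says so little.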
Those who have been reading this blog hopefully will immediately wonder... if the app produces few false negatives, would it produce lots of false positives? It's almost guaranteed because there is a trade-off between those two types of errors.
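A toy sketch of that trade-off, assuming a hypothetical classifier that gives each mole a risk score between 0 and 1 and flags anything at or above a threshold. The scores are invented, and this is not a description of how any of the apps in the study work.

```python
# Hypothetical risk scores (NOT data from the study).
malign_scores = [0.35, 0.55, 0.70, 0.80, 0.95]
benign_scores = [0.05, 0.20, 0.30, 0.45, 0.60, 0.75]


def error_rates(threshold):
    """Return (false_negative_rate, false_positive_rate) when every mole
    scoring at or above the threshold is flagged as malign."""
    false_negatives = sum(score < threshold for score in malign_scores)
    false_positives = sum(score >= threshold for score in benign_scores)
    return (false_negatives / len(malign_scores),
            false_positives / len(benign_scores))


# Lowering the threshold catches more malign moles but flags more benign ones.
for threshold in (0.9, 0.5, 0.25):
    fnr, fpr = error_rates(threshold)
    print(f"threshold={threshold}: false negatives {fnr:.2f}, false positives {fpr:.2f}")
```

In this toy data, pushing the false-negative rate to zero requires a threshold low enough that most of the benign moles get flagged too.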
Another question you might have is: assuming the app tells me the mole is malign, what is the probability that I have skin cancer? Notice this is the reverse of sensitivity. Sensitivity is the probability that the app tells me the mole is malign assuming that I have skin cancer.
Sorry to pop the bubble. The so-called positive predictive value is between 33 and 42%. This means that of those people whom the app claims have skin cancer, fewer than half actually do.
The 98% number is pretty much useless. It's the 40% number that we need to be worrying about.
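The link between sensitivity and positive predictive value is Bayes' rule, sketched below. The 98.1% sensitivity and the 60-out-of-188 share of malign moles come from the article and the study; the specificity of 0.30 is a placeholder I picked for illustration, not a figure reported in the research piece, and the function name is my own.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(mole is malign | app says malign), by Bayes' rule."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1 - specificity) * (1 - prevalence)
    return true_positive_share / (true_positive_share + false_positive_share)


sensitivity = 0.981          # quoted in the WSJ article
prevalence = 60 / 188        # share of malign moles in the study sample
assumed_specificity = 0.30   # ASSUMPTION: illustrative value, not from the study

ppv = positive_predictive_value(sensitivity, assumed_specificity, prevalence)
print(f"positive predictive value: {ppv:.1%}")
```

Under that assumed specificity, the result falls inside the 33 to 42% range quoted above, even though the sensitivity is 98%.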
There are several other red flags.
Accuracy rates (whichever one you're looking at) come from a comparison of the app's prediction to some "truth". What is this "truth"? You might think it is a confirmation of someone having skin cancer, and you'd be wrong. The "truth" is a prior diagnosis by a dermatologist. So accuracy is really saying whether the app agrees with that dermatologist.
How does the best-performing app come up with the diagnosis? It "relied on doctors, forwarding images to board-certified dermatologists for review at a cost of $5 per mole." So, the accuracy rate is in effect a consistency rating between one set of dermatologists and another set of dermatologists.
The sample of images used for this evaluation consisted of 60 malign moles out of 188 total, so approximately one-third of the moles are malign. Now, flip to reality. Of all the people who download this app (and there would be many given this mass-media coverage), what proportion of the moles being diagnosed would be malign? Surely not anywhere close to 33%. Probably much, much lower. The 98% sensitivity only applies to the small proportion of moles that are actually malign. If that proportion is not 33% but much, much lower, then the real-world accuracy of these apps, in the sense of how often a "malign" verdict is right, would be much, much lower.
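Here is a rough sketch of how quickly things deteriorate as malign moles become rarer. The sensitivity is the 98.1% quoted in the article and the first prevalence is the study's 60 out of 188. The specificity of 0.30 and the lower prevalences are hypothetical values chosen only to show the direction of the effect.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(mole is malign | app says malign), by Bayes' rule."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1 - specificity) * (1 - prevalence)
    return true_positive_share / (true_positive_share + false_positive_share)


SENSITIVITY = 0.981          # quoted in the WSJ article
ASSUMED_SPECIFICITY = 0.30   # ASSUMPTION: illustrative, not from the study

# Study sample first, then hypothetical real-world prevalences.
prevalences = [60 / 188, 0.10, 0.01, 0.001]
ppvs = [positive_predictive_value(SENSITIVITY, ASSUMED_SPECIFICITY, p)
        for p in prevalences]

for p, value in zip(prevalences, ppvs):
    print(f"prevalence {p:.1%} -> positive predictive value {value:.1%}")
```

Holding sensitivity and specificity fixed, every drop in prevalence drags the positive predictive value down with it, so a rate measured on a sample that is one-third malign does not carry over to the general public.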
When I scanned the research piece, I also noticed this qualification: "We reviewed a total of 390 images for possible inclusion in this study. We excluded 202 as being of poor image quality." I hope that these apps do not charge people $5 to process images deemed to be of poor quality, because if they do, the results of this study become even more meaningless.