The New York Times Magazine has a pretty good piece about the use of OCCAM data to answer medical questions, such as diagnosis and drug selection. I'm happy that it paints a balanced picture of both the promise and the pitfalls.
Here are some thoughts I had while reading the piece:
- Small samples coupled with small effects pose a design problem for traditional clinical trials. The subjects of the NYT article claim that OCCAM data can fill the void. But if a treatment is highly effective, even a small clinical trial will detect the effect. So the underlying issue is less sample size than effect size.
- Counterfactual evidence is almost always absent from OCCAM data because of the lack of controls (the first “C” in OCCAM). The lede in the story concerns a girl who was given an anti-clotting drug because a doctor suspected she had an elevated risk of blood clotting, and the girl did not develop a clot. Statisticians are not impressed by such evidence, because we don’t know whether the drug was truly responsible for the outcome. (It's a correlation until proven guilty.) If the girl had not taken the drug, would she have developed a clot? Chris Longhurst makes this point in the article: “At the end of the day, we don’t know whether it was the right decision.” This ignorance puts us in dangerous territory, making it hard to tell the prescient from the charlatan.
- The Big Data world is filled with "events data". You have a log of everyone who clicked on a particular button, or a log of everyone who called your call center, etc. You have only the cases, not the non-cases (e.g. the unhappy customer who did not call the call center). Heartwarming stories like the girl's avoidance of clotting get repeated (or go viral, in modern terminology) but stories of failure are not usually deemed worth reporting. Four stories are possible: the patient takes the drug and no clot develops (the girl's story); the patient takes the drug and develops a clot anyway; the patient goes without the drug and no clot develops; or the patient goes without the drug and develops a clot.
The media imposes a filter so that only one of these stories gets through. Without mentally accounting for the other stories, one can't judge how important the reported story is!
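To make the filter concrete, here is a minimal simulation sketch in Python. Every number in it is a made-up assumption for illustration (the 5% baseline clot risk, the coin-flip chance of getting the drug, the 10,000 patients); none comes from the article. It also assumes the drug does nothing at all (`risk_ratio=1.0`) and that patients receive the drug at random, which real OCCAM data would not guarantee. The function names are hypothetical.

```python
import random
from collections import Counter


def simulate_stories(n_patients=10_000, p_treated=0.5,
                     base_clot_risk=0.05, risk_ratio=1.0, seed=0):
    """Count the four possible (took_drug, developed_clot) stories.

    All parameter values are illustrative assumptions.
    risk_ratio=1.0 means the drug has no effect on clot risk.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_patients):
        took_drug = rng.random() < p_treated
        # The drug multiplies the baseline risk by risk_ratio (1.0 = useless).
        risk = base_clot_risk * risk_ratio if took_drug else base_clot_risk
        developed_clot = rng.random() < risk
        counts[(took_drug, developed_clot)] += 1
    return counts


def clot_rate(counts, took_drug):
    """Share of patients in one group (drug or no drug) who developed a clot."""
    total = counts[(took_drug, True)] + counts[(took_drug, False)]
    return counts[(took_drug, True)] / total


counts = simulate_stories()

# The media filter: only "took the drug, no clot" gets reported.
reported = counts[(True, False)]

print("drug, no clot (reported):", reported)
print("drug, clot:              ", counts[(True, True)])
print("no drug, no clot:        ", counts[(False, False)])
print("no drug, clot:           ", counts[(False, True)])
print("clot rate with drug:   ", round(clot_rate(counts, True), 3))
print("clot rate without drug:", round(clot_rate(counts, False), 3))
```

Under these made-up settings, thousands of patients take a useless drug and still avoid a clot, so heartwarming stories are plentiful. The clot rates in the two groups should come out about the same, and only the comparison across all four cells reveals that.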
In the July issue of Significance, the magazine of the RSS and ASA, Julian Champkin contributed a great profile of Iain Chalmers, the founder of the Cochrane Collaboration, the organization that aggregates and summarizes trial results. I came across this fantastic quote, which speaks to the New York Times article:
Dr. Spock’s 1946 book, Baby and Child Care, was ... read by a huge proportion of [parents around the world]; throughout its first 52 years in print, it outsold every other book except the Bible. “It recommended that babies should be laid to sleep on their stomachs. Now we know that doing that increases the risk of cot (crib) death. Tens of thousands of babies died needlessly because of that advice.”