Like Andrew Gelman, I was disappointed with Steven Strogatz's recent column on Bayes Theorem for the New York Times. (Disclosure: I am a fan of Strogatz's Nonlinear Dynamics and Chaos book, and I'm pleased as punch that the New York Times is exposing readers to this type of scientific material.)
Unlike Andrew, my displeasure has nothing to do with the fact that Bayes Theorem is behind the natural-frequency calculations. (Further disclosure: I used the same approach as Strogatz and Gigerenzer in Chapter 4 of Numbers Rule Your World, and in the Conclusion, connected the computation to Bayes Theorem. I didn't see it as "new thinking" though.)
I think the column exemplifies all that is wrong with the way our schools teach probability and statistics to nontechnical students. Much of the column is concerned with computation. The opportunity to help readers get behind the numbers, and interpret what's being computed, is lost.
If I were writing about mammogram screening in my book, or teaching it in my intro stats class, I'd say the following:
When asked about the accuracy of a mammogram, doctors cite the "false positive rate". Ignore the false positive rate; what patients really need to know is the "positive predictive value" (PPV), that is, the chance of having breast cancer given that one has a positive mammogram.
The PPV for mammography is 9 percent. Nine percent! You heard right. For every 100 patients who test positive, only 9 have breast cancer; the other 91 do not. This may sound like a travesty but it's easily explained: breast cancer is a rare disease, afflicting only 0.8% of women, so almost all women who take mammograms do not have cancer, and even a highly accurate test, when applied to large numbers of cancer-free women, will generate a lot of false alarms.
The false positive rate is the chance of testing positive given that one does not have breast cancer. This number is irrelevant for patients because if you know you don't have breast cancer, you don't need to take the test! Doctors talk about it (they shouldn't but they do) because when they developed the mammogram, they needed to know how accurate the test is -- they needed to find a group of known positives and a group of known negatives, and then check if the test provided accurate results. Now that the test is rolled out and an annual rite for many women, doctors should be talking about PPV, not FP.
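The natural-frequency arithmetic behind that 9 percent can be sketched in a few lines. The prevalence (0.8%) and the PPV (9%) are the figures quoted above; the 7% false positive rate is my assumption, chosen because, together with a 90% sensitivity, it reproduces the quoted PPV:

```python
# Positive predictive value via natural frequencies.
# Prevalence is from the text; the 7% false positive rate is an
# assumed figure that reproduces the quoted PPV of roughly 9%.

def ppv(prevalence, sensitivity, fp_rate):
    """P(cancer | positive test)."""
    true_pos = prevalence * sensitivity      # sick women who test positive
    false_pos = (1 - prevalence) * fp_rate   # healthy women who test positive
    return true_pos / (true_pos + false_pos)

print(round(ppv(0.008, 0.90, 0.07), 3))  # -> 0.094, i.e. roughly 9 percent
```

Notice that the false positives swamp the true positives simply because healthy women outnumber sick women more than a hundred to one.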
I don't see the point in dwelling on the A given B, B given A confusion. I'd just as soon cover this by stressing the importance of asking the right question. Some students may find it shocking to juxtapose the sensitivity of 90 percent (the ability to detect positives) with the PPV of 9 percent. Again, this is easily explained since only 0.8% of those tested have breast cancer, so however high the sensitivity is, it only affects a very small number of patients.
I'd stop there for Level 1. For more advanced students, there is much more to interpreting these numbers. Say:
This PPV of 9 percent sounds really low. Does this prove that the mammogram is worthless? Worse than flipping a coin? Well, a coin flip is not the right way to think about this. In the absence of the test, we know that 0.8% of women have breast cancer; given the test results, the PPV tells us that 9% of those testing positive have breast cancer, so those testing positive are roughly 10 times more likely than average to have breast cancer. This demonstrates the value of the test. Is this high enough? That's for the experts to say, but it is wrong to conclude that the test is worse than a coin flip.
Said differently, the mammogram provides valuable information. Prior to testing, we know X. A posteriori, we know more than X. Statisticians call 0.8% the prior probability of having cancer and 9% the posterior probability of having cancer given a positive test.
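The prior-to-posterior jump can be written out directly. As before, the prevalence (0.8%) and sensitivity (90%) are the figures quoted in this piece; the 7% false positive rate is an assumption consistent with the quoted 9% PPV:

```python
# Prior vs. posterior probability of breast cancer.
# Prevalence and sensitivity are quoted figures; the 7% false
# positive rate is an assumed figure that yields the ~9% PPV.

prior = 0.008        # P(cancer), before any test
sensitivity = 0.90   # P(positive | cancer)
fp_rate = 0.07       # P(positive | no cancer), assumed

true_pos = prior * sensitivity
false_pos = (1 - prior) * fp_rate
posterior = true_pos / (true_pos + false_pos)  # P(cancer | positive), the PPV

print(round(posterior / prior, 1))  # the "lift": roughly a tenfold jump
```

The ratio of posterior to prior is what justifies the claim that a positive result makes a woman roughly 10 times more likely than average to have breast cancer.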
Despite this, why would doctors knowingly recommend a test in which 9 out of 10 positives will turn out to be false positives? To understand this, we need to know the "negative predictive value", that is, the chance that one does not have breast cancer given that one has a negative mammogram.
The NPV is 99.9%. That is to say, those who test negative can be almost certain that they don't have cancer. It turns out that these two metrics (PPV and NPV) are linked; if doctors improve PPV, NPV must decline. What does this tell us about the incentives of those developing the screening test? Are they more afraid of false positives or false negatives? Why?
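The NPV follows from the same natural-frequency bookkeeping, just conditioning on the negatives instead of the positives. The prevalence (0.8%) and sensitivity (90%) are quoted figures; the 93% specificity (i.e., a 7% false positive rate) is my assumption, consistent with the quoted 9% PPV:

```python
# Negative predictive value: P(no cancer | negative test).
# Prevalence and sensitivity are quoted figures; the 93%
# specificity (7% false positive rate) is assumed.

prevalence = 0.008
sensitivity = 0.90
specificity = 0.93  # 1 - assumed false positive rate

true_neg = (1 - prevalence) * specificity   # healthy women who test negative
false_neg = prevalence * (1 - sensitivity)  # missed cancers

npv = true_neg / (true_neg + false_neg)
print(round(npv, 4))  # -> 0.9991
```

The asymmetry between the two numbers (PPV of 9%, NPV of 99.9%) is exactly what the prose argues: the test is calibrated to almost never miss a cancer, at the cost of many false alarms.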
Now, we really get to what I consider the crux of the issue. The PPV and NPV reflect how the test has been calibrated. The calibration is only half science; the other half is human experience, politics, emotions, etc. I get deep into this stuff in Chapter 4.
In my view, classroom time (or reader time) is much better spent explaining these things than on the arithmetic of computing conditional probabilities. Knowing how to work the formulas but not noticing the real-life implications is a tragedy; knowing how to think about these tests but failing to compute probabilities is an inconvenience, and an excuse to befriend a quantitative mind.
Too often, we train our students to declare victory upon turning to the end of the textbook. And this is why I think Strogatz missed a great opportunity.