



I have to disagree a bit here. I really like your explanations but I do not think it is really one or the other - this explanation OR the computations with A, B etc.

An intro class is for a wide population of students. Some of these students will be using stats in their other courses quite a bit (and likely often in their working lives). Others are simply fulfilling a requirement and only really "need" the statistical literacy part.

Teaching to both audiences requires a good prof to go over the thinking behind the numbers AND teach the computations. You may say a student can find that in the text, which is true, but many have trouble understanding the textbook's explanation of these computations.

Mike Anderson

I hadn't thought about emphasizing NPV, but that's a great idea for demonstrating tradeoffs. I also have my stats students calculate PPV as a function of disease prevalence so they can see how it varies among risk groups.
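A minimal sketch of that exercise, for anyone who wants to try it. The sensitivity (90%) and specificity (93%) are the mammogram figures used in the NPV computation further down this thread; the list of prevalence values is made up purely for illustration, and the function name is my own.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), by Bayes' rule."""
    true_pos = prevalence * sensitivity                # diseased and test positive
    false_pos = (1 - prevalence) * (1 - specificity)   # healthy but test positive
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.90   # from the thread: 10% of cancers test negative
SPECIFICITY = 0.93   # from the thread: 93% of healthy women test negative

# Hypothetical risk groups (assumed values, not from the post).
PREVALENCES = [0.001, 0.008, 0.05, 0.20]

ppv_by_prevalence = {
    p: positive_predictive_value(p, SENSITIVITY, SPECIFICITY) for p in PREVALENCES
}

for p, ppv in ppv_by_prevalence.items():
    print(f"prevalence {p:6.1%} -> PPV {ppv:6.1%}")
```

With the test itself held fixed, PPV climbs as prevalence rises, which is the point about risk groups.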


I completely agree that your description ends up teaching people more about conditional probabilities... but...

From the "curious" reader's point of view (as opposed to the "interested" reader's), Strogatz's opinion piece is a much easier read. Why?

Because he doesn't get into technicalities or new acronyms. He just shows the curious something they probably were doing badly... And that makes a much better read for most of his readers (and probably the editor as well, IMHO).

Now, whether what the average NYT reader prefers to read is what is best for him to read, that's a whole different question.


we need to know the "negative predictive value", that is, the chance that one does not have breast cancer given that one has a negative mammogram.

The NPV is 0.11%. That is to say, for those testing negative, they can be almost sure that they don't have cancer.

Are you sure this is not a mistake? Maybe the NPV is 100% - 0.11%?


Tic: Great catch! I do mean NPV = 99.9%. The computation is:

P(test negative) = 0.8% x 10% + 99.2% x 93% = 92.336%
P(no cancer | test negative) = P(no cancer and test negative) / P(test negative) = 99.2% x 93% / 92.336% = 99.9%
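The same arithmetic as a short Python sketch, in case anyone wants to check it or plug in other numbers. The inputs are read straight off the computation above (0.8% prevalence, 10% of cancers testing negative, 93% of healthy women testing negative); the variable names are my own.

```python
PREVALENCE = 0.008    # 0.8% of women screened have breast cancer
SENSITIVITY = 0.90    # so 10% of cancers produce a negative test
SPECIFICITY = 0.93    # 93% of women without cancer test negative

# Two ways to get a negative test: a missed cancer, or a correct all-clear.
false_negatives = PREVALENCE * (1 - SENSITIVITY)
true_negatives = (1 - PREVALENCE) * SPECIFICITY

p_test_negative = false_negatives + true_negatives

# NPV = P(no cancer | test negative)
npv = true_negatives / p_test_negative

print(f"P(test negative) = {p_test_negative:.3%}")
print(f"NPV = {npv:.1%}")
```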


JW: in my view, there should be two tracks for introductory classes: one to prepare students for hands-on statistical analyses, the other to prepare students to practice statistical thinking in their everyday and/or work lives, accepting the fact that they would never do any hands-on work.

If we treat the amount of class time as a scarce resource, then there is a tradeoff between time spent teaching formulas and time spent teaching interpretation and reasoning. Unfortunately, it's a tough choice.

Tom West

The usefulness of a mammogram depends on what happens *after* the test. I assume the results show up some suspiciously cancer-like tissue. If someone has a positive test, then the best bet would be an additional, more precise test (such as a biopsy), to determine whether or not the person actually has cancer.
The big advantage of a mammogram is that it is non-invasive and has very few negative effects on the patient. So, patients have little to lose.

Any general screening program should have low costs, low risks to the patient, and a low false negative rate. A high false positive rate or low predictive value is an acceptable price to pay, *providing* that any positive result is followed up with a more precise test.

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.