Today, I return to one of the statistical issues that doomed the Stanford study that claimed the true infections are 50 to 85 times the reported cases. You can read the entire critique of that study here, or Andrew's more technical and longer debunking here.

Here's a summary: 97 percent of the residents of Santa Clara are believed to not have the antibodies for SARS-CoV-2, and so **even a small false-positive rate when multiplied by so many people will produce a large number of false-positive test results**. These false results vastly outnumber the true positive results since only 3 percent of the residents may have those antibodies.

Thus, the test needs to have a tiny false-positive rate, meaning it must have a high true-negative rate (technically called specificity).

The study's researchers estimated the true-negative rate by running the antibody test on 30 samples of blood collected prior to the advent of Covid-19. The test returned 30 negative results. Thus, they asserted that the specificity of the test is 100 percent.

Statisticians don't accept perfection. Even if you get 30 heads in 30 coin tosses, it is still possible that the coin might show a tail on the next toss. Consider what might happen if the specificity weren't 100 percent but, say, 98 percent. That 2 percent false-positive rate turns out to make a mockery of the study's headline result.

Because almost all people do not have antibodies, a 2 percent false-positive rate produces more false-positive results (2 percent of roughly 3,230 antibody-free people is about 65) than the 50 positive results observed in the 3,330 tests. So it is possible that all of the 50 results are spurious. Not a good look!

***

Today, I look at this same problem from a different angle. The researchers might want to stick to their assumption of specificity close to 100 percent. One way they can achieve this is by **enlarging the reference sample used to establish the specificity**.

It turns out that 30 samples are not big enough to be sure. **To really be sure that the specificity is at least 99 percent, they need 300 samples to all test negative,** 10 times more than they used.

***

Let me explain how I get to 300 samples.

First, assume the test has 100% specificity, that is to say, if the subject has no antibodies, the test will always come back negative. There is no chance of a false positive. In this case, if we apply the test to 300 pre-Covid blood samples, all 300 will definitely come back negative.

Now, let's say the test has 99% specificity. A false positive is now a possibility, albeit a remote one. The chance of 300 negatives in 300 samples is (0.99)^300 = 5%.

If the test has 98% specificity, the probability of 300 negatives is now (0.98)^300 = 0.2%. The lower the specificity, the lower the chance of all 300 samples coming back negative.

In statistics, we like to ignore "**rare events**". Rare events are usually defined as events that happen with less than 5 percent chance. We focus on things that happen 95% of the time.

So in the above scenarios, once specificity goes below 99%, the chance of seeing 300 negatives in 300 samples drops below 5%. Any event that happens with 5 percent chance or less is a "rare event" which we choose to ignore.

Thus, we say that when specificity is below 99%, there is essentially no chance of all 300 samples coming back negative. [ We say this with 95% confidence, meaning we allow ourselves a 5% margin of error. ]

In other words, if the researchers ran their antibody test on 300 pre-Covid samples, and got 300 negatives, then we can safely say that the specificity is above 99 percent.
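As a rough check of the arithmetic above, here is a minimal Python sketch. The function names `prob_all_negative` and `samples_needed` are my own, made up for illustration, and the calculation assumes every pre-Covid sample is tested independently with the same specificity.

```python
import math

def prob_all_negative(specificity: float, n_samples: int) -> float:
    """Chance that every one of n_samples antibody-free samples tests negative."""
    return specificity ** n_samples

def samples_needed(specificity: float, alpha: float = 0.05) -> int:
    """Smallest sample size at which an all-negative result becomes a
    'rare event' (probability at or below alpha) for the given specificity."""
    if not 0 < specificity < 1:
        raise ValueError("specificity must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(specificity))

for spec in (1.00, 0.99, 0.98):
    p = prob_all_negative(spec, 300)
    print(f"specificity {spec:.0%}: P(300 of 300 negative) = {p:.1%}")

print("samples needed to rule out 99% specificity:", samples_needed(0.99))
```

The exact cutoff works out to 299 samples, which I round to 300 in the text.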

***

How is 30 samples too small? We can go through the same analysis.

If specificity is 100 percent, then of course all 30 samples will come back negative.

How low can specificity go before the chance of all 30 negatives dips below 5 percent? It turns out the answer is about 90 percent specificity. With 90%, the chance of 30 negatives in 30 pre-Covid samples is (0.9)^30 = 4%.
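The cutoff can also be solved for directly rather than found by trial and error. A minimal sketch, assuming independent samples and the 5 percent "rare event" threshold; `lowest_plausible_specificity` is a hypothetical helper name:

```python
def lowest_plausible_specificity(n_samples: int, alpha: float = 0.05) -> float:
    """Solve specificity ** n_samples = alpha for specificity.

    Any specificity below this value would make an all-negative
    result a 'rare event' (probability under alpha).
    """
    return alpha ** (1 / n_samples)

for n in (30, 300):
    bound = lowest_plausible_specificity(n)
    print(f"{n} of {n} negative: specificity could be as low as {bound:.1%}")
```

For 30 samples the exact answer is about 90.5 percent, which I round to 90 percent.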

So **even though the test was perfect for 30 samples, the chance of a false positive is not 0 percent; it could be as high as 10 percent. A 10-percent false-positive rate is devastating to the Stanford study. Out of 3,330 test-takers, we expect 3 percent to be infected, so 3,230 people have no antibodies. Ten percent of these would nonetheless get positive results - that would be 323 false positives. But they only reported 50 positives so quite possibly all of them are incorrect!**
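The false-positive arithmetic is simple enough to sketch in a few lines of Python. The function name `expected_false_positives` is made up for illustration, and the 3 percent prevalence is the rough figure used in this post, not a measured value.

```python
def expected_false_positives(n_tested: int, prevalence: float,
                             false_positive_rate: float) -> float:
    """Expected number of positive results among people without antibodies."""
    n_without_antibodies = n_tested * (1 - prevalence)
    return n_without_antibodies * false_positive_rate

observed_positives = 50  # positives reported out of 3,330 tests

for fpr in (0.02, 0.10):
    fp = expected_false_positives(3330, 0.03, fpr)
    print(f"false-positive rate {fpr:.0%}: about {fp:.0f} false positives "
          f"vs. {observed_positives} observed positives")
```

At either rate, the expected false positives exceed the 50 positives actually observed.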

***

This is a great opportunity to discuss a common statistical problem. **What can you say when nothing happens in your sample?** For example, you flip a coin 10 times and you get all heads, or all tails (equivalently, no tails, or no heads).

In a nicer setting, we might get 4 heads out of 10 flips. Then, we say the coin has a 40 percent chance of showing heads. We can put a margin of error around that 40 percent number. (This is covered in every Stat 101 class.)

What if you got 0 heads out of 10 flips? Following the same method leads to the ridiculous conclusion that the coin has zero chance of showing heads (equivalently, a 100 percent chance of showing tails). How can you put a margin of error around that estimate of zero (or 100) percent?

Note that this is exactly what I did above. Instead of concluding that the specificity is 100 percent based on 30 samples, I'd say specificity is between 90 and 100 percent. With 300 samples, it's between 99 and 100 percent.

The logic I used above has been simplified into a "rule of three": take 3 and divide by the sample size. For 30 samples, I have 3/30 = 0.1, or 10 percent. That's the width of the margin of error (at 95% confidence). So, for the all-heads case, the range is 90% to 100% (width = 10%). For the zero-tails case, the range is 0% to 10% (width = 10%). For 300 samples, the width is 3/300 = 1%.
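Where does the 3 come from? Setting (1 - p)^n = 0.05 and taking logs gives n × ln(1 - p) = ln(0.05), which is about -3. For small p, ln(1 - p) is roughly -p, so p is roughly 3/n. The sketch below compares the shortcut with the exact calculation; the function names are mine, chosen for illustration.

```python
def exact_upper_bound(n_samples: int, alpha: float = 0.05) -> float:
    """Largest event rate p for which seeing zero events in n_samples
    still has probability alpha: solves (1 - p) ** n_samples = alpha."""
    return 1 - alpha ** (1 / n_samples)

def rule_of_three(n_samples: int) -> float:
    """Shortcut for the same bound: 3 divided by the sample size."""
    return 3 / n_samples

for n in (10, 30, 300):
    print(f"n = {n:>3}: exact = {exact_upper_bound(n):.2%}, "
          f"rule of three = {rule_of_three(n):.2%}")
```

The shortcut slightly overstates the exact bound, and the gap shrinks as the sample size grows.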

Thank you, this is really interesting. I quite often use similar "one-way confidence bands" in my own professional work, when using sample data in which all the sample observations show either a "Yes" or a "No".

The "rule of three" introduced at the end could be very helpful for us. Do you have any links to papers or books where this is explained more in detail?

Posted by: Magnus | 04/26/2020 at 11:22 AM

Magnus: Wikipedia has a page on the Rule of Three. It literally is the calculation I did above. The only difference is that it applies a standard approximation for the log function to simplify the formula. This "zero events" problem has been studied by statisticians a lot - that's because it's a special case where classical statistics leads to an absurd answer.

Posted by: Kaiser | 04/27/2020 at 01:19 PM

There are two typos: in the following sentence 30 should be 300:

The chance of 300 negatives in 300 samples is (0.99)^30 = 5%.

Even in the next one.

Posted by: Antonio Rinaldi | 05/13/2020 at 01:03 PM

AR: Thanks for spotting them. Fixed. Yes they should be 300 not 30 in those two formulas. The cutoff for 5% was correctly stated at 99% specificity.

Posted by: Kaiser | 05/13/2020 at 01:30 PM