You might find it curious that someone who pointed out how testing drives the case numbers from my very first Covid-19 post (here), and who called for broad, non-targeted testing in this Wired column also from early in the pandemic (also here), has been silent on testing statistics. Specifically, I have not - until now - commented on the media's two favorite testing metrics: the number of tests per capita, and the "positivity" ratio, i.e. the proportion of tests that came back positive.
The reason? I quote from my post earlier in the week:
In the short history of the Covid pandemic, people started with case statistics. Then, they claimed that death statistics are less manipulated than case statistics, when they learned about testing. Then, they claimed that testing statistics are less manipulated, until they realized governments determined who got tested. Some governments counted tests shipped, not test results. Then, they claimed hospitalization statistics are less manipulated, until they learned that hospitals sent sick patients to nursing homes.
Commentators citing and interpreting these statistics behave as if they aren't manipulated. The President already contradicted them on that count. Manipulation is both intentional and unintentional. (I use unintentional intentionally to include intended unintentional.)
In my Wired column, I pointed out that one huge concern with non-random testing is it produces biased data that prevent us from understanding the pandemic. What makes this worse is when analysts speak about the statistics as if they came from randomized testing.
The positivity rate going up only tells you that, of the people who were recently tested, a larger share have tested positive. If the people who got tested were a random selection of the entire population, then yes, a rising rate would indicate that the coronavirus has spread more widely: a higher proportion of the population would have been infected.
This is not a given when the sample of people tested is non-random. Another reason why positivity rate goes up is more infected people are getting tested than before. (This is a broader statement than saying more people are getting infected.) Randomized testing almost never happens on its own - it must be organized. If the sample were truly random, you'd expect that someone who is not infected is equally likely to present themselves for testing as someone who is infected. Are you kidding?
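To make the selection effect concrete, here is a minimal arithmetic sketch. Every number in it is a hypothetical assumption chosen for illustration, not an estimate from real data: a population of one million, a true infection rate held fixed at 2%, a perfectly accurate test, and made-up probabilities that an infected or uninfected person gets tested.

```python
def positivity(population, prevalence, test_rate_infected, test_rate_uninfected):
    """Share of tests that come back positive, assuming a perfectly accurate test.

    test_rate_infected / test_rate_uninfected are the (hypothetical) fractions
    of infected and uninfected people who end up getting tested.
    """
    infected = population * prevalence
    uninfected = population - infected
    positives = infected * test_rate_infected
    negatives = uninfected * test_rate_uninfected
    return positives / (positives + negatives)


POPULATION = 1_000_000  # hypothetical
PREVALENCE = 0.02       # true infection rate, held fixed in every scenario

# Random testing: infected and uninfected people are equally likely to be tested.
random_testing = positivity(POPULATION, PREVALENCE, 0.05, 0.05)

# Non-random testing: infected people are more likely to show up for a test.
before = positivity(POPULATION, PREVALENCE, 0.10, 0.05)

# Later, even more of the infected get tested; the spread itself has not changed.
after = positivity(POPULATION, PREVALENCE, 0.20, 0.05)

print(f"random testing:     {random_testing:.1%}")
print(f"non-random, before: {before:.1%}")
print(f"non-random, after:  {after:.1%}")
```

Under these assumptions, only the random-testing scenario returns a positivity rate equal to the true infection rate. In the other two scenarios the positivity rate is higher, and it rises between "before" and "after" even though the infection rate never moved.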
Greater spread is one of several reasons for positivity rate to go up. Unless the analyst has examined the other reasons and found them lacking, the conclusion rests on assuming greater spread, and assuming the other reasons do not apply. It is a tautology.
So, anyone saying positivity rate going up implies Covid-19 is spreading is doing some or all of: (a) assuming that Covid-19 is spreading, (b) assuming that the tested population is randomly selected, (c) assuming that the tested population has always been randomly selected [demonstrably wrong], (d) assuming that uninfected people are comprising a growing proportion of those being tested, (e) assuming testing data are comprehensive and accurate, (f) assuming no one is manipulating the testing data.
***
Regarding manipulation, we have indirect evidence, such as the U.S. President musing about slowing down testing. We also know that the UK government counted "tests shipped" rather than "test results returned". This is like automakers counting "sell through" rather than "sell out". Sell through tallies include cars sitting in lots of dealers, and are part and parcel of the "channel stuffing" strategy to inflate short-term revenues. Sell out happens when drivers take ownership of the cars. Recently, the UK government stopped publishing testing numbers. In the U.S., I previously reported that California's testing numbers were fishy. I'm suspicious of testing data in the U.S. given the variety of public and private entities conducting tests (while being appreciative of those who put in effort to gather such data).
The fact that few governments have implemented random testing, even to a small subpopulation, makes it clear that politics is interfering with the public health imperative. The side effect is suboptimal policies misled by low-quality data.
Recall what I said here: don't blame low-quality data for misguided policies when politicians have the wherewithal to improve the data quality.
***
For data scientists, this is yet another instance of garbage in garbage out. In an age in which the provenance of data is frequently unclear, it's worth our time to investigate and interrogate the data - and never assume random when we simply don't know.
Testing positivity rate is interesting and can be used for some broad "conclusions" to lead further investigation.
You don't need pure randomness, as there is likely a predictable order in which people have gotten tested.
If you have 10 tests, you're going to give them to the 10 most likely to have it or the 10 where the result is most meaningful. If you have 10 more, it'll go to the next 10 by the same criteria, etc.
So, if you increase large-numbers testing by another large number, you should expect positive test rate to decrease, ceteris paribus.
An increase in positive test rate does not mean that the virus is spreading, as it could be our testing strategy was flawed all along. Multiple different states/municipalities experiencing a similar result while using differing prioritization strategies would strengthen the ability to conclude that the virus is likely spreading and results are not due to increased testing.
Posted by: Dan Vargo | 06/30/2020 at 02:55 PM
DV: Thanks for the comment. Self-sorting by severity is certainly a factor. This is one of those things that are simple conceptually but hard to measure. One measure is falling positivity rate, as you stated. But now we're using the same metric for cause and effect. I agree that increasing positivity ratio does not mean the virus is spreading - that's the reason for this post because the media seem to believe so. The biggest determinant of the positivity ratio is what types of people are getting tested, and how this is changing over time. After learning about the fiasco in California, my trust in testing data went to zero. So far, no one has provided a satisfactory explanation for the fiasco (see here).
Posted by: Kaiser | 06/30/2020 at 11:45 PM
There is an interesting consequence of initially poor testing, which is that the number of daily new cases in some countries doubled as fast as every 2 days. This didn't mean a greater rate of infections; it was simply a result of catching up on all the non-tested cases.
Posted by: Ken | 09/03/2020 at 09:18 PM
Ken: Deliberately introducing bias into data collection is never wise. A lot of schools are repeating this mistake. If they don't do comprehensive or random testing (with high compliance), they are fooling themselves... some readers think that is by design to collect tuition. You see, when they publish those biased numbers, they do not draw any generalizations; they don't have to because readers will do this themselves and get misled.
Posted by: Kaiser | 09/04/2020 at 11:39 AM