We are now experiencing the aftertaste of targeted (aka triage) testing, a policy with unsavory consequences that were predictable (link). Targeted testing is the idea that only people with severe symptoms or at high risk of infection should get tested for the novel coronavirus. Compared with a broad-based testing plan (whether comprehensive or randomized), targeting economizes on test kits. The basis of any targeting strategy is the ability to predict who is infected: the group that is targeted should have a higher infection rate than the group that is not. It's important to bear in mind that targeting not only reduces how many people are tested but also changes who gets tested.
One way in which targeted testing hurts us is that it ruins the datasets from which we obtain insights about the pandemic. This post is prompted by Ken's comment on my previous post on casual presumptions of randomness:
The number of daily new cases in some countries doubled as fast as every 2 days. This didn't mean a greater rate of infections, it was simply a result of catching up all the non-tested cases.
The connection between infections, tests, and cases is complicated, and the media shift back and forth between the rate of cases (positive test findings) and the positivity ratio (the proportion of tests that came back positive). I spent some time explicating these relationships, and I hope this post helps you build some intuition around these concepts.
We start by decomposing the positive test findings. In the flow chart below, starting with the dark gray box on the left side, we examine a fixed population of people. At any given time, the people are divided into three "compartments": the infectious, who currently have the virus; the susceptibles, who have not been infected; and the recovered (which, by convention, includes those who died from Covid-19).
Also by convention, anyone can be infected only once, and we thus exclude recovered people from further consideration. From the dark gray box emerge two paths that terminate at positive tests: the upper path consists of infectious people while the lower path, susceptibles. Some portion of the population is infectious; within that, some proportion have been tested for Covid-19; and within that, some proportion have tested positive. This last tally is one part of the confirmed cases. The other part of the confirmed cases comes from susceptibles, represented by the lower path.
The number of confirmed cases is the number of positive tests. This is a key metric on all Covid-19 dashboards. The other important statistic is the positivity ratio. This is the number of positive tests (represented by the dark red box on the right) divided by the number of tests conducted (the sum of the two yellow boxes).
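To make the two dashboard statistics concrete, here is a minimal sketch in Python. All of the counts are made-up placeholders, not figures from any real dashboard.

```python
# Minimal sketch of the two dashboard statistics; all counts are made up.
positives_from_infectious = 900     # upper path: infectious, tested, tested positive
positives_from_susceptibles = 450   # lower path: susceptible, tested, falsely positive
tests_of_infectious = 1_000         # all tests given to infectious people
tests_of_susceptibles = 9_000       # all tests given to susceptibles

confirmed_cases = positives_from_infectious + positives_from_susceptibles
total_tests = tests_of_infectious + tests_of_susceptibles
positivity_ratio = confirmed_cases / total_tests

print(confirmed_cases)     # 1350 confirmed cases
print(positivity_ratio)    # 0.135, i.e. a 13.5% positivity ratio
```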
The next diagram describes the drivers of the two dashboard statistics:
- the weights of infectious or susceptible compartments in the population (excluding recovered)
- the probability of getting tested among the infectious or susceptible group
- the weights of infectious or susceptible people within the tested subpopulation
- the positivity ratio for the infectious or susceptible people who got tested (call these the group-level positivity ratios to distinguish from the overall positivity ratio)
The overall positivity ratio is affected by three signals. First is the rate of infection in the population, reflected in the relative weights of the I and S compartments. Second, the testing regime imposes a selection effect on these weights, producing the second set of weights. The notorious triage or targeted testing protocol restricts testing to patients showing severe symptoms or at highest risk; these disproportionately arise from the infectious group, so the second set of weights is more skewed toward infectious people than the first set. Third, test accuracy affects the number of positive test results, given any tested subpopulation.
Test accuracy is typically measured by sensitivity and specificity. On the upper path, the positivity ratio of the infectious-tested group is just the sensitivity, which we assume to be 90 percent. On the lower path, the positivity ratio for the susceptibles-tested is the rate of false positives (as this group is not infected), which is 1 minus the specificity, which we assume to be 5 percent. (For the sake of this post, we assume that the performance of the diagnostic tests has not changed during the pandemic and thus the group-level positivity ratios are fixed. This simplifies the following examples.)
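Putting the weights and the accuracy assumptions together, here is a minimal sketch of the overall positivity ratio as a weighted average of the two group-level ratios. The 10-percent weight is an illustrative assumption, matching the counts in the earlier sketch.

```python
# Group-level positivity ratios, fixed by the assumed test accuracy.
sensitivity = 0.90            # upper path: infectious people who test positive
false_positive_rate = 0.05    # lower path: 1 - specificity

def overall_positivity(weight_infectious_tested):
    # weight_infectious_tested: share of all tests that went to infectious people,
    # i.e. the second set of weights after the testing regime's selection effect.
    weight_susceptible_tested = 1.0 - weight_infectious_tested
    return (weight_infectious_tested * sensitivity
            + weight_susceptible_tested * false_positive_rate)

# If 10% of tests go to infectious people (as in the earlier sketch):
print(overall_positivity(0.10))   # 0.135, i.e. a 13.5% overall positivity ratio
```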
What's special about 5% positivity ratio?
The media keep reminding us that the CDC wants the positivity ratio to fall below 5%. What is the significance of the 5-percent threshold? I can explain using the flow chart.
We hope for the epidemic to dissipate. When this happens, the infectious compartment dwindles to zero while the susceptibles stabilize. This means there will be no more flow through the upper path. All of the positive test results will come from the lower path (and are false positives).
The overall positivity ratio is a weighted average of the group-level positivity ratios, the weights being the proportions of those tested who are infectious or susceptible. In this extreme case, the weight is entirely skewed towards the susceptible-tested, for which the positivity ratio is 5 percent. Thus, the overall positivity ratio will be 5 percent.
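A quick check of the arithmetic in this extreme case, using the accuracy assumptions from above:

```python
# Extreme case: the epidemic has died out, so none of the tested are infectious.
weight_infectious_tested = 0.0
weight_susceptible_tested = 1.0 - weight_infectious_tested
sensitivity, false_positive_rate = 0.90, 0.05   # accuracy assumptions from above

overall_positivity = (weight_infectious_tested * sensitivity
                      + weight_susceptible_tested * false_positive_rate)
print(overall_positivity)   # 0.05, the floor set by the false-positive rate
```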
In other words, when the positivity ratio dips to 5 percent or below, new infections should be dying out.
What's the relationship between rate of cases and rate of infections?
I will now use this setup to investigate Ken's comment. He was reacting to the naive interpretation of case growth as infection growth, popularized by the media. Such a conclusion is invalidated by targeted testing.
Note that infection growth cannot be directly observed. By contrast, case growth is reported on every dashboard. In Ken's scenario, the number of cases (i.e. positive test results) are growing rapidly.
Start from the dark red box on the right of the flow chart, and work backwards. The jump in positive cases may come from either of the two paths.
First, consider the lower path representing susceptibles. During the course of an epidemic, more people get infected, which reduces the number of susceptibles in the population, which in turn decreases the number of susceptibles testing positive. So if the cases from this lower path are to go up, it must be that susceptibles have suddenly become more likely to get tested.
One such scenario is if politicians expand targeted testing to those with mild or no symptoms. This is Ken's hypothesis.
Such a policy shift also affects the upper path. Infectious people are more likely than susceptibles to experience mild symptoms, so even if the rate of infection is stable, the number of positive tests arising from the upper path grows. If infections are also accelerating, both factors contribute to case growth.
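Here is a hypothetical illustration of Ken's scenario: the rate of infection is held fixed while the testing policy expands, yet confirmed cases jump. The population size and testing probabilities are invented for illustration.

```python
# Ken's scenario, sketched with invented numbers: the rate of infection is fixed,
# only the testing policy changes (expanding to people with mild or no symptoms).
population = 1_000_000
infectious_share = 0.02                        # assumed stable rate of infection
sensitivity, false_positive_rate = 0.90, 0.05  # accuracy assumptions from above

def confirmed_cases(p_test_infectious, p_test_susceptible):
    infectious = population * infectious_share
    susceptible = population * (1 - infectious_share)
    upper_path = infectious * p_test_infectious * sensitivity            # true positives
    lower_path = susceptible * p_test_susceptible * false_positive_rate  # false positives
    return upper_path + lower_path

# Narrow targeting: only the severely symptomatic or highest-risk get tested.
before = confirmed_cases(p_test_infectious=0.20, p_test_susceptible=0.005)
# Expanded testing: the mildly symptomatic become eligible in both groups.
after = confirmed_cases(p_test_infectious=0.50, p_test_susceptible=0.03)

print(before, after)   # about 3,845 vs 10,470 -- cases jump with no change in infections
```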
Ken argues that the entire case growth can be attributed to the expanded testing capacity, while the media generally interpret the case growth as infection growth, ignoring possible shifts in testing policy.
The trouble is that both factors are at play, and we don't know the relative sizes of their contributions. This confounding is brought to you by targeted testing! (I can't resist: it's free and comes with a hidden cost.) This example demonstrates the aftertaste of targeted testing. It messes up the data so we can't interpret them properly.
It doesn't appear that Ken's scenario is happening in the U.S., as the federal government has continued its misguided targeted testing policy. Some politicians even grumble that we are conducting "too many" tests. (That complaint can only arise if we were conducting too few tests, and thus under-reporting cases, early on.)
How does targeted testing affect the positivity ratio?
In Ken's scenario, he assumes that the increase in testing accounts for the entire jump in cases. So, let's stipulate that the rate of infections is stable, and look at how the testing policy affects the positivity ratio.
The overall positivity ratio is the weighted average of group-level positivity ratios, so the key is to understand what happens to the second set of weights, which describes the mix of infectious and susceptibles in the tested subpopulation.
Under a randomized protocol, the second set of weights simply reflects the underlying mix of infectious and susceptibles, since both groups have an equal probability of getting tested. Under a targeted protocol, the second set of weights is biased toward the infectious, who are more likely to experience symptoms. Thus, early on, the positivity ratio is over-stated. If the targeting restriction is relaxed, the bias toward the infectious is reduced; the relative weight of susceptibles-tested is pushed up, and so the positivity ratio is driven down.
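Here is a small sketch of this effect. The infection rate and testing probabilities are invented for illustration; only the accuracy assumptions carry over from above.

```python
# Same infection rate, three testing protocols (all numbers invented for illustration).
sensitivity, false_positive_rate = 0.90, 0.05  # accuracy assumptions from above
infectious_share = 0.02                        # assumed stable rate of infection

def positivity_ratio(p_test_infectious, p_test_susceptible):
    tests_infectious = infectious_share * p_test_infectious
    tests_susceptible = (1 - infectious_share) * p_test_susceptible
    positives = (tests_infectious * sensitivity
                 + tests_susceptible * false_positive_rate)
    return positives / (tests_infectious + tests_susceptible)

print(positivity_ratio(0.10, 0.10))    # randomized: ~6.7%, close to the underlying mix
print(positivity_ratio(0.20, 0.005))   # narrow targeting: ~43%, positivity over-stated
print(positivity_ratio(0.50, 0.03))    # relaxed targeting: ~27%, positivity driven down
```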
Since the rate of infections is assumed stable, the shift in the positivity ratio tells us only about the testing policy. We've just discovered how politics enter the data.
***
Targeted testing introduces bias into the data, which pollutes the link between the rate of cases and the rate of infections. Remember: we use the rate of cases to infer the rate of infections because in the real world, no one knows the rate of infections.
Targeted testing is the aftertaste that you can't seem to get rid of. It polluted the data when first implemented. Then, every subsequent change in policy, and the gradual behavioral adaptation to these changes, add to the confusion. The effect on the number of tests is observable, but the effect on the types of people taking the test is not.
This tragedy on the national level is now being replicated on individual school campuses. Many schools are doing targeted testing only on students who self-report symptoms. Their dashboards are reporting confirmed cases, the trend of which is being interpreted as the trend of infection. What is the point of data-driven decision-making when the data are known to be bad?