The last week saw more developments that show yet again why we need to statistically adjust data and throw away the raw data.
For some time now, we have known that Covid-19 case counts are artificially suppressed by restricted testing. The number of people infected with the novel coronavirus is vastly higher than the reported count. This leads people to conclude that the case-fatality rate (CFR) has been overestimated, because we should be dividing by a larger number of cases. So, the argument goes, the disease isn't as scary as once thought.
I pointed out that the news isn't all good: as the CFR falls, the rate of infection goes up. Both rates depend on the number of reported cases, which is the denominator of the CFR and the numerator of the infection rate. We have thus underestimated how widely the coronavirus has spread.
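To make the arithmetic concrete, here is a minimal Python sketch. The population, death and case figures are hypothetical, chosen only for illustration, as is the assumption that testing caught one in five infections. The point is that the case count sits in the denominator of the CFR and the numerator of the infection rate, so raising it pushes the two rates in opposite directions.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """CFR: deaths divided by cases."""
    return deaths / cases


def infection_rate(cases: int, population: int) -> float:
    """Share of the population infected: cases divided by population."""
    return cases / population


# Hypothetical figures for illustration only (not real data).
population = 10_000_000
deaths = 1_000
reported_cases = 20_000

# Assumption: testing captured only one in five infections.
true_cases = reported_cases * 5

for label, cases in [("reported", reported_cases), ("adjusted", true_cases)]:
    print(f"{label}: CFR = {case_fatality_rate(deaths, cases):.1%}, "
          f"infection rate = {infection_rate(cases, population):.2%}")
```

With these made-up inputs, the CFR drops from 5.0% to 1.0% while the infection rate rises from 0.20% to 1.00%: the same correction moves the two rates in opposite directions.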
Last week brought even more bad news: most (if not all) countries have under-reported deaths due to Covid-19. (The Economist article linked in my previous post is the best source on this.) Officials have counted only deaths in hospitals, even though many people died at home or in care homes. They have also required a positive test result to confirm a Covid-19 death, at a time when few tests were being conducted.
The adjustments to the fatality counts are huge, on the order of 50% to 100%.
Ouch! What does this do to the CFR? If the case count stays the same, doubling the deaths doubles the rate.
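A sketch of the death adjustment, again in Python with hypothetical counts. It assumes the case count is held fixed, so inflating reported deaths by 50% to 100% (the range quoted above) scales the CFR by a factor of 1.5 to 2.0. The function name `adjusted_cfr` is illustrative only.

```python
def adjusted_cfr(reported_deaths: int, cases: int, undercount: float) -> float:
    """CFR after inflating reported deaths by an assumed under-count.

    undercount=0.5 means true deaths are 50% higher than reported.
    The case count is held fixed (an assumption).
    """
    return reported_deaths * (1 + undercount) / cases


# Hypothetical figures for illustration only.
reported_deaths, cases = 1_000, 100_000
for undercount in (0.0, 0.5, 1.0):
    print(f"under-count {undercount:.0%}: CFR = "
          f"{adjusted_cfr(reported_deaths, cases, undercount):.1%}")
```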
***
What might be the next shoe to drop?
Test accuracy.
On this front, there are some early reports of concern about "false negatives" (see, for example, this Bloomberg article).
I don't think we have definitive accuracy statistics yet. But let's assume, for the sake of argument, that the COVID-19 diagnostic test has a high false negative rate. This means that a good number of infected people are misreported as not infected.
Correcting this misclassification raises both the number of infections and the number of deaths, since both would have been under-reported.
A double whammy! The rate of infection will have to be adjusted upward again. The CFR will also likely be adjusted upward.
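One simple way to correct a count for false negatives, sketched in Python: if a test detects only a fraction `sensitivity` of true positives and produces no false positives (both simplifying assumptions), the true count is roughly the confirmed count divided by the sensitivity. The sensitivity values below are made up, since the real figures are not yet known. Note that the direction of the CFR adjustment depends on which count is corrected more: if cases and deaths were divided by the same sensitivity, the CFR would not move, and it rises only when deaths are missed more often than cases.

```python
def correct_for_false_negatives(confirmed: float, sensitivity: float) -> float:
    """Estimate the true count from a confirmed count.

    Assumes the test detects a fraction `sensitivity` of true positives
    and has no false positives, so confirmed ~= true * sensitivity.
    """
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    return confirmed / sensitivity


# Hypothetical inputs for illustration only.
confirmed_cases, confirmed_deaths = 100_000, 1_000
case_sensitivity = 0.7    # assumed share of infections the test catches
death_sensitivity = 0.6   # assumed; here deaths are missed more often than cases

cases = correct_for_false_negatives(confirmed_cases, case_sensitivity)
deaths = correct_for_false_negatives(confirmed_deaths, death_sensitivity)

print(f"cases: {confirmed_cases:,} -> {cases:,.0f}")
print(f"deaths: {confirmed_deaths:,} -> {deaths:,.0f}")
print(f"CFR: {confirmed_deaths / confirmed_cases:.2%} -> {deaths / cases:.2%}")
```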
***
Basically, any analysis that doesn't try to compensate for these inaccuracies is seriously flawed. Compensating for them requires making assumptions, because we can't change the past: since we didn't test those people, we will never know whether they were positive.
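Because the corrections rest on assumptions, a common statistical practice is to report results across a range of assumed values rather than a single number. Below is a minimal Python sketch of such a sensitivity analysis. The reported counts and the grids of test sensitivities and death under-counts are all hypothetical.

```python
from itertools import product

# Hypothetical reported figures for illustration only.
reported_cases, reported_deaths = 100_000, 1_000

# Assumed ranges; the true values are unknown.
sensitivities = (0.6, 0.7, 0.8, 0.9)   # share of infections the test detects
death_undercounts = (0.5, 0.75, 1.0)   # extra deaths as a share of reported deaths

cfrs = []
for sens, under in product(sensitivities, death_undercounts):
    cases = reported_cases / sens            # correct cases for false negatives
    deaths = reported_deaths * (1 + under)   # correct deaths for under-reporting
    cfrs.append(deaths / cases)

print(f"adjusted CFR ranges from {min(cfrs):.2%} to {max(cfrs):.2%} "
      f"(reported: {reported_deaths / reported_cases:.2%})")
```

Reporting the whole range makes the assumptions visible, instead of hiding them inside one "adjusted" figure.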
With stay-at-home rules in place, a lot of us are cooking, so let me use a cooking analogy to describe the statistician's choice. Say you have picked out a complicated French recipe for orange duck. As you start preparing the ingredients, you notice that the duck breast has gone bad.
The statistician's choice would go like this:
(a) ignore the stench of the duck gone bad, follow the recipe, and pray that no one notices the stink
(b) run to the store next door, which you know doesn't sell duck, buy its nice fresh chicken instead, substitute it for the duck, and hope the change doesn't ruin the dish
I always choose (b). But there are those who choose to hold their noses and do (a). Just like in the kitchen.