Coronaphobia has arrived in the U.S. This past weekend, people started raiding stores for masks and tissue paper.
Ultimately, the one important question we're trying to answer is:
Is COVID-19
(a) just another flu virus
(b) a particularly lethal flu virus that will have a short-term impact (similar to SARS, which killed about 800 people but did not last beyond one flu season)
(c) a deadly virus that would fuel a much-feared flu pandemic that could kill millions?
This is the crucial question because its answer should condition our response to the virus:
(a) typical caution
(b) short-term vigilance
(c) large-scale mobilization and preparation.
Small Data Emergency
Because this coronavirus is new, and the disease is spreading while humans are playing catch-up, what we face is a classic "small data" emergency. We have very limited amounts of data (the known and suspected cases), incomplete data (under-reporting, possible hiding or manipulation), data of dubious quality (self-reporting, hastily assembled tests, unpreparedness), assumptions that later turn out to be false, the potential black-swan fallacy (failure to imagine what hasn't happened before), and non-stationary data (old data might not be wrong but become outdated as the virus evolves).
This emergency is typical of any disease outbreak. This one is larger in scale, and international in scope. To understand the entire process of outbreak investigations, and the pressure investigators work under, I refer you to Chapter 2 of Numbers Rule Your World (link).
Before we complain about incompetence, confusion or muddled messages, we should have some sympathy for the epidemiologists, medical researchers, data analysts, policy analysts, public health advocates, and others who are doing their very best at a very challenging moment.
The Overdue Pandemic Backdrop
"Politicians can ignore it, and may even get away with it. But when a pandemic eventually strikes, and history suggests that one is long overdue, the consequences could be devastating." This is a quote from an article published by The Telegraph. In 2008, more than a decade ago.
Much of the medical community has been predicting a pandemic for years and years. They thought SARS was the big one but it wasn't - it did not pass from human to human quickly enough. According to the same article, the WHO warned that the H5N1 avian flu could "seed a human pandemic that ... hospitalise 30 million, of whom some 20 per cent would die." That projection was way off the mark.
For a pandemic to occur, we need a combination of a high infection rate and a high enough fatality rate. The new coronavirus has a much higher infection rate than SARS and transmits between humans, but it is less lethal. However, at this early stage, our estimates of the infections and fatalities are far from accurate.
I've put together a list of unanswered questions. Whether COVID-19 is another false alarm or not depends on what answers we eventually receive.
1) Test accuracy affects the infection count
All tests currently in use are rush jobs that obtained "fast-track" approval. China accepted seven tests within the first two weeks of the outbreak. These tests use the "nucleic acid" method, which amplifies genetic material collected from throat swabs, and according to SCMP (the paper of record in Hong Kong), an official later said they were only "30 to 50 percent accurate".
The low accuracy rate should not surprise. Fast does not produce accuracy; in fact, fast probably means less reliable. Our knowledge of the new virus is very limited in these early days.
It's not even easy to tell when a test is inaccurate. Many anecdotes circulate of patients who initially tested negative and later tested positive. The media immediately screamed "false negative"! But it's not that simple. The patient could have caught the virus between tests, especially if s/he is quarantined or living in proximity to someone with the virus. Or the virus may be mutating.
Scientists are negotiating a vicious cycle of inaccurate tests and inaccurate measurements. A BBC report describes these challenges.
To learn more about diagnostic testing, see Chapter 4 of Numbers Rule Your World (link), in which I look at how the sports world tests for performance-enhancing drugs. The trade-off between false positives and false negatives is a key issue. I also make the distinction between a "chemical" negative and a "real life" negative. An athlete might test negative in a doping test but could still be a doper because the test does not detect the substance s/he is using, or is tricked by a masking agent, or is ambushed by an evasive action. So, it's possible to have a test that is accurate for what it's designed to do but still misses what we'd like to know in real life.
2) The elephant in the room: people are counted as infected only if they are tested
The big catch: someone can be counted as infected only if the person is tested for coronavirus. No test, no reported infection. This ABC News report states that South Korea has tested tens of thousands of suspected cases while the U.S. has conducted fewer than 500 tests, as of February 27, 2020. This means the number of infections in South Korea is probably over-stated (see point #1), and the number of infections in the U.S. is under-stated.
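To see how strongly the reported count depends on the number of tests, here is a minimal Python sketch. The 2% positive share and the test volumes are made-up numbers, not estimates for any country, and the function assumes a perfectly accurate test, which point #1 suggests is optimistic.

```python
def reported_infections(tests_performed: int, positive_share: float) -> float:
    """Expected confirmed cases when only tested people can be counted.

    Assumes (hypothetically) a perfectly accurate test, with
    positive_share being the fraction of tested people who are infected.
    """
    return tests_performed * positive_share


# Hypothetical: the same 2% positive share in two places,
# one running 500 tests and the other running 50,000.
positive_share = 0.02
few_tests = reported_infections(500, positive_share)
many_tests = reported_infections(50_000, positive_share)

print(f"500 tests    -> about {few_tests:.0f} reported infections")
print(f"50,000 tests -> about {many_tests:.0f} reported infections")
```

The two places have identical underlying conditions in this sketch; the hundredfold gap in reported infections comes entirely from the gap in testing.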
If we believe we face scenario (a), then not testing makes sense. It's simply not useful to know how many have the common flu viruses at any given time. But if we are in scenario (c), then stopping the spread of the coronavirus is paramount, which requires knowing who has it.
3) Noise coming from common flu viruses
Remember we are in flu season. According to the CDC, influenza and pneumonia cause 56,000 fatalities in the U.S. each year, making them the eighth leading cause of death. That is a mortality rate of 17 per 100,000 people, or about 0.02% of the population. Note that this rate is measured against the entire U.S. population of roughly 327 million, not against the number of people who caught the flu, so it cannot be turned into an infection count; the number of infections each season is very large (despite the widespread use of the flu vaccine) but has to be estimated separately. On average, that's about 300 deaths a day if we assume the flu season lasts 6 months (the bulk of infections actually occurs between December and March).
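The flu arithmetic can be checked in a few lines of Python. The death count and the 6-month season come from the text; the population of roughly 327 million and the 182-day season length are assumptions made for this sketch.

```python
annual_deaths = 56_000        # influenza and pneumonia deaths per year (from the text)
us_population = 327_000_000   # assumed U.S. population, for illustration
season_days = 182             # assumes a 6-month flu season

# Deaths relative to the whole population, expressed per 100,000 people.
deaths_per_100k = annual_deaths / us_population * 100_000

# Average daily deaths if all of them fell within the flu season.
deaths_per_day = annual_deaths / season_days

print(f"{deaths_per_100k:.1f} deaths per 100,000 people")
print(f"{deaths_per_day:.0f} deaths per day during the season")
```

Dividing deaths by the population reproduces the 17 per 100,000 figure, which shows that this rate is per head of population rather than per infected person.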
Against this backdrop, it is easy to attribute infections or deaths wrongly to COVID-19. The use of indirect tests increases the chance of mis-attribution.
For example, CAT scans find evidence of lung damage but using such tests can make infection counts inaccurate in two ways: lung damage can be caused by other viruses especially in flu season, and not all people with COVID-19 suffer lung damage.
Partly as a reaction to the low test accuracy and/or desire for speed, the Chinese authorities switched the definition of an infection, now counting people with symptoms without waiting for a test confirmation, according to this BBC report. This change in definition led to a spike in "infections" but the new count probably contains more false positives while having fewer false negatives.
Inflating the infection count can be justified by the desire to minimize false negatives, which reduces the chance of people unknowingly spreading the virus around. Overcounting infections, however, leads to an under-estimate of the fatality rate.
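A small worked example in Python shows why overcounting infections pushes the fatality rate down. The counts below are invented for illustration, and `fatality_rate` is a hypothetical helper, not an official definition.

```python
def fatality_rate(deaths: int, counted_infections: int) -> float:
    """Deaths divided by the number of people counted as infected."""
    return deaths / counted_infections


# Hypothetical outbreak: 20 deaths among 1,000 truly infected people.
deaths = 20
true_infections = 1_000
false_positives = 250  # hypothetical people counted as infected who are not

strict = fatality_rate(deaths, true_infections)
inflated = fatality_rate(deaths, true_infections + false_positives)

print(f"fatality rate with accurate count: {strict:.1%}")
print(f"fatality rate with inflated count: {inflated:.1%}")
```

The deaths are the same in both lines; only the denominator changes, so the inflated count yields the lower rate.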
Similarly, how are deaths confirmed? Are some of these deaths caused by other flu strains?
4) The connection between infections and deaths
The following news item from Twitter is typical of the media's set narrative on the coronavirus:
The two statistics they focus on are the number of infections (cases) and the number of deaths.
Those two numbers are linked. If the number of infections is growing fast, and deaths are not increasing proportionately, the fatality rate is coming down, so that should be a reassuring sign. The fact that both infections and deaths are growing doesn't mean anything. What matters is the growth of one number relative to the other.
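Here is a minimal Python sketch of reading the two numbers together. The cumulative counts are hypothetical, chosen so that both series grow while their ratio falls.

```python
# Hypothetical cumulative counts on four successive reporting dates.
cases = [100, 400, 1_600, 6_400]
deaths = [4, 12, 40, 128]

# The quantity of interest is deaths relative to cases, not either count alone.
ratios = [d / c for c, d in zip(cases, deaths)]

for c, d, r in zip(cases, deaths, ratios):
    print(f"cases={c:>5}  deaths={d:>4}  deaths/cases={r:.1%}")
```

Both columns rise on every date, yet the ratio in the last column declines, which is the pattern described above.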
5) The survival rate
One thing that the media refuse to cover is the survival rate. COVID-19 does not cause certain death, far from it. People have recovered. Because it may take a couple of weeks or longer to be cleared, the count of survivors lags. Our estimate of the survival rate improves over time as more people exit the treatment period.
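The lag can be illustrated with a short Python sketch. The snapshots are invented, and computing the survival rate over resolved cases only (recovered plus died) is one possible definition that I am assuming here, not a standard taken from the sources above.

```python
def survival_rate_resolved(recovered: int, deaths: int) -> float:
    """Share of resolved cases (recovered or died) that recovered."""
    resolved = recovered + deaths
    if resolved == 0:
        raise ValueError("no resolved cases yet")
    return recovered / resolved


# Hypothetical snapshots of the same 1,000 patients:
# (day, recovered, deaths, still in treatment)
snapshots = [
    (10, 30, 10, 960),
    (20, 300, 18, 682),
    (30, 900, 22, 78),
]

rates = []
for day, recovered, deaths, in_treatment in snapshots:
    rate = survival_rate_resolved(recovered, deaths)
    rates.append(rate)
    print(f"day {day}: {in_treatment} still in treatment, "
          f"survival rate among resolved cases = {rate:.1%}")
```

Early on, most patients are still in treatment, so the estimate rests on very few resolved cases; it settles only as the treatment period ends for more of them.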
6) Intermediate metrics
Another key metric to track is the severity of the symptoms. Among the group of patients who are still surviving, some are mildly sick while others are seriously ill. The media also fail to report on any intermediate metrics.
7) Rates not counts
In all of the above, I talk about rates not counts.
Counts are misleading. If we take the example of the Twitter news item and apply its logic to the common flu viruses, then we'd have to report that during each day of the flu season, there are a huge number of new infections and about 300 deaths! Sounds scary but not really.
[See comment below. Counts are relevant for logistical use. The number of patients requiring hospital stays certainly can overwhelm available resources.]
8) Community transmission
SARS had a high fatality rate: close to 10 percent of those infected died. However, the virus never quite mastered human-to-human transmission and because the infection rate was low, it eventually failed to become a pandemic.
Current estimates of the fatality rate of COVID-19 range from under 2 percent on the low side to maybe 4 percent on the high side. This fatality rate is much lower than that of SARS but the infection rate is clearly higher, and it's clear that the virus can spread among humans.
***
It all comes down to infection rate and fatality rate. The trouble is we have "small data" so our estimates of those rates are imprecise. Knowing the sources of inaccuracy helps us understand in which direction the rates are moving. Also, secondary metrics like severity and survival rates provide some complementary information while we wait for more data to arrive.
Don't panic, but don't take unnecessary risks.
P.S. [3/9/2020] Some additional comments given new developments since this post was written
South Korea and Italy are both spoken of as hot zones for COVID-19. This is true in terms of known infections but remember point #2: the number of reported infections is highly correlated with the number of tests performed. As of yesterday, New York City said it had done fewer than 100 tests. In South Korea and Italy, they did tens of thousands of tests.
It also appears that the hospitalization and death rates in different countries are different. So there are even more variables, due to factors such as availability of medical facilities, quality of care, age distribution of patients, and so on.