All around the world, people are getting very afraid of the coronavirus. What's happening now is a live version of a disease outbreak investigation, which is discussed in Chapter 2 of Numbers Rule Your World (link).
To understand what's going on, you should see this as the exact opposite of "big data". We have a new virus. There is almost no direct data on it. The data are dribbling in as cases are reported. We have a couple hundred deaths... in a country with almost 1.5 billion people. Public health officials must make key decisions now, before more data become available. But we have already passed the most dangerous phase, because the cause of the disease has already been identified.
I saw a graphic last night, shown during an interview with an expert who investigated the SARS epidemic. Supposedly the mortality rate for the coronavirus is about 3 percent. This number has a large margin of error at this stage because of the shortage of data. This shortage is due to the newness of the phenomenon.
The closest relative to the coronavirus is the SARS virus, which had a mortality rate of about 10 percent, roughly three times the current estimate for the coronavirus.
We also know that the coronavirus spreads faster than SARS. Most of the news reports I read mention the number of infected but rarely the number of deaths. This is a major oversight.
What we should fear is a deadly disease. A disease that infects lots of people but acts like the common cold is not a public-health crisis. So as data roll in, we should keep an eye on the mortality rate.
The mortality rate is the number of deaths divided by the number of infected. Both counts are growing each day. If the growth of infected is faster than the growth of deaths, the mortality rate will go down (not up!). The mortality rate will go up only if the growth rate of deaths exceeds the growth rate of infected.
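The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the starting counts (300 deaths among 10,000 infected) and the daily growth factors are made-up numbers, not reported data, and the function names are hypothetical.

```python
def mortality_rate(deaths, infected):
    """Crude mortality rate: cumulative deaths divided by cumulative infected."""
    if infected <= 0:
        raise ValueError("infected must be positive")
    return deaths / infected


def project(deaths, infected, death_growth, infected_growth, days):
    """Grow both counts by fixed daily factors and record the rate each day."""
    rates = []
    for _ in range(days + 1):
        rates.append(mortality_rate(deaths, infected))
        deaths *= death_growth
        infected *= infected_growth
    return rates


# Hypothetical starting point: 300 deaths among 10,000 infected (3 percent).
# Case 1: infected grow 20% a day, deaths 10% a day -> the rate falls.
falling = project(300, 10_000, death_growth=1.10, infected_growth=1.20, days=5)
# Case 2: deaths grow 20% a day, infected 10% a day -> the rate rises.
rising = project(300, 10_000, death_growth=1.20, infected_growth=1.10, days=5)

print([round(r, 4) for r in falling])
print([round(r, 4) for r in rising])
```

Both counts go up every day in both cases; only the relative growth rates decide which way the ratio moves.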
In fact, by the time I finished this post, the latest report put the coronavirus death rate at 2 percent, so it has already been adjusted downwards because the virus is spreading fast. (The death rate of regular flu is about 1 percent, according to the graphic that was shown last night. This level acts as a control for interpreting the mortality rate.)
That report contains a typically alarmist quote by a medical professional: "There are likely to be many times more cases in Wuhan than officially confirmed". Okay, but if that is true, the real mortality rate is many times smaller than what is official! In terms of missing data, it's the unreported deaths that are more significant than unreported infections.
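To make the effect of under-reporting concrete, here is a rough sketch. The counts and the multipliers are assumptions chosen for illustration (nobody knows the true undercount), and `adjusted_rate` is a hypothetical helper, not a standard epidemiological formula.

```python
def adjusted_rate(deaths, confirmed, case_multiplier=1.0, death_multiplier=1.0):
    """Mortality rate after scaling reported counts for assumed under-reporting.

    case_multiplier:  assumed ratio of true infections to confirmed infections
    death_multiplier: assumed ratio of true deaths to reported deaths
    """
    true_deaths = deaths * death_multiplier
    true_cases = confirmed * case_multiplier
    return true_deaths / true_cases


# Hypothetical official figures: 200 deaths, 10,000 confirmed cases (2 percent).
official = adjusted_rate(200, 10_000)
# If true infections are 5 times the confirmed count, the rate is 5 times smaller.
more_cases = adjusted_rate(200, 10_000, case_multiplier=5)
# If deaths are the undercounted quantity instead, the rate goes up.
more_deaths = adjusted_rate(200, 10_000, death_multiplier=2)

print(f"{official:.1%}  {more_cases:.1%}  {more_deaths:.1%}")
```

Unreported infections only push the rate down, while unreported deaths push it up, which is why the missing deaths matter more when judging how alarming the numbers are.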
Also note the network effect. It's very unlikely that the growth rate of deaths can be higher than the growth rate of infected because infection grows exponentially due to transmissions. Deaths don't transmit.
It would also be useful to know what the cure rate is, because some of the infected are still in the treatment stage. During the report last night, the journalist wrongly asserted that 97 out of 100 infected will survive. This interpretation is correct only if we exclude those infected who are currently in treatment. (The technical term is right-censored data. We don't yet know how many in this subgroup will survive.)
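One simple way to see what right-censoring does to the numbers is to compute bounds on the eventual mortality rate. The sketch below assumes every infected person falls into one of three groups (died, recovered, still in treatment); the counts are invented for illustration, and `mortality_bounds` is a hypothetical helper.

```python
def mortality_bounds(deaths, recovered, in_treatment):
    """Bounds on the eventual mortality rate when some outcomes are unknown.

    lower:         every patient still in treatment survives
    upper:         every patient still in treatment dies
    resolved_only: deaths among cases with a known outcome (None if no such cases)
    """
    total = deaths + recovered + in_treatment
    if total <= 0:
        raise ValueError("need at least one infected person")
    lower = deaths / total
    upper = (deaths + in_treatment) / total
    resolved = deaths + recovered
    resolved_only = deaths / resolved if resolved else None
    return lower, upper, resolved_only


# Hypothetical counts: 300 deaths, 200 recovered, 9,500 still in treatment.
lower, upper, resolved_only = mortality_bounds(300, 200, 9_500)
print(f"lower={lower:.1%}  upper={upper:.1%}  resolved only={resolved_only:.1%}")
```

With these made-up counts the headline 3 percent is the lower bound. The gap between the bounds is wide early in an outbreak and narrows as patients in treatment either recover or die.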
While one should be cautious because this coronavirus is new, in times of paranoia, one should seek out the data. Ignore the unfounded rumors, innuendo and folklore. Stick to the facts. So far, the data aren't that alarming.
***
In Chapter 2 of Numbers Rule Your World (link), I describe two modes of statistical modeling. Disease outbreak investigation illustrates one mode, in which the chief concern is establishing cause-effect relationships. This is always hard if you have observational data - without controls. But it's even more challenging in these crisis situations, with small drips of data. If you understand what statistical tools are available, and what the limitations are, you have a better grasp of what's going on right now.
Image credit: By https://www.scientificanimations.com - https://www.scientificanimations.com/wiki-images/, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=86358105
P.S. [2/3/20] See also my post on Junk Charts about a graphic used by the New York Times in a report on the virus.