It's ever more important to apply your numbersense when you read the news about Covid-19. This past week, many in the media continued to obsess over horse-race metrics, i.e. current counts of cases and deaths. Those happen to be the key data being released and updated by governments around the world, but raw data rarely tell the full story, and in this case, they mislead.
I already wrote about the problem of missing data, which has led to foolish decisions to move to "safe" places. (Is it safe, or do they not test?) You can't compare the growth curves of the U.S. with those of China, South Korea, or even Italy with a straight face. The elephant in the room is the extent of testing. If you test less, you under-report infections. Insufficient testing under-estimates the infection rate but exaggerates the mortality rate. Under-reporting the infection rate is a risky business - it makes people complacent.
This past week, the news headlines have been blasting: Italy breaks records in number of deaths. This is not fake news but it is misleading. The message sent to readers is that the situation is spiralling out of control in Italy. But more new deaths do not mean things are getting worse! The count of new deaths is irrelevant to understanding whether the containment measures in Italy are working or not.
The number of deaths is a lagging indicator. We can predict the growth in deaths based on the reported growth in cases in the last week. If there was a surge in cases, there will follow a surge in deaths... after a lag. When I hear new deaths went up, I think new cases went up last week. It provides little new information.
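To make the lag concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a fixed fraction of each day's new cases shows up as deaths a fixed number of days later. The function name, the 7-day lag, the 10% ratio and the case counts are all hypothetical placeholders, not estimates from the Italian data.

```python
def project_new_deaths(new_cases, lag_days, fatality_ratio):
    """Project daily new deaths from the new cases reported lag_days earlier.

    Assumption (illustrative only): one fixed lag and a constant fatality_ratio.
    """
    projected = []
    for day in range(len(new_cases) + lag_days):
        source_day = day - lag_days
        if source_day >= 0:
            projected.append(new_cases[source_day] * fatality_ratio)
        else:
            projected.append(0.0)  # no cases are old enough to show up as deaths yet
    return projected


# Made-up daily new cases: a surge that peaks on day 3 and then recedes.
cases = [100, 200, 400, 800, 600, 400]
deaths = project_new_deaths(cases, lag_days=7, fatality_ratio=0.1)

peak_case_day = cases.index(max(cases))
peak_death_day = deaths.index(max(deaths))
print(peak_case_day, peak_death_day)
```

Under these assumptions the record day for deaths arrives a week after the record day for cases, at a time when new cases have already been falling for days. That is why a record in deaths says little about what is happening now.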
***
Is there a better answer to whether containment measures are succeeding?
First, we can zoom in on Lombardy (Lombardia), the region where the initial outbreak hit. Look at the growth of cases over time. The raw data can be found on this site (the green line is Lombardy).
Growth Rate is Not Exponential
The first surprise I got from looking at this data is that the green line is not exponential. An exponential rate can be expressed as something like "cases increase 10-fold every X days". The actual data aren't quite that bad! In math terms, the rate of growth is polynomial. It's still greater than linear but not as bad as exponential. Linear growth can be described as "cases increase by 10 every X days". [Note: Ten and ten-fold are hugely different rates, leaving a huge space in between.]
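One rough way to tell these shapes apart, sketched below in Python, is to look at the day-over-day ratio of the counts. This diagnostic is an illustration added here, not the analysis behind the charts. The three series are made up, and the exponential one doubles daily only to keep the numbers small.

```python
def day_over_day_ratios(counts):
    """Return counts[t+1] / counts[t] for each consecutive pair of days."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


days = range(1, 11)
linear = [10 * t for t in days]        # adds 10 every day
cubic = [t ** 3 for t in days]         # polynomial: faster than linear
exponential = [2 ** t for t in days]   # doubles every day (illustrative rate)

linear_ratios = day_over_day_ratios(linear)
cubic_ratios = day_over_day_ratios(cubic)
exponential_ratios = day_over_day_ratios(exponential)

print([round(r, 2) for r in exponential_ratios])
print([round(r, 2) for r in cubic_ratios])
print([round(r, 2) for r in linear_ratios])
```

For exponential growth the ratio stays constant forever. For linear and polynomial growth it keeps shrinking toward 1, even though the counts themselves keep rising.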
The chart below shows the data from February 25 to March 19, 2020 fitted to three rates of growth (quadratic, cubic and exponential). The exponential curve started to bend away from the actual case counts, and the magnitude of the over-estimate will keep growing over time. It's just the wrong shape to use here. The cubic and quadratic curves (both polynomials) are both plausible, the former being a tad better.
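For readers who want to try this kind of comparison themselves, here is a minimal sketch in Python with NumPy. It is not the JMP workflow used for the charts. The series is synthetic (a cubic trend, not the Lombardy counts), and the exponential is fitted as a straight line through the logged counts, which is a simplifying assumption and a different criterion from a direct nonlinear fit.

```python
import numpy as np


def r_squared(actual, fitted):
    """Coefficient of determination on the original (not log) scale."""
    residual = np.sum((actual - fitted) ** 2)
    total = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - residual / total


def fit_shapes(t, y):
    """Fit quadratic, cubic and exponential curves; return fitted values per shape."""
    fits = {}
    for name, degree in (("quadratic", 2), ("cubic", 3)):
        coefficients = np.polyfit(t, y, degree)
        fits[name] = np.polyval(coefficients, t)
    # Exponential y = a * exp(b * t), fitted as a straight line through log(y).
    slope, intercept = np.polyfit(t, np.log(y), 1)
    fits["exponential"] = np.exp(intercept + slope * t)
    return fits


# Synthetic stand-in for 24 daily cumulative counts (NOT the Lombardy data).
t = np.arange(24, dtype=float)
y = 5.0 * t ** 3 + 200.0

fits = fit_shapes(t, y)
scores = {name: r_squared(y, fitted) for name, fitted in fits.items()}
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.4f}")
```

Because the synthetic series is cubic by construction, the cubic fit is essentially perfect here. With real counts the scores would be closer together, which is why the shape of the misfit matters as much as the score.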
It Fits the Past but does it Predict the Future?
Now, take a step back to March 8, when the government decided to lock down the entire Lombardy region; some counties had already been taking action before then. Could it be that at that moment in time, the data showed an exponential rate of growth? Did the shape change after the lockdown?
In the next chart, I used the data up to and including March 8 to fit the three rates of growth (quadratic, cubic and exponential). Then, for the period March 9-19, the curves show what the case counts would be given the assumption of that rate of growth (to be exact, it's the shape of the growth curve). In other words, the lines after the lockdown are forecasts that can be compared to the black dots which represent the actual case counts.
You can see that the exponential curve quickly leaves the data (black dots) behind. The actual growth in the cases is far below what an exponential rate would have predicted. Here, you see that the cubic is actually the best fit.
[For the nerds here, the two quadratic curves are different. In the first chart, the quadratic curve is fitted to all the data. In the second chart, the quadratic curve is fitted to just the pre-lockdown data, and then extrapolated. This is an example to show why predictive power is important to measure model fit. Quadratic fits in both cases have great R-squareds but that doesn't mean it is the right model.]
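The gap between a great R-squared and a poor forecast can be reproduced with a small Python sketch. The series below is synthetic and follows a cubic by construction (it is not the Lombardy data). The split at day 13 only mimics the pre-lockdown and post-lockdown windows, and the error measure is one arbitrary choice among many.

```python
import numpy as np


def r_squared(actual, fitted):
    """Coefficient of determination of fitted values against actual values."""
    residual = np.sum((actual - fitted) ** 2)
    total = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - residual / total


# Synthetic cumulative counts that truly follow a cubic (NOT the Lombardy data).
t = np.arange(24, dtype=float)
y = t ** 3

# Days 0-12 play the role of the pre-lockdown window; days 13-23 are held out.
train_t, train_y = t[:13], y[:13]
test_t, test_y = t[13:], y[13:]

# Fit a quadratic to the training window only, then extrapolate forward.
coefficients = np.polyfit(train_t, train_y, 2)
in_sample_r2 = r_squared(train_y, np.polyval(coefficients, train_t))
forecast = np.polyval(coefficients, test_t)

# Mean absolute percentage error of the forecast on the held-out days.
holdout_error = np.mean(np.abs(forecast - test_y) / test_y)

print(f"in-sample R^2: {in_sample_r2:.3f}")
print(f"holdout error: {holdout_error:.1%}")
```

On this synthetic series the quadratic scores an in-sample R-squared above 0.99, yet it under-forecasts every held-out day, and the miss widens the further out it extrapolates.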
So we have both good and bad news for the italiani. The analysis says that the growth is not as frightening as the media breathlessly reported. But it also says that the curve hasn't flattened (which is why the Italians just announced they would send the Army to enforce the lockdown).
***
A Glimpse of Hope?
However, an analysis like the above is self-fulfilling. The model does not allow for the shape of the growth curve to change over time. I just assumed one shape which means the curve could not flatten under this analysis!
To escape from this dead-end, I looked at just the period after lockdown. The following chart shows the case counts from March 9 to 19.
This is when you can say "Andrà tutto bene!" In the last 10 days, the best-fitting model is linear. (Given the margin of error, the quadratic curve is still possible. But certainly not the exponential!) In this sense, you can say that the measures taken by the government in Lombardia have flattened the curve. The shape of the growth has been dampened from more like cubic to more like linear. That is an achievement, although the battle hasn't been won yet.
[Similar to the above analysis, we would need to observe the trend for another week to know if this linear model has predictive power.]
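A quick sanity check for "more like linear" is to difference the cumulative counts twice, sketched below in Python. This is a rough diagnostic added for illustration, not the model fit behind the chart, and both series are made up. If cumulative cases grow linearly, daily new cases are flat and the second differences hover around zero.

```python
def differences(values):
    """Return the change between each consecutive pair of values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Made-up cumulative counts over an 11-day window (NOT the Lombardy data).
linear_like = [5000 + 1500 * day for day in range(11)]
cubic_like = [5000 + 10 * day ** 3 for day in range(11)]

second_differences = {}
for label, series in (("linear-like", linear_like), ("cubic-like", cubic_like)):
    new_cases = differences(series)                   # daily new cases
    second_differences[label] = differences(new_cases)  # change in daily new cases
    print(label, second_differences[label])
```

Real counts are noisy, so the second differences will never be exactly zero. The question is whether they scatter around zero or keep climbing.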
***
Meanwhile, we also have some anecdotal reports from specific towns, showing that containment and testing strategies have worked. (Lodi, Bergamo, Vo)
Of course, other indicators should be analyzed to paint the full picture. Also, the other regions are lagging so the hope is that by taking steps earlier, their curves would look milder than Lombardia's.
The point is that new deaths breaking a record is not a useful indicator of anything (other than the fact that infections broke records the week before).
P.S. Credit to JMP software for making it so easy to analyze and visualize the data. All the data processing, model fitting, and visualization in one place.
"Exponential" is an imprecise term, in that the exponent isn't specified. Is it a doubling every day? A tripling? 50% growth?
Overall growth of total cases in Italy has been at about 14% for the past 6 days. IOW, the exponent is 1.14. That's not as high as it was, but it's still exponential (and too high). Hopefully the lock-down breaks this soon.
You're looking at Lombardia, which is fair, but a look at other regions will reveal other curves that fit better than the linear one does.
(You know all this, of course.)
Posted by: John | 03/21/2020 at 06:38 PM
John: thanks for providing this color because I have to make choices about how deep to go in the blog post without losing people in the weeds. The other important point that is implied is that even if one insists on using an exponential model, one can't fit an aggregate rate to all the data you have up to now. Such a model will not pick up any changes in rate due to success of containment.
Posted by: Kaiser | 03/22/2020 at 11:13 AM