In covering the Covid-19 crisis, the mainstream media continues to push the "record deaths" angle. I wrote a post last week about why new deaths is one of the least useful statistics in assessing an ongoing epidemic because it is a lagging indicator.
In that post, I then argued that one should look at the growth curve in infections in early epicenters that have implemented lockdowns to see if the containment measures have yielded success. That work is here.
Given that the media has failed to get the message, in this post, I return to the time lag between infections and deaths. This is well understood intuitively: cases predict deaths, and I don't think anyone disputes that. More deaths today just tell us there were more cases some days ago.
Let's turn that qualitative statement into a quantitative analysis. This is where statistics come in handy.
Here are the trajectories in Lombardia of new cases and new deaths from February 25 to March 22, 2020.
This is the most basic chart someone can make. However, it offers little insight into the lagging relationship between cases and deaths.
The next chart, which is derived from the above data, shows there is a 7-day lag between cases and deaths. On average, it takes about 7 days for a newly reported case to result in a newly reported death (as a reminder, over 90 percent of the cases are expected to recover).
By superimposing the two lines, you can see the pattern even more clearly:
Here is how one gets from the second chart to the third chart. You can see the time lag in action:
If we take the line for new cases (gray) and shift it forward by 7 days, we get a line that overlaps well with the line for deaths (red).
To reiterate my point, if Lombardia reported record deaths, all this means is that the region had a record number of new cases about a week earlier. This is NOT news. Had new deaths not hit a record, that would have been newsworthy.
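For readers who want to try the shift themselves, here is a minimal sketch in Python. The function name `shift_forward` and the three daily counts are made up for illustration; they are not the Lombardia figures. The code only shows what "shifting forward by 7 days" means when the series is keyed by date.

```python
from datetime import date, timedelta

def shift_forward(series, lag_days):
    """Return a copy of a {date: value} series with every date moved lag_days later."""
    return {day + timedelta(days=lag_days): value for day, value in series.items()}

# Hypothetical daily new-case counts (illustrative only, not real data).
new_cases = {
    date(2020, 2, 25): 10,
    date(2020, 2, 26): 20,
    date(2020, 2, 27): 40,
}

# After the shift, the Feb 25 count is plotted against Mar 3, and so on.
shifted_cases = shift_forward(new_cases, 7)
```

Plotting `shifted_cases` on the same date axis as new deaths lines up each day's deaths with the cases reported a week earlier.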
***
As I discussed yesterday, raw data are not very helpful. The first chart in this post shows raw counts and it does not reveal the time lag between infections and deaths.
Adjustments to the raw data help bring out the pattern. Here are two adjustments I used:
To see this relationship better, I converted both series of counts into indices, with February 25 as reference level (100). This just expresses each day's new cases or deaths relative to the first day.
While the indexing is sufficient to reveal the 7-day lag, I also smoothed the indices. This means each day's index is the (moving) average of the past four days' indices. Nature does not have a clock that says everyone who dies from Covid-19 must die exactly seven days after detection. The smoothing averages out the variability of the time lags across patients.
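The two adjustments can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact procedure behind the charts: the function names `to_index` and `trailing_average` are mine, the four-day window is assumed to end on the current day, and the first three days (which have fewer than four observations) are averaged over whatever is available. The input numbers are invented.

```python
def to_index(counts, base=100.0):
    """Express each count relative to the first day, which is set to `base`."""
    first = counts[0]
    if first == 0:
        raise ValueError("the reference day must have a non-zero count")
    return [base * c / first for c in counts]

def trailing_average(values, window=4):
    """Average each value with up to window - 1 preceding values.

    Assumption: the window ends on the current day; early days use a shorter window.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical daily counts (illustrative only, not real data).
daily_counts = [50, 100, 150, 200, 250]

index = to_index(daily_counts)        # [100.0, 200.0, 300.0, 400.0, 500.0]
smoothed = trailing_average(index)    # [100.0, 150.0, 200.0, 250.0, 350.0]
```

Indexing puts cases and deaths on a common scale despite their very different magnitudes, and the moving average trades a little responsiveness for a less jagged line.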
***
A couple of notes for the data nerds.
What I've been doing on this blog are small-scale exploratory data analyses. I leave it to full-time analysts to construct full-scale statistical models. Typically in a data science project, you begin with some data exploration, and if there are some insights, then you proceed to modeling.
Looking forward, I find that the observed pattern holds until deaths around March 24, which correspond to cases from March 17. Thereafter, the time lag still seems to hold but the growth index for deaths falls consistently below the growth index for infections. That might be good news, indicating a breakthrough in preventing deaths. But with only a few days of data, I'd be cautious in interpreting this pattern since so many factors are at play (e.g. new cases might be less severe).
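One rough way to watch for this divergence is to divide each day's death index by the case index from a fixed number of days earlier. The sketch below is my own illustration, not the method behind the charts: `lagged_ratio` is a hypothetical helper, the index values are invented, and the toy example uses a one-day lag to stay short. It also assumes the case index is never zero.

```python
def lagged_ratio(case_index, death_index, lag=7):
    """Ratio of the death index on day t to the case index on day t - lag.

    Both lists are assumed to start on the same calendar day.
    """
    return [
        death_index[t] / case_index[t - lag]
        for t in range(lag, len(death_index))
    ]

# Hypothetical indices (illustrative only, not real data).
case_index = [100, 200, 400, 800]
death_index = [100, 100, 200, 300]

ratios = lagged_ratio(case_index, death_index, lag=1)  # [1.0, 1.0, 0.75]
```

Ratios near 1 mean deaths are tracking the lagged cases; a run of values below 1 corresponds to the death index falling under the shifted case index.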
P.S. [4/7/2020] Scott S. asked for the data behind these charts. You can download it here.