Let me show you how bad the state of Covid-19 data is in the U.S.
Recall my post from two weeks ago about Covid-19 deaths in Texas starting to catch up with the earlier rise in reported cases. When cases started to spike in Texas, talk turned to the declining death rate, which was misleading since we should have been looking at cohort-adjusted death rates from the start. The media conspired to spread this misinformation, ignoring epidemiologists: when given a choice between reporting a misleading statistic today and reporting a more valid number in two weeks, the media cannot help but quench the immediate thirst.
In putting together the prior analysis, I had already noticed the sad state of the data. On that day (July 27), Texas had just changed how it reports deaths, relying instead on the cause of death listed on death certificates. This created a single-day death toll of 675, well above the daily average; the jump presumably corrected for under-counting on previous days. With this correction, the state claimed that it would henceforth provide "more accurate" death counts.
What state officials did not explain is that the change in reporting also builds in a delay of at least one week before any death is reported. Let's think through this in the context of rising case rates and "declining" death rates. I make the deliberately unrealistic assumption that patients who die from Covid-19 die immediately, on the day they are reported as new cases. Under the new reporting procedure, these cases are reported today, but the deaths are reported only once death certificates have been issued and can be inspected, which takes 7-10 days or more in Texas. Thus, this group of patients (who have already died) inflates today's case count while depressing today's death rate, since their deaths are not yet reported. This time lag is purely an artifact of reporting procedures, yet it has the same effect as the time lag between infection and death.
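The mechanism can be sketched as a toy simulation in Python. This is not the post's analysis: the 5% daily growth, the 2% death fraction, and the helper name `reported_series` are all hypothetical, and the 7-day lag is the low end of the range cited above. Under the same deliberately unrealistic assumption (death on the day of the case report), dividing today's reported deaths by today's reported cases understates the true fraction whenever cases are rising, while matching deaths to the cases reported 7 days earlier (a cohort-style adjustment) recovers it.

```python
# Toy simulation: all numbers are hypothetical, chosen only to show the mechanism.
REPORTING_LAG_DAYS = 7   # assumed certificate delay (low end of the 7-10 days cited for Texas)
DEATH_FRACTION = 0.02    # assumed share of reported cases who die; not a real estimate


def reported_series(new_cases, lag_days, death_fraction):
    """Return (reported_cases, reported_deaths) per day, assuming each fatal case
    dies on the day it is reported but its death only enters the data lag_days later."""
    reported_deaths = [0.0] * len(new_cases)
    for day, cases in enumerate(new_cases):
        report_day = day + lag_days
        if report_day < len(new_cases):  # deaths reported after the window are not yet visible
            reported_deaths[report_day] += cases * death_fraction
    return list(new_cases), reported_deaths


# Cases growing 5% per day for four weeks (hypothetical).
cases = [1000 * 1.05 ** d for d in range(28)]
reported_cases, reported_deaths = reported_series(cases, REPORTING_LAG_DAYS, DEATH_FRACTION)

# Naive rate: today's reported deaths over today's reported cases.
naive_rate = reported_deaths[-1] / reported_cases[-1]
# Cohort-style rate: today's reported deaths over the cases reported lag_days earlier.
cohort_rate = reported_deaths[-1] / reported_cases[-1 - REPORTING_LAG_DAYS]

print(f"naive same-day rate: {naive_rate:.4f}")   # below 0.02 while cases rise
print(f"cohort-adjusted rate: {cohort_rate:.4f}")
```

With cases growing 5% a day, the naive same-day rate comes out at roughly 0.02 / 1.05^7, about 0.014, even though the simulated death fraction never changes.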
The data source, the Covid Tracking Project, rated Texas "A" for "data quality," so presumably the state of the data elsewhere is worse.
***
I had intended to update this analysis last week, but my effort was met with more data mischief. On August 2 (a Sunday), Texas did not report any data, blaming a "systems upgrade" and promising that "data for Sunday will be posted with Monday’s data update." So I waited another week before publishing this update.
As a reminder, two weeks ago I presented the following chart:
The thin gray line is the growth curve for reported cases in Texas, smoothed using a seven-day window, from mid-May through late July. The thick gray line is this growth curve shifted forward in time by three weeks. The red line is the growth curve of deaths in Texas, smoothed using a seven-day window. Except for the single-day anomaly (July 27), it appears that deaths roughly lagged cases by three weeks.
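The smoothing and time-shifting behind the chart can be sketched as follows. The post does not say whether its seven-day window is trailing or centered; this sketch assumes a trailing window, and the function names and toy data are hypothetical.

```python
def trailing_mean(values, window=7):
    """Trailing moving average. Day i averages days i-window+1 .. i;
    the first window-1 days are None because a full window is not yet available."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i + 1 - window : i + 1]) / window)
    return smoothed


def shift_forward(values, days):
    """Move a daily series forward in time: the value for day t is placed on day t + days.
    The result has the same length; the first `days` slots are None."""
    if days >= len(values):
        return [None] * len(values)
    return [None] * days + values[: len(values) - days]


# Hypothetical toy data: 35 days of steadily rising daily case counts.
daily_cases = list(range(1, 36))
smoothed_cases = trailing_mean(daily_cases, window=7)
# Shift by three weeks to get the curve deaths would follow if they lagged cases by 21 days.
expected_death_curve = shift_forward(smoothed_cases, 21)
print(expected_death_curve[27:30])  # [4.0, 5.0, 6.0]
```

Keeping the shifted series the same length as the original makes it easy to overlay on the smoothed death curve day by day.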
The next chart extends the timeline to August 1, the day before the data blackout:
This chart is disturbing. The growth of deaths in Texas has started to markedly outpace the (time-shifted) growth of cases. Using the same three-week lag, we should have expected deaths to rise about sevenfold between mid-May and August 1. Instead, deaths from Covid-19 in Texas jumped more than ninefold over this period.
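The comparison reduces to a ratio of growth multiples. A minimal sketch, assuming both curves have already been smoothed and the case curve shifted by three weeks; `growth_multiple` is a hypothetical helper, and the 7 and 9 are the rounded figures quoted in the text, not recomputed from data.

```python
def growth_multiple(series, start_index, end_index):
    """How many times a (smoothed) daily series grew between two positions."""
    start, end = series[start_index], series[end_index]
    if start is None or end is None or start == 0:
        raise ValueError("need non-missing, non-zero values at both ends")
    return end / start


# Toy check of the helper on made-up values.
print(growth_multiple([None, 2.0, 14.0], 1, 2))  # 7.0

# Rounded multiples quoted in the post for mid-May to August 1.
expected_multiple = 7.0  # three-week-shifted case curve
observed_multiple = 9.0  # death curve ("over 9 times", so this is a lower bound)
print(f"observed / expected growth: {observed_multiple / expected_multiple:.2f}")  # 1.29
```

Because deaths grew "over 9 times," a ratio of about 1.29 is roughly a floor: deaths rose some 29% more than the three-week lag alone would predict.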
This observation opens a large can of worms. Back in July, optimistic doctors and others attributed the "declining" death rate not just to the time delay but also to better treatments, younger people getting infected, the virus becoming less lethal, etc. By the same reasoning, the data now suggest that treatments have become less effective, or that the virus has become more lethal, etc. I'm not making such an argument, just pointing out why it is risky to draw premature conclusions from little bits of statistics.
***
Apparently, someone called a time-out on these concerning numbers. On August 2, the state took a day off. When the data started flowing again, the death counts fell back to trend, as seen below:
For any data analyst, the alarm bells are deafening.
There are a number of suspicious issues here. First, the official reason for the August 2 blackout was a systems upgrade, with a promise that the next day's release would contain two days' worth of data. It doesn't appear that Monday's release did. The number of new cases released on August 3 was ~11,500, which is supposed to cover both August 2 and 3. Yet between July 27 and August 5, excluding August 2-3, daily cases were above 8,000, so two days' worth should have exceeded 16,000.
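The first point is a back-of-the-envelope consistency check. A rough sketch, assuming that cases on August 2-3 were not far below the surrounding days; the helper name is hypothetical and the inputs are the approximate figures quoted above.

```python
def consistent_with_n_days(release_total, daily_floor, n_days=2):
    """Rough check: a catch-up release covering n_days should be at least
    n_days times the lowest daily count observed on the surrounding days."""
    return release_total >= n_days * daily_floor


august_3_release = 11_500  # approximate new cases released on August 3 (from the post)
daily_floor = 8_000        # daily cases July 27 to August 5, excluding Aug 2-3, exceeded this

print(consistent_with_n_days(august_3_release, daily_floor))  # False: 11,500 < 16,000
```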
Second, it should be simple to report deaths daily now that Texas uses death certificates to confirm deaths. I don't see how a systems upgrade destroys the ability to know the dates of death. I wonder whether they reverted to the old way of reporting deaths after realizing that the new procedure reveals a higher death toll.
***
Here's the problem facing data analysts in the Covid-19 era. The data are what officials want them to be. Officials constantly change definitions, accounting rules, timing of releases, etc. It is impossible to make sense of the mess unless you maintain the database yourself. Meanwhile, this data catastrophe is severely hampering public health decisions. The question is whether it is a feature or a bug.
I've seen similar issues in Arizona. Cases have been declining (as has testing) since early July, but in the last two weeks hospitalizations have absolutely taken off, far outpacing the prior weeks, four or five weeks after the spike in positive cases.
https://imgur.com/mtwicji
Posted by: Dan Vargo | 08/10/2020 at 07:41 PM
DV: Convert each line to an index (say, May 5 = 100). Then we can see the pattern more clearly. I'm not at all surprised that there are problems with Arizona as well.
Posted by: Kaiser | 08/10/2020 at 10:41 PM
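Kaiser's suggestion of indexing each series to a common base can be sketched like this. The helper name and the numbers are hypothetical; with real data, `base_index` would point at the May 5 observation.

```python
def index_to_base(series, base_index, base_value=100.0):
    """Rescale a series so that its value at base_index equals base_value (e.g. 100)."""
    base = series[base_index]
    if base == 0:
        raise ValueError("base value must be non-zero")
    return [base_value * v / base for v in series]


# Hypothetical example: two series on very different scales.
cases = [500, 750, 1000]
hospitalizations = [40, 50, 80]
print(index_to_base(cases, 0))             # [100.0, 150.0, 200.0]
print(index_to_base(hospitalizations, 0))  # [100.0, 125.0, 200.0]
```

Once both lines start at 100, their relative growth can be compared directly, regardless of the raw counts.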
Kaiser,
You're absolutely right, that makes things clearer... Hospitalizations are TRAILING the increase in deaths from the case spike in early July.
https://imgur.com/zY5zygr
Posted by: Dan Vargo | 08/11/2020 at 01:19 PM