This post is intended to provoke. It also contains further thoughts on the previous note I posted about the same topic.
(For less technical readers, I hope you persist and read the post. I try to make this stuff comprehensible for all.)
As the pandemic rages on, commentators talk incessantly about "exponential" growth. This growth model might be true in a theoretical sense but it is not reflected in the reported data. (I acknowledge up front that when the data do not fit a model, it does not necessarily follow that the model is incorrect. That's a different post.) In this post, I make a couple of inconvenient observations about the exponential assumption.
It is easy to see that the real-world data do not show exponential growth in most countries. Here is the chart in log scale made famous by the Financial Times graphics team led by John Burn-Murdoch.
The interpretation of this chart as exponential growth is also explained here, at the Our World in Data site. They say this: "If, during an outbreak, the time it takes for deaths to double remains constant, then the disease is spreading exponentially."
***
Since we are talking about a model, let's make a model. Here are three exponential growth curves with three rates (A = 0.10, B = 0.15, C = 0.20).
When plotted in log scale, we see three straight lines. The slope of each line is the rate of the corresponding exponential.
This is one of the reasons scientists like to plot exponentially growing data on a log scale.
The rates are often expressed as "doubling times" in epidemiology. The following chart shows, for each rate, the time needed to reach a given number of cases. The cases double from tick to tick. For example, for model A, it takes 3 days to double, and 6 days to quadruple. Getting straight lines means that the number of days to double remains constant regardless of the starting point.
The following chart shows incremental doubling times. It takes 3 days to double to 4 cases, then 3 more days to double to 8 cases, and so on. It is a flat line for an exponential model.
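For readers who want to check the numbers, here is a minimal sketch of the three models in Python. It assumes the rates are slopes on a base-10 log axis, which is inferred from the statement that model A doubles in 3 days; the function names are made up for illustration.

```python
import math

# Rates from the post, read as slopes on a log10 axis (an assumption
# inferred from "model A takes 3 days to double").
RATES = {"A": 0.10, "B": 0.15, "C": 0.20}

def doubling_time(rate):
    """Days for cases to double when log10(cases) rises by `rate` per day."""
    return math.log10(2) / rate

def days_to_reach(cases, rate, start=1.0):
    """Days needed to grow from `start` to `cases` under the exponential model."""
    return math.log10(cases / start) / rate

def incremental_doubling_times(rate, targets=(2, 4, 8, 16)):
    """Days spent on each successive doubling (the flat-line chart)."""
    days = [days_to_reach(c, rate) for c in targets]
    return [days[0]] + [later - earlier for earlier, later in zip(days, days[1:])]

for name, rate in RATES.items():
    increments = [round(d, 2) for d in incremental_doubling_times(rate)]
    print(name, round(doubling_time(rate), 2), increments)
```

Under a true exponential, every entry in the list of increments is the same number, which is what the flat line in the chart expresses.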
This is the point made in the quote above.
***
Next, let's look at some real data. I'm using the Lombardia series since I have it handy from previous work. As I showed before, in log scale, the growth in cases is clearly not exponential. It is not a straight line.
Many commentators adopt an interpretation that I will call "pointwise exponential". The curve above is interpreted as an exponential with changing rates (or, doubling times). In other words, at each point in time, it is an exponential with a specific rate. At a different point in time, it is a different exponential with a different rate. The rate keeps decreasing over time, which means the doubling time keeps increasing.
I am now going to show that this interpretation is misleading at all points in time. For illustration, I focus on 14 March, the date when the cases in Lombardia first exceeded 10,000. Computing the rate on 14 March is the same as computing the slope of the straight line that meets the curve on 14 March (for those who know, it's the tangent to the curve on 14 March). The slope for Lombardia on 14 March is 0.0755.
The slope gives me the rate, which gives me the doubling time. The rate of 0.0755 converts to a doubling time of 4 days. This means that I should expect the number of cases to have doubled from 14 March to 18 March. But the actual counts moved from 11,685 to 17,713, which is a 52% increase, not 100%.
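The arithmetic in this paragraph can be reproduced in a few lines. This is a sketch only: it uses the three numbers quoted above and assumes, as the 4-day figure implies, that the slope is measured on a base-10 log axis. The variable names are illustrative.

```python
import math

# Figures quoted in the post for Lombardia.
rate = 0.0755          # slope on 14 March (log10 units per day, assumed)
cases_14_mar = 11_685
cases_18_mar = 17_713

# Doubling time implied by the slope on 14 March.
doubling_days = math.log10(2) / rate

# What a fixed-rate exponential would project four days later.
projected_18_mar = cases_14_mar * 10 ** (rate * 4)

# What actually happened over the same four days.
actual_growth = cases_18_mar / cases_14_mar - 1

print(f"doubling time: {doubling_days:.1f} days")
print(f"projected cases on 18 March: {projected_18_mar:,.0f}")
print(f"actual growth: {actual_growth:.0%}")
```

The projection lands at roughly twice the 14 March count, while the reported count grew by about half, which is the gap discussed next.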
The following diagram explains what went wrong. After I fixed the rate on 14 March, I'm projecting a straight line going forward (on the assumption that the growth is exponential). Since the actual cases dip below the straight line (shown as the dashed line), I over-estimated the number of cases on 18 March.
If the underlying data really followed an exponential model, then the actual case counts would skip along the dashed line, and this issue would go away. Since the underlying data are not really growing exponentially, the curve bends away from the straight line, and the further out I go, the bigger the gap.
The pointwise exponential interpretation basically fits a new line at each point in time. This means the slope is constantly changing. That makes the "doubling time" meaningless because by the time the doubling period has elapsed, the slope has changed.
***
Let me further provoke you with this next example. The following is another growth curve shown in log scale.
Instead of going to the extreme of a point-wise exponential interpretation, I use a piece-wise exponential interpretation. I treat each 7-day (weekly) period as an exponential, so that each week is fitted to a straight line, and the rate/slope/doubling time is allowed to change once a week.
Under this interpretation, this hypothetical region experienced exponential growth with weekly decreasing rates. It started at 0.11, then went to 0.04, and finally reached 0.02. Correspondingly, the doubling time started at 3 days, then went to 8, and finally reached 13.
This sounds reasonable (ignoring the problem I already pointed out in the first section).
Here's the problem. I played a trick here. The curve I showed you in log scale is actually a linear function, a perfect straight line. Here it is in linear scale:
Let me bring back the data plotted in log scale:
Should this growth curve be modeled as a straight line with a constant rate of growth or a piecewise exponential growth curve with decreasing rates?
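The trick is easy to replicate. The sketch below uses a hypothetical straight line (100 cases on day 0, plus 100 new cases each day, which is not the series behind the chart) and fits a separate least-squares line to each week of the log10 counts. Slopes are again assumed to be in base-10 log units.

```python
import math

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical linear growth over three weeks (illustrative numbers only).
days = list(range(21))
cases = [100 + 100 * d for d in days]
log_cases = [math.log10(c) for c in cases]

# Piecewise exponential reading: one straight-line fit per 7-day block.
weekly_rates = [
    slope(days[start:start + 7], log_cases[start:start + 7])
    for start in range(0, 21, 7)
]
weekly_doubling_times = [math.log10(2) / r for r in weekly_rates]

# Linear reading: a single constant slope on the raw counts.
linear_slope = slope(days, cases)

for week, (r, t) in enumerate(zip(weekly_rates, weekly_doubling_times), start=1):
    print(f"week {week}: rate {r:.3f}, doubling time {t:.1f} days")
print(f"linear fit: {linear_slope:.0f} new cases per day")
```

Any straight line with a positive slope behaves this way, because the log of a linear function is concave: the weekly rates fall and the doubling times lengthen even though nothing about the growth has changed.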
Is there a reason we can not use something like AICc or BIC to compare the models on goodness-of-fit to pick the most appropriate model?
Posted by: Josh | 04/16/2020 at 09:04 AM