My co-columnist Andrew Gelman has been doing some fantastic work, digging behind that trendy news story claiming that middle-aged, non-Hispanic, white male Americans are dying at an abnormal rate. See, for example, this New York Times article that not only reports the statistical pattern but also asserts, in its headline, that those additional deaths were due to suicide and substance abuse.
It all began with the chart shown on the right. It appears that something dramatic happened in the late 1990s when the USW (red) line started to diverge from those of all the other countries. The USW line started to creep upwards, meaning that the death rate is increasing for US white non-Hispanic males aged 45-54. (The bolded blue line is for US white Hispanic males aged 45-54 and does not look different from those of other countries.)
Prompted by a lively discussion in the comments section, Andrew pursued a deeper analysis of this data. This has led to a series of posts in which he refined the analysis (see here, here, here and here). I recommend reading the entire series, as it paints a full picture of how statistical thinking works. In the rest of this post, I will present a condensed summary of his argument, leaving out some details.
We first note that the veracity of the data is not at issue. We accept as a starting point that the trends shown above are real; this can easily be verified using public data. The debate is about why.
People who analyze age-group data are particularly sensitive to bias due to discretization. The original analysis, co-authored by Angus Deaton, the recent winner of the Nobel in economics, focuses on the age group 45 to 54. If you compute the average age in this age group over time, you may be surprised to find that it is not flat: the average age of people aged 45 to 54 has been increasing over time. As the following chart shows, since 1990 or so, the average age in this age group moved up by about half a year. (Data from CDC Wonder.)
Because older people die at a higher rate, the death rate within age group 45 to 54 will increase just because of the increasing average age of this age group--without having to resort to other reasons such as suicides.
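The mechanics of this binning effect can be sketched with a toy calculation. Here, both populations have the same age-specific death rates; the only difference is the age mix within the 45-54 bin. All numbers are hypothetical, assuming mortality rises roughly 8 percent per year of age:

```python
# Toy illustration: the crude death rate of a fixed 45-54 age bin rises
# when the bin's average age rises, even with unchanged age-specific
# mortality. All rates and counts below are hypothetical.

def rate_at_age(age, base_rate=0.004, growth=0.08):
    """Age-specific death rate, rising ~8% per extra year of age past 45."""
    return base_rate * (1 + growth) ** (age - 45)

def group_rate(age_counts):
    """Crude death rate for the bin: count-weighted average of age-specific rates."""
    total = sum(age_counts.values())
    return sum(n * rate_at_age(a) for a, n in age_counts.items()) / total

ages = range(45, 55)
young_skew = {a: 110 - 2 * (a - 45) for a in ages}  # more 45s than 54s
old_skew   = {a: 92 + 2 * (a - 45) for a in ages}   # more 54s than 45s

# Same age-specific mortality, yet the older-skewed bin shows a higher rate
print(group_rate(young_skew), group_rate(old_skew))
```

Both dictionaries contain the same 1,010 people; only the skew toward older ages within the bin differs, and that alone moves the group's death rate.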
Note also that the Baby Boom in the U.S. caused large fluctuations in the age distribution over time. This observation adds color to why the average age is increasing, but it is not essential to the argument; the general aging of the population is another cause.
What is crucial to the reasoning is the steepness of the increase in death rate with increasing age. Surprisingly, it is not easy to find a chart plotting death rates by age. Wikipedia has this graph shown on the right. This is not empirical data but the Gompertz-Makeham law (link), which is described as accurate for the 30-80-year-old range. The key insight is that mortality rate increases exponentially after age 30.
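The Gompertz-Makeham hazard has the form mu(x) = A*exp(B*x) + C, where the exponential term captures age-dependent mortality and the constant C captures age-independent causes. A minimal sketch, with illustrative parameter values of roughly the right order of magnitude (assumptions, not fitted estimates):

```python
import math

# Gompertz-Makeham hazard: mu(x) = A * exp(B * x) + C
# A, B, C are illustrative values, not estimates fitted to real data.
A, B, C = 5e-5, 0.085, 5e-4

def hazard(age):
    """Instantaneous death rate at a given age under Gompertz-Makeham."""
    return A * math.exp(B * age) + C

# Each extra year multiplies the exponential term by exp(B) ~ 1.089,
# so in midlife the overall hazard rises by roughly 8-9% per year of age.
print(hazard(50) / hazard(49))
```

The roughly constant percentage increase per year of age is exactly the quantity that feeds the back-of-the-envelope calculation below.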
Having a theory is not enough. In his first post, Andrew tested this theory by pulling a few numbers and working out a back-of-the-envelope calculation. The goal is to estimate the magnitude of this average-age effect. How much of the observed anomalous trend does it explain? Do we need any other reasons?
Andrew estimated that the average age in the 45-54 age group moved up 0.6 years between 1989 and 2013, the period covered by the original study. From life tables, he found that mortality worsens by about 8 percent per extra year lived. Thus, over the research period, the increase in average age alone contributes about 0.6 * 8 = 4.8 percent to the death rate.
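The arithmetic is simple enough to write down directly:

```python
# Back-of-the-envelope: how much does a 0.6-year rise in average age
# add to the group death rate if mortality worsens ~8% per year of age?
avg_age_shift = 0.6      # years, 1989-2013
mortality_slope = 0.08   # ~8% higher death rate per extra year of age

effect = avg_age_shift * mortality_slope
print(f"{effect:.1%}")   # prints 4.8%
```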
This level of increase explains most of the trend shown by that red line in the original chart. Thus, Andrew concludes that, after adjusting for age, the mortality rate among middle-aged, non-Hispanic, white male Americans has been essentially flat.
The original finding that this group behaves differently from its counterparts in other countries, and from the U.S. Hispanic male population, is still interesting.
A number of techniques can be used to control for the shift in the underlying age distribution. Disaggregation of the data is one method. CDC releases data at the single age level, and analyzing the data year by year is the next step that Andrew undertook.
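Disaggregation simply means computing a death rate at each single year of age rather than one pooled 45-54 rate. A minimal sketch, with hypothetical placeholder counts standing in for CDC single-age data:

```python
# Sketch of disaggregation: death rates per single year of age for one year
# of data. Counts below are hypothetical placeholders, not CDC figures.
deaths = {45: 380, 46: 410, 47: 445, 48: 480, 49: 520,
          50: 565, 51: 610, 52: 660, 53: 715, 54: 775}
population = {age: 100_000 for age in range(45, 55)}

rates = {age: deaths[age] / population[age] for age in deaths}
for age, rate in sorted(rates.items()):
    print(age, f"{rate:.2%}")
```

Repeating this for every year of data yields one rate series per single age, free of any within-bin shift in the age mix.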
One result of this finer analysis is that in the years 1999-2013 (i.e. after dropping the first 10 years of the first chart), even after adjusting for age, there is still about a 4 percent increase in mortality rate among the U.S. middle-aged white non-Hispanics, roughly half of the trend shown in the original chart. In other words, in the shortened time frame, age adjustment explains half of the trend, not all of it.
This has led Deaton, one of the original authors, to say "the overall increase in mortality is not due to failure to age adjust."
This statement is a bit too loose for my liking. First, "is not due to" implies that age aggregation has zero effect, when in fact it explains about half of the trend. Second, one should always age-adjust if the underlying age distribution is changing. Even if the age adjustment did not explain anything at all, I'd argue one should still age-adjust. Doing so would help eliminate age aggregation as a potential reason for the observed trend.
One argument against age adjustment is that it involves a lot of work - finding the right data, processing the data, merging the data, etc. But unless one does this work, one can't know how strong the aggregation effect is. And if you have done the homework, why not show it?
Disaggregating all the data is annoying because now you have one chart per single age. The next method for age adjustment is "standardization". This requires creating a reference age distribution, which is then applied to all years. In effect, we artificially hold the age distribution constant so that age can no longer explain any of the observed trend.
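A minimal sketch of direct standardization, using hypothetical rates. Following the post, the reference distribution weights each year of age 45-54 equally:

```python
# Direct age standardization: apply one fixed reference age distribution
# to every year's age-specific rates, so shifts in the age mix cannot
# move the summary rate. Rates below are hypothetical.

ages = range(45, 55)
reference_weights = {a: 1 / 10 for a in ages}  # equal weight per single age

def standardized_rate(age_specific_rates, weights):
    """Summary rate: age-specific rates averaged under the reference weights."""
    return sum(weights[a] * age_specific_rates[a] for a in weights)

# Two years with identical age-specific mortality (but, in reality,
# different age mixes that a crude rate would pick up)
rates_1999 = {a: 0.004 * 1.08 ** (a - 45) for a in ages}
rates_2013 = dict(rates_1999)

# Standardization reports them as equal, because only the age-specific
# rates matter -- the age mix has been held fixed by construction.
print(standardized_rate(rates_1999, reference_weights) ==
      standardized_rate(rates_2013, reference_weights))  # prints True
```

Any difference that remains between standardized rates across years must come from changes in age-specific mortality, not from the shifting age distribution.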
This is what Andrew's age standardized rates look like:
For the age-adjusted line (in black), what he did is to "weight each year of age equally". The divergence between the adjusted and unadjusted lines shows that the effect of the rising average age within this age group has been growing over time.
Then, something really interesting happens when Andrew split the black line by gender:
So it turns out that middle-aged U.S. white non-Hispanic men are not where the story is. The age-adjusted mortality rate for the corresponding women has steadily climbed between 1999 and 2013!
Next, Andrew looked at the other age groups and found an even more pronounced trend affecting U.S. non-Hispanic whites in the 35-44 age group.
He also looked at Hispanic whites and African Americans, whose charts I won't repeat here. Even after age adjustment, those groups show trends that are more in line with the rest of the world.
Finally, for those wondering how this is relevant to, say, the business world, let me connect the dots for you.
Imagine that you run a startup that sells an annual subscription. One of your key metrics is the churn rate, defined as the number of subscribers who quit during period t divided by the number of paying subscribers at the start of period t. So a monthly churn rate of 5% means that five percent of the paying subscribers quit the service during that month.
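In code, the definition is just a ratio (numbers below are hypothetical):

```python
# Churn rate for period t: subscribers who quit during t, divided by
# paying subscribers at the start of t. Figures are hypothetical.
def churn_rate(quits, subscribers_at_start):
    return quits / subscribers_at_start

print(f"{churn_rate(500, 10_000):.0%}")  # 500 of 10,000 quit -> prints 5%
```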
There are two reasons to age-adjust this churn rate. First, the shape of the churn rate curve is not smooth. In particular, almost no one churns during the first 12 months. Second, the startup is growing very rapidly. This means that a lot of new customers are being acquired, and each new customer has up to 12 months in which the churn rate is close to zero.
As a result, the churn rate fluctuates with the monthly growth rate of the subscription service. As the growth rate fluctuates, the average tenure of the user base fluctuates. The more new customers in their first year, the lower the churn rate.
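A toy calculation makes this dilution effect concrete. Assume, hypothetically, that first-year subscribers churn at roughly zero and veterans at 5 percent per month:

```python
# Toy blended churn: first-year subscribers churn at ~0%, veterans at 5%
# per month (hypothetical figures). Faster growth means a larger share of
# first-year subscribers, which drags the blended churn rate down.

def blended_churn(new_share, veteran_churn=0.05, new_churn=0.0):
    """Overall monthly churn given the share of sub-12-month subscribers."""
    return new_share * new_churn + (1 - new_share) * veteran_churn

fast_growth = blended_churn(new_share=0.60)  # 60% of base in first year
slow_growth = blended_churn(new_share=0.20)  # 20% of base in first year
print(f"{fast_growth:.1%} vs {slow_growth:.1%}")  # prints 2.0% vs 4.0%
```

The underlying customer satisfaction is identical in both scenarios; only the tenure mix differs, yet the blended churn rate doubles when growth slows.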
If the churn rate is not age-adjusted, you can't tell whether customers are increasingly dissatisfied with your service, or whether growth has merely slowed, raising the average tenure!