Over at the McGraw-Hill blog, I wrote about how to consume Big Data (link), which is the core theme of my new book. In that piece, I highlight two recent instances in which bloggers demonstrated numbersense in vetting other people's data analyses. (Since the McGraw-Hill link is not working as I'm writing this, I placed a copy of the post here in case you need it.)
Below is a detailed dissection of Zoë Harcombe's work.
Eating red meat makes us die sooner! Zoë Harcombe didn’t think so.
In March 2012, nutritional epidemiologists from Harvard University circulated new research linking red meat consumption with an increased risk of death. All major mass media outlets ran the story, with headlines such as “Risks: More Red Meat, More Mortality.” (link) This high-class treatment is typical, given Harvard’s brand, the reputation of the research team, and the pending publication in a peer-reviewed journal. Readers were told that the finding came from large studies with hundreds of thousands of subjects, and that the researchers “controlled for” other potential causes of death.
Zoë Harcombe, an author of books on obesity, was one of the readers who did not buy the story. Something didn’t sound right to her when she reviewed the Harvard study. In a blog post titled “Red meat & Mortality & the Usual Bad Science” (link), Harcombe outlined how she determined that the research was junk science.
How did Harcombe do this?
Alarm bells rang in her head because she had seen similar studies in which researchers commit what I call “causation creep” (link).
She then reviewed the two studies used by the Harvard researchers, looking especially for the precise definition of meat consumption, the key explanatory variable. She discovered that the data came from dietary questionnaires administered every four years (which meant subjects who didn’t answer this question would have been dropped from the analysis). All subjects were divided into five equal-sized groups (quintiles) based on their red meat consumption. Surprisingly, “unprocessed red meat” included pork, hamburgers, beef wraps, lamb curry, and so on. This part of the review was box-checking; it didn’t reveal anything too worrisome.
Harcombe suspected that the Harvard study did not prove causation, but she needed more than just a hunch. She found plenty of ammunition in Table 1 of the paper. There, she learned that the cohorts of people who reported eating more red meat also reported higher levels of unhealthy behaviors, including more smoking, more drinking, and less exercise.
The researchers argue that their multivariate regression analysis “controlled for” these other known factors. But Harcombe understands that when effects are confounded, it is almost impossible to disentangle them. For instance, if you compare two school districts, one in a rich, predominantly white neighborhood and the other in a poor, predominantly minority neighborhood, then race and income are confounded, and there is no way to know whether the difference in educational outcomes is due to income or to race.
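To see why “controlling for” entangled factors can fail, here is a minimal simulation sketch (all numbers hypothetical, not from the study): when two predictors are nearly collinear, a regression cannot reliably attribute the effect to either one, no matter how faithfully the model is fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def spread_of_estimates(rho, n=200, trials=500):
    """Repeatedly fit y = b0 + b1*x1 + b2*x2 and return the spread (std)
    of the b2 estimates, where corr(x1, x2) is approximately rho.
    Hypothetical setup: x2 truly has NO effect on y."""
    b2_estimates = []
    for _ in range(trials):
        x1 = rng.normal(size=n)
        # x2 is nearly a copy of x1 when rho is high (confounded predictors)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1.0 * x1 + rng.normal(size=n)  # only x1 drives the outcome
        X = np.column_stack([np.ones(n), x1, x2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        b2_estimates.append(beta[2])
    return float(np.std(b2_estimates))

spread_confounded = spread_of_estimates(rho=0.98)  # nearly collinear
spread_independent = spread_of_estimates(rho=0.0)  # unrelated predictors
print(spread_confounded / spread_independent)      # ratio well above 1
```

With near-collinear predictors, the estimated coefficient on the irrelevant variable swings widely from sample to sample, which is the sense in which confounded effects resist statistical adjustment.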
Next, Harcombe looked for data to help her interpret the researchers’ central claim:
Unprocessed and processed red meat intakes were associated with an increased risk of total, CVD, and cancer mortality in men and women in the age-adjusted and fully adjusted models. When treating red meat intake as a continuous variable, the elevated risk of total mortality in the pooled analysis for a 1-serving-per-day increase was 12% for total red meat, 13% for unprocessed red meat, and 20% for processed red meat.
Her first inquiry was about the baseline mortality rate, which was 0.81%. Twenty percent of that is 0.16%, so roughly speaking, if you decide to eat an extra serving of processed red meat every day, you face a less-than-2-in-1,000 chance of earlier death. (Whether the earlier death is due to the red meat or simply to more food consumed each day is another instance of confounding.)
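Harcombe’s back-of-the-envelope arithmetic can be reproduced directly; the 0.81% baseline and the 20% relative increase are the figures quoted above.

```python
baseline = 0.0081          # baseline mortality rate of 0.81%
relative_increase = 0.20   # 20% elevated risk per extra daily serving (processed red meat)

# Converting a relative risk into an absolute one: 20% of 0.81%
absolute_increase = baseline * relative_increase
print(f"absolute risk increase: {absolute_increase:.4%}")  # about 0.16%, i.e. under 2 in 1,000
```

This is the move that deflates a scary-sounding headline: a 20% relative increase on a small baseline is a small absolute number.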
This also raises the issue of error bars. As Gary Taubes explained in his response to the red-meat study (link), serious epidemiologists only pay attention to effects of 300% or higher, acknowledging the limitations of the types of data being analyzed. A 12- or 20-percent effect does not inspire much confidence.
The researchers were overly confident in the statistical models used to analyze the data, Harcombe soon learned. She was able to find the raw data, which allowed her to compare them with the statistically adjusted numbers. Here is one of her calculations.
The five columns represent quintiles of red meat consumption, from lowest (Q1) to highest (Q5). The last row (“Multivariate”) shows the adjusted death rates with Q1 set to 1.00. The row labelled “Death Rate(Z)” is a simple calculation performed by Harcombe, without adjustment. The key insight is that Harcombe’s line is U-shaped while the multivariate line is monotonically increasing.
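The kind of comparison Harcombe made is simple to express. The counts below are purely illustrative (hypothetical, not taken from the paper or her post): raw death rates per quintile, normalized so that Q1 = 1.00, can trace a U even when the model-adjusted rates rise monotonically.

```python
# Hypothetical deaths and cohort sizes per quintile (Q1..Q5), for illustration only
deaths = [110, 95, 90, 100, 125]
people = [10000] * 5

raw_rates = [d / n for d, n in zip(deaths, people)]
relative = [r / raw_rates[0] for r in raw_rates]  # normalize so Q1 = 1.00
print([round(r, 2) for r in relative])            # → [1.0, 0.86, 0.82, 0.91, 1.14]
```

With these made-up counts, the unadjusted line dips in the middle quintiles and rises at the top, a U-shape that the monotonic adjusted line would conceal.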
The purpose of this analysis is not to debunk the research. What Harcombe did here was to delineate where the data end and where the model assumptions take over. One of the themes in Numbersense is that every analysis combines data with theory. Knowing which is which is half the battle.
At the end of her piece, Harcombe checked the incentives of the researchers.
Harcombe did really impressive work here, and her blog post is highly instructive on how to evaluate someone else’s data analysis. Chapter 2 of Numbersense looks at the quality of data analyses of the obesity crisis.
Reminder: You can win a copy of my new book. See here for details.