Over at the McGraw-Hill blog, I wrote about how to consume Big Data (link), which is the core theme of my new book. In that piece, I highlight two recent instances in which bloggers demonstrated numbersense in vetting other people's data analyses. (Since the McGraw-Hill link is not working as I'm writing this, I placed a copy of the post here in case you need it.)
Below is a detailed dissection of Zoë Harcombe's work.
***
Eating red meat makes us die sooner! Zoë Harcombe didn’t think so.
In March 2012, nutritional epidemiologists from Harvard University circulated new research linking red meat consumption with increased risk of death. All major mass media outlets ran the story, with headlines such as “Risks: More Red Meat, More Mortality.” (link) This high-class treatment is typical, given Harvard’s brand, the reputation of the research team, and the pending publication in a peer-reviewed journal. Readers are told that the finding came from large studies with hundreds of thousands of subjects, and that the researchers “controlled for” other potential causes of death.
Zoë Harcombe, an author of books on obesity, was one of the readers who did not buy the story. She heard that noise in her head when she reviewed the Harvard study. In a blog post titled “Red meat & Mortality & the Usual Bad Science,” (link) Harcombe outlined how she determined the research was junk science.
How did Harcombe do this?
Alarm bells rang in her head because she had seen similar studies in which researchers commit what I call “causation creep.” (link)
She then reviewed the two studies used by the Harvard researchers, looking especially for the precise definition of meat consumption, the key explanatory variable. She discovered that the data came from dietary questionnaires administered every four years (which meant subjects who didn’t answer this question would have been dropped from the analysis). All subjects were divided into five equal-sized groups (quintiles) based on the amount of red meat consumed. Surprisingly, “unprocessed red meat” included pork, hamburgers, beef wraps, lamb curry, and so on. This part was about checking the boxes; it didn’t reveal anything too worrisome.
Harcombe suspected that the Harvard study did not prove causation, but she needed more than just a hunch. She found plenty of ammunition in Table 1 of the paper. There, she learned that the cohorts who reported eating more red meat also reported higher levels of unhealthy behaviors, including more smoking, more drinking, and less exercise.
The researchers argue that their multivariate regression analysis “controlled for” these other known factors. But Harcombe understands that when effects are confounded, it is almost impossible to disentangle them. For instance, if you compare two school districts, one in a rich neighborhood and the other in a poor neighborhood, then race and income are confounded, and there is no way to know whether the difference in educational outcomes is due to income or to race.
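To make the school-district point concrete, here is a minimal simulation with made-up numbers (nothing from the Harvard paper): the outcome is generated from income alone, yet a model using only the group indicator appears to “explain” the outcome nearly as well, because the two predictors travel together.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical district data: a group indicator strongly associated with income.
group = rng.binomial(1, 0.5, n)                    # neighborhood/race proxy
income = 30 + 40 * group + rng.normal(0, 5, n)     # income tracks the group
score = 50 + 0.5 * income + rng.normal(0, 5, n)    # outcome depends on income ONLY

def r2(x, y):
    """R-squared of a simple one-predictor regression of y on x."""
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print("R^2 using income only:", round(r2(income, score), 3))
print("R^2 using group only: ", round(r2(group, score), 3))
# Both predictors "explain" the outcome; the data alone cannot say which is causal.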
Next, Harcombe looked for data to help her interpret the researchers’ central claim: “Unprocessed and processed red meat intakes were associated with an increased risk of total, CVD, and cancer mortality in men and women in the age-adjusted and fully adjusted models. When treating red meat intake as a continuous variable, the elevated risk of total mortality in the pooled analysis for a 1-serving-per-day increase was 12% for total red meat, 13% for unprocessed red meat, and 20% for processed red meat.”
Her first inquiry was about the baseline mortality rate, which was 0.81%. Twenty percent of that is about 0.16%, so roughly speaking, if you decide to eat an extra serving of processed red meat every day, you face a less-than-2-in-1,000 higher chance of earlier death. (Whether the earlier death is due to the red meat or simply to more food consumed each day is another instance of confounding.)
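Here is the same back-of-the-envelope arithmetic as a short script, a sketch using only the figures quoted above (the rounding is mine):

# Translate the reported relative risks into absolute risk, using the
# 0.81% baseline mortality rate cited by Harcombe.
baseline = 0.0081  # baseline mortality rate (0.81%)
relative_increase = {
    "total red meat": 0.12,
    "unprocessed red meat": 0.13,
    "processed red meat": 0.20,
}

for meat, rr in relative_increase.items():
    extra = baseline * rr  # extra absolute risk per 1-serving-per-day increase
    print(f"{meat}: +{extra:.2%} absolute risk, about {extra * 1000:.1f} per 1,000")
# Processed red meat works out to roughly 1.6 per 1,000 -- fewer than 2 in 1,000.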
This also raises the issue of error bars. As Gary Taubes explained in his response to the red-meat study (link), serious epidemiologists only pay attention to effects of 300% or higher, acknowledging the limitations of the types of data being analyzed. A 12- or 20-percent effect does not inspire much confidence.
The researchers were overly confident in the statistical models used to analyze the data, Harcombe soon learned. She was able to find the raw data, allowing her to compare them with the statistically adjusted figures. Here is one of her calculations.
The five columns represent quintiles of red meat consumption, from lowest (Q1) to highest (Q5). The last row (“Multivariate”) shows the adjusted death rates with Q1 set to 1.00. The row labelled “Death Rate(Z)” is a simple calculation performed by Harcombe, without any adjustment. The key insight is that Harcombe’s line is U-shaped while the multivariate line is monotonically increasing.
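The mechanics of that check are simple enough to sketch. The counts and adjusted ratios below are placeholders made up for illustration, not the numbers from the paper or from Harcombe’s table: compute the crude death rate in each quintile, rescale so that Q1 equals 1.00, and set the result next to the published multivariate ratios.

# Crude (unadjusted) death rates by quintile, rescaled to Q1 = 1.00,
# compared with model-adjusted ratios. All numbers are placeholders.
deaths = [150, 130, 140, 160, 190]           # placeholder death counts, Q1..Q5
people = [20_000] * 5                        # placeholder quintile sizes (equal by design)
adjusted = [1.00, 1.08, 1.15, 1.25, 1.37]    # placeholder multivariate ratios

crude = [d / n for d, n in zip(deaths, people)]
crude_rel = [round(r / crude[0], 2) for r in crude]

print("Crude rate relative to Q1:    ", crude_rel)
print("Adjusted ratio relative to Q1:", adjusted)
# If the crude line dips then rises (U-shaped) while the adjusted line rises
# monotonically, the monotonic trend comes from the model, not the raw data.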
The purpose of this analysis is not to debunk the research. What Harcombe did here was delineate where the data end and where the model assumptions take over. One of the themes in Numbersense is that every analysis combines data with theory. Knowing which is which is half the battle.
At the end of her piece, Harcombe checked the incentives of the researchers.
Harcombe did really impressive work here, and her blog post is a highly instructive example of how to analyze a data analysis. Chapter 2 of Numbersense looks at the quality of data analyses of the obesity crisis.
***
Reminder: You can win a copy of my new book. See here for details.