Over at the McGraw-Hill blog, I wrote about how to consume Big Data (link), which is the core theme of my new book. In that piece, I highlight two recent instances in which bloggers demonstrated numbersense in vetting other people's data analyses. (Since the McGraw-Hill link is not working as I'm writing this, I placed a copy of the post here in case you need it.)
Below is a detailed dissection of Zoë Harcombe's work.
***
Eating red meat makes us die sooner! Zoë Harcombe didn’t think so.
In March, 2013, nutritional epidemiologists from Harvard University circulated new research linking red meat consumption with increased risk of death. All major mass media outlets ran the story, with headlines such as “Risks: More Red Meat, More Mortality.” (link) This high-class treatment is typical, given Harvard’s brand, the reputation of the research team, and the pending publication in a peer-reviewed journal. Readers are told that the finding came from large studies with hundreds of thousands of subjects, and that the researchers “controlled for” other potential causes of death.
Zoë Harcombe, an author of books on obesity, was one of the readers who did not buy the story. She heard that noise in her head when she reviewed the Harvard study. In a blog post, titled “Red meat & Mortality & the Usual Bad Science,” (link) Harcombe outlined how she determined the research was junk science.
How did Harcombe do this?
Alarm bells rang in her head because she has seen similar studies in which researchers commit what I call “causation creep.” (link)
She then reviewed the two studies used by the Harvard researchers, looking especially for the precise definition of meat consumption, the key explanatory variable. She discovered that the data came from dietary questionnaires administered every four years (this meant subjects who didn’t answer this question would have been dropped from the analysis). All subjects were divided into five equal-sized groups (quintiles) based on the amount of red meat consumption. Surprisingly, “unprocessed red meat” included pork, hamburgers, beef wraps, lamb curry and so on. This step was a matter of checking the box; it didn’t reveal anything too worrisome.
Harcombe suspected that the Harvard study does not prove causation, but she needed more than just a hunch. She found plenty of ammunition in Table 1 of the paper. There, she learned that the cohort of people who report eating more red meat also report higher levels of unhealthy behaviors, including more smoking, more drinking, and less exercise.
The researchers argue that their multivariate regression analysis “controlled for” these other known factors. But Harcombe understands that when effects are confounded, it is almost impossible to disentangle them. For instance, if you're comparing two school districts, one in a really rich neighborhood and the other in a poor neighborhood, and the two districts also differ in racial makeup, then race and income will be confounded and there is no way to know if the difference in educational outcomes is due to income or due to race.
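The school-district example can be sketched in a few lines of Python. This is the extreme case of perfect confounding, with made-up numbers: the two districts, their scores, and the candidate coefficients are all hypothetical, and nothing here comes from the Harvard data, where the confounding is partial rather than total.

```python
# HYPOTHETICAL data: (high_income, group_indicator, mean_test_score).
# In these two districts, income and group membership move together exactly.
districts = [
    (1, 1, 80.0),  # rich district
    (0, 0, 70.0),  # poor district
]

def sum_squared_error(intercept, b_income, b_group):
    """Squared error of a linear model score = intercept + b_income*inc + b_group*grp."""
    return sum(
        (score - (intercept + b_income * inc + b_group * grp)) ** 2
        for inc, grp, score in districts
    )

# Very different stories about what drives the 10-point gap
# (all income, all group, half and half, or opposite signs).
candidates = [(10.0, 0.0), (0.0, 10.0), (5.0, 5.0), (-20.0, 30.0)]
errors = [sum_squared_error(70.0, b_inc, b_grp) for b_inc, b_grp in candidates]

for (b_inc, b_grp), err in zip(candidates, errors):
    print(f"income effect={b_inc:6.1f}  group effect={b_grp:6.1f}  error={err}")
```

Every candidate fits the data perfectly, so the data alone cannot choose among them. Any split of the gap between the two factors has to come from assumptions made outside the data.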
Next, Harcombe looked for data to help her interpret the researchers’ central claim:
Unprocessed and processed red meat intakes were associated with an increased risk of total, CVD, and cancer mortality in men and women in the age-adjusted and fully adjusted models. When treating red meat intake as a continuous variable, the elevated risk of total mortality in the pooled analysis for a 1-serving-per-day increase was 12% for total red meat, 13% for unprocessed red meat, and 20% for processed red meat.
Her first inquiry was about the baseline mortality rate, which was 0.81%. Twenty percent of that is 0.16% so roughly speaking, if you decide to take an extra serving of processed red meat every day, you face a less-than-2-out-of-1000 chance of earlier death. (Whether the earlier death is due to the red meat or just more food consumed each day is another instance of confounding.)
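The arithmetic behind that figure is short enough to sketch in Python. The function name is my own; the 0.81% baseline and the 20% relative increase are the numbers quoted above, and the calculation is the same rough approximation, not a formal risk model.

```python
def absolute_risk_increase(baseline_rate, relative_increase):
    """Convert a relative risk increase into an absolute change in rate."""
    return baseline_rate * relative_increase

baseline_rate = 0.0081      # 0.81% baseline mortality rate
relative_increase = 0.20    # 20% higher risk per extra daily serving (processed red meat)

extra_risk = absolute_risk_increase(baseline_rate, relative_increase)
extra_per_1000 = extra_risk * 1000

print(f"Absolute increase: {extra_risk:.2%}")
print(f"Extra deaths per 1,000 people: {extra_per_1000:.2f}")
```

A 20% relative increase sounds alarming, while the same effect stated as fewer than 2 extra deaths per 1,000 people sounds much less so. Both numbers describe the same result.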
This also raises the issue of error bars. As Gary Taubes explained in his response to the red-meat study (link), serious epidemiologists only pay attention to effects of 300% or higher, acknowledging the limitations of the types of data being analyzed. The 12- or 20-percent effect does not give much confidence.
The researchers are overly confident in the statistical models used to analyze the data, Harcombe soon learned. She was able to find the raw data, allowing her to compare them with the statistically adjusted data. Here is one of her calculations.
The five columns represent quintiles of red meat consumption from lowest (Q1) to highest (Q5). The last row (“Multivariate”) is the adjusted death rates with Q1 set to 1.00. The row labelled “Death Rate(Z)” is a simple calculation performed by Harcombe, without adjustment. The key insight is that Harcombe’s line is U-shaped while the multivariate line is monotonically increasing.
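Here is a minimal sketch of that kind of check, assuming the unadjusted rate is simply deaths divided by people in each quintile. The counts and the adjusted ratios below are HYPOTHETICAL placeholders chosen only to illustrate the two shapes; they are not the figures from the paper or from Harcombe's post, and the function names are my own.

```python
def crude_rates(deaths, people):
    """Unadjusted death rate in each quintile."""
    return [d / n for d, n in zip(deaths, people)]

def relative_to_first(rates):
    """Rescale so that the first quintile (Q1) equals 1.00."""
    return [r / rates[0] for r in rates]

def is_monotonic_increasing(values):
    return all(a <= b for a, b in zip(values, values[1:]))

def is_u_shaped(values):
    """True if values fall to an interior minimum and then rise."""
    low = values.index(min(values))
    falls = all(a >= b for a, b in zip(values[:low + 1], values[1:low + 1]))
    rises = all(a <= b for a, b in zip(values[low:], values[low + 1:]))
    return 0 < low < len(values) - 1 and falls and rises

# HYPOTHETICAL numbers for Q1..Q5, for illustration only.
deaths = [1200, 1000, 950, 1050, 1400]
people = [10000, 10000, 10000, 10000, 10000]
adjusted = [1.00, 1.10, 1.15, 1.20, 1.30]  # stand-in for a "Multivariate" row

crude = relative_to_first(crude_rates(deaths, people))
print("crude   :", [round(x, 2) for x in crude], "U-shaped:", is_u_shaped(crude))
print("adjusted:", adjusted, "monotonic:", is_monotonic_increasing(adjusted))
```

When the two rows have different shapes, the difference is produced by the adjustment model, which is exactly the boundary between data and assumptions that is worth locating.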
The purpose of this analysis is not to debunk the research. What Harcombe did here is delineate where the data end and where the model assumptions take over. One of the themes in Numbersense is that every analysis combines data with theory. Knowing which is which is half the battle.
At the end of Harcombe’s piece, she checked the incentives of the researchers.
Harcombe did really impressive work here, and her blog post is highly instructive on how to evaluate a data analysis. Chapter 2 of Numbersense looks at the quality of data analyses of the obesity crisis.
***
While a hamburger is processed, it is still red meat. The processed meats all have something done to them involving cooking and usually adding chemicals. That is why they are treated differently.
Harcombe doesn't seem to like the idea of multivariable adjustment, but it really does work. One of the problems in this data is that age varies between the meat consumption quintile groups. Once you correct for age there is a trend for risk with total meat consumption. Correcting for the others has some problems but generally seems safer. I'm not going to think about what all the path implications of all the covariates are.
Basically, with these types of analyses, it really is a question of what other covariates are associated with the factor of interest, whether we are successfully correcting for them, and whether there are some that we missed. Harcombe seems to be against doing these corrections. It is also possible to overcorrect by including covariates on the causal pathway. That is, if consuming meat causes a change in a covariate, then we shouldn't correct for that covariate. This probably applies partly to BMI and total energy, so it may be overcorrecting. There are probably other things that are missed, so that the model is undercorrected.
One interesting result is that low meat eaters have higher cholesterol. A suspicion is that people with high cholesterol are modifying their diet, and it is having little effect on their cholesterol. It is an interesting possibility as it would reduce the apparent effect of eating meat. Harcombe seems to think that cholesterol is good. Try telling that to 40-year-olds who have very high cholesterol and a heart attack.
Posted by: Ken | 07/16/2013 at 06:52 AM
Ken: The reason why she complained about processed and unprocessed is that the research separately estimated the impact of the two types of red meat. I skipped that detail as it's not really pertinent to my point here.
I don't endorse Harcombe's point of view on multivariate adjustments, and she's painting with a broad brush. What I do like is the way she interacts with the data analysis. As consumers, we have neither the time nor the expertise to replicate everything that researchers have done, but we also must not give blind trust.
As you pointed out, multivariate adjustment is based on a model and it is only as good as the model assumptions. It's a challenge when there is this much confounding.
Posted by: Kaiser | 07/16/2013 at 12:09 PM
Multivariate adjustments are, indeed, very useful but can be tricky. Note Ken's comments about undercorrection / overcorrection.
Looking at the less adjusted data [Death rate (Z)], I see Q1-Q4 with nothing going on, and Q5 showing a big effect (a threshold model). It's easy to be suspicious, at least, of an adjustment that converts this to a monotonic, nearly linear effect.
Posted by: zbicyclist | 07/20/2013 at 11:02 PM
You have to use common sense as well: when you look at past large healthy populations like Japan, they mostly ate rice and vegetables. Then when newer generations move to the West, or as fast food chains creep in over there, they start to get heart disease, cancer and other Western diseases. So whether it's a bag of chips, bacon or a Big Mac, you're best to avoid any of them, I would imagine.
Low meat eaters don't change their cholesterol because they probably still eat dairy or other types of meat like fish which also have cholesterol. Saturated fats from coconut oil can also make the liver produce more cholesterol. Dean Ornish and others have shown you can reverse heart disease with a plant-based diet.
Criticizing the person for making money or being a vegan is called an ad hominem attack and doesn't usually work in rational arguments.
Posted by: Will | 07/30/2013 at 07:54 AM