If you're reading this blog, you have probably heard that correlation does not imply causation. Apparently, health-beat reporters and medical researchers in peer-reviewed journals know this too. They often explicitly warn us that their studies show only correlation, and that they cannot explain why the results turned out as they did. All is well if they stop there. But they don't.
All too often, they succumb to "causation creep". All of a sudden, they interpret the results in a way that presumes causation.
Here are two particularly egregious examples that appeared on the Vitals blog at MSNBC.
In the first example (link), we are told:
Researchers followed 397 children from pregnancy (sic) through their first year of life, and found that those living with dogs developed 31 percent fewer respiratory tract symptoms or infections, 44 percent fewer ear infections and received 29 percent fewer antibiotic prescriptions.
And then comes the disclaimer:
the researchers acknowledged that [they] couldn't account for all such factors [other than living with dogs that can explain the finding], and noted that they found a correlation, not a cause-and-effect relationship.
Finally, causation creep happens:
For healthier kids, get a cat or dog, study suggests.
This is a cause-and-effect statement; there is no getting around it. It says that if you get a cat or a dog, your kids will be healthier. If the researchers truly believe that they found only a correlation, then drawing this causal conclusion is unconscionable. Chances are, they just thought it was a fun thing to say (or at least the headline writer did), allowing causation to creep in.
PS. The first person who commented on the article suggests that perhaps families that own pets also spend more time outdoors (playing with their pets), and it could be the outdoor exposure that causes the observed effect. We don't know why, but one thing is for sure: families who keep pets are not at all the same as families who don't keep pets.
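To see how a confounder like outdoor exposure could manufacture the entire "pet effect" on its own, here is a toy simulation. This is a minimal sketch: the numbers and the outdoor-time mechanism are all invented for illustration and have nothing to do with the actual study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # hypothetical families

# Confounder: weekly hours of outdoor exposure (invented distribution)
outdoors = rng.exponential(scale=5.0, size=n)

# Families that spend more time outdoors are more likely to own a pet;
# pet ownership has NO causal arrow pointing at health in this model
owns_pet = rng.random(n) < 1.0 / (1.0 + np.exp(-(outdoors - 5.0)))

# Infections depend ONLY on outdoor exposure, never on the pet
infections = rng.poisson(lam=np.maximum(0.2, 3.0 - 0.3 * outdoors))

print("mean infections, pet owners:", infections[owns_pet].mean())
print("mean infections, non-owners:", infections[~owns_pet].mean())
# Pet owners come out "healthier" even though pets do nothing here
```

In this made-up world, "get a dog" does nothing for your kids, while "go outside more" does everything; yet the pet-owning families look healthier. The observational data alone cannot tell the two stories apart.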
PPS. It may be cruel at this point to point out that the so-called "cat" effect is 2 percent fewer antibiotic prescriptions, which amounts to nothing. Also, no urban families were included in this analysis.
***
The second example (link) is, if anything, worse.
We are told:
Katzmarzyk and colleagues analyzed information from five earlier studies involving more than 167,000 adults that looked at the link between sitting and risk of dying from any cause over the next four to 14 years... About 27 percent of deaths in the studies could be attributed to sitting, and 19 percent to television viewing, the researchers said.
The disclaimer:
The researchers noted their study assumed a cause-effect link between sedentary behavior and risk of dying, which further research should validate, they said.
Finally, a bouquet of colorful statements, all of which presume the very causation that the researchers admitted they had assumed without proof:
Sit less than 3 hours a day, add 2 years to your life (sic)
Reducing the daily average time that people spend sitting to less than three hours would increase the U.S. life expectancy by two years (sic)
The study adds to a growing body of evidence suggesting that sitting itself is deadly. (sic)
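It is worth pausing on where a number like "27 percent of deaths attributed to sitting" comes from. It is a population attributable fraction, and that quantity is only meaningful if you already assume the exposure causes the deaths. A quick sketch of the standard formula, with invented inputs (the prevalence and relative risk below are placeholders, not figures from the Katzmarzyk paper):

```python
# Population attributable fraction (PAF): the share of deaths that would
# vanish IF the sitting-death link were causal. Inputs are hypothetical.
p = 0.6   # assumed prevalence of prolonged sitting in the population
rr = 1.8  # assumed relative risk of death for prolonged sitters

paf = p * (rr - 1) / (1 + p * (rr - 1))
print(f"PAF = {paf:.0%}")  # about 32% of deaths "attributed" to sitting
```

The arithmetic is trivial; the causal assumption does all the work. Quote the attributable fraction without that assumption and the number means nothing.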
***
These researchers should realize they are not doing Freakonomics. It's true that Levitt and Dubner also sometimes succumb to causation creep, especially in their more casual pieces. But in the original book, where they described how abortion policy could have caused crime to fall, they went through a rigorous analysis to rule out many other possible explanations. None of the studies cited here (nor many similar ones) show the care that is required to claim causation. We should note that the abortion finding has since been debunked, which just goes to show how hard it is to prove causation with data that is conveniently collected.
In the pet study, I was not impressed with the statistical "significance" of the results to begin with, and even less so when I realized their unit of analysis was not families but "family-weeks". That is, they analyzed the relationship between current pet presence and current family health each week. That seems all kinds of mixed up, and the inflated N overstates the significance of any effects (aren't the effects on immunity cumulative? aren't immunity effects lagged? aren't levels of sickness and health from week to week for a single family highly related to each other? doesn't excluding urban families bias the results?).
I agree that pet owners have many potential confounding factors, but surely excluding urban families is even worse. They really seemed to be mixed up about the relationships among pets, pets being outdoors, people being with pets, people being with people, and people being outdoors.
Posted by: Floormaster Squeeze | 07/11/2012 at 10:04 AM
It's important to note that none of the scientists involved in either of these studies actually said ANY of those cause-effect statements. Those come from Susan E. Matthews and "MyHealthNewsDailyStaff", respectively. The researchers KNOW they aren't doing Freakonomics. Somebody needs to tell the idiots who skim these papers and write articles about them. This has been a constant problem ever since science was first in the news. So, blame where blame is due, I guess.
Posted by: Budding Scientist | 07/12/2012 at 02:46 PM
Budding Scientist: One of the study researchers was cited in the article speculating on all kinds of reasons why the animals could have caused the observed health effects, so they were definitely providing cover for the reporter. These are only the examples I picked up on the day I wanted to write about causation creep; I come across examples like this every week. In Freakonomics, when they told us some economist wanted to change her surname to start with A, or that parents should consider timing the births of their kids in order to turn them into star athletes, they fell into the same trap. It's very easy to make careless statements like these.
Posted by: Kaiser | 07/18/2012 at 12:35 AM
I'm not a statistician but a quantitative social scientist, so I'm familiar with the issue you raise in the post. I have two comments. One is that just as correlation does not imply causation, it also doesn't imply a lack of causation. A correlation may or may not also be a causal relationship. So as I see it, the problem isn't so much suggesting that a correlation might indicate that causality is involved. It is suggesting this without appropriate qualification, and/or without making a convincing case against plausible alternative explanations.
The other is that at the end of your post you refer to how hard it is to prove causation with data that is "conveniently collected." But in science we never prove causation, if "prove" is being used the way that term is used in mathematics and philosophy. Even if one has conducted a randomized trial and found a causal effect, the case for such an effect hasn't been proven. Random assignment is the "gold standard" way of increasing the likelihood of balance between the treatment and control groups, which is what allows us to conclude that causality is involved, but it doesn't guarantee such balance. And a finding of statistical significance is not a proof. Of course, I'm not saying anything that you don't already know. I was just worried that your comment about proving causality might reinforce another misconception about science and statistics that I often see in media discussions: that scientific studies are about proving things.
Posted by: Michael L. | 07/24/2012 at 08:05 AM
Do you have a link to where the abortion hypothesis was debunked?
Posted by: Tom | 08/08/2012 at 09:29 AM
Tom: You can start with the generally neutral list of references from Wikipedia (link).
Michael: Thanks for the note, which I totally agree with. What I'm against is the sloppy language used to make the limited conclusions sound exciting. If the researcher admits that he/she doesn't know why A is correlated with B, and there is nothing in the study to back A as a cause of B, then don't suggest that people engage in behavior A in the hope of attaining outcome B.
Posted by: Kaiser | 08/09/2012 at 11:52 PM