One statistical concept that instructors frequently don't have time to cover in Stat 101 is the "interaction" effect. I will explain this concept using the fantastic interactive graphic by the visualization team at the German publication Zeit (please also read the corresponding post on Junk Charts here for some background).
When we ignore interactions, we end up with overly simplistic statistical summaries. For example, a study might find that drinking six cups of coffee reduces the chance of prostate cancer by 60 percent. But is this effect the same for all age groups? Could the risk reduction be larger for older men and smaller for younger men? When we ask questions like these, we are asking whether the effect of coffee consumption on health interacts with age.
In the Zeit visualization, the following graph illustrates that Germans living in the former East Germany (blue line, referred to as East Germans hereafter) are more likely to hold negative views about smoking cannabis than Germans living in West Germany (yellow; West Germans). On average, about 70 to 80 percent of people deemed the activity "bad" or "very bad", and the East-West gap was roughly 8 percentage points.
A relevant question here is whether the size of this East-West gap varies by age.
In the interactive visualization, one can click on different age groups to observe how the lines shift. The left chart below shows the East-West gap for 45-64 year olds while the right shows the gap for people aged 65 and above.
The strongest signal to hit us is probably that the 65-and-above cohort has a much more negative opinion of smoking cannabis than the younger cohort. This is a statement about the effect of age on the attitude towards cannabis, regardless of where they live. It's about a single factor, and so it isn't an interaction effect.
The next observation is that in the 65-and-above cohort, the East-West gap is noticeably smaller than the average gap, and it has remained essentially unchanged over the last 12 years. For the 45 to 64 age group (left chart), however, the gap has widened markedly: in 2012, the share of East Germans who disliked smoking cannabis was about 15 percentage points higher than among West Germans, even though both groups started out with about the same attitude in 2000.
Statisticians call this a significant interaction effect between East-West and age group. When an interaction effect exists, the aggregate statistic is not very useful because it masks the variability by age. While the average gap was 8 percentage points, the gap for people aged 65 and above was only about half of that, and the gap for the 45 to 64 cohort was almost twice as large. (I am ignoring the other age groups; keep clicking to see them.)
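To make the arithmetic concrete, here is a minimal Python sketch. The shares below are hypothetical round numbers chosen only to mimic the pattern described above (an overall gap near 8 points, roughly 15 points for 45 to 64 year olds, roughly 4 points for the 65-and-above cohort); they are not read off the Zeit data, and the helper name `gap` is likewise made up for illustration.

```python
# Hypothetical shares (percent) saying smoking cannabis is "bad" or "very bad".
# Invented numbers that only mimic the pattern in the text; NOT the Zeit survey values.
shares = {
    "all ages": {"East": 80.0, "West": 72.0},
    "45-64": {"East": 80.0, "West": 65.0},
    "65+": {"East": 88.0, "West": 84.0},
}


def gap(group):
    """East-West gap for one group, in percentage points."""
    return shares[group]["East"] - shares[group]["West"]


overall = gap("all ages")
for cohort in ("45-64", "65+"):
    g = gap(cohort)
    # Compare each cohort's gap with the single aggregate gap.
    print(f"{cohort}: gap of {g:.0f} points, {g / overall:.1f}x the overall gap")
```

A single "8-point gap" describes neither cohort well, which is exactly why the aggregate statistic loses its usefulness once an interaction is present.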
It is important to distinguish between three effects: the "main" effect of East Germany vs West Germany, the "main" effect of age group, and the interaction effect between East/West and age group.
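One common textbook way to write these three effects down (standard two-way ANOVA notation, not something taken from the Zeit graphic) is to decompose the expected response for region i (East or West) and age group j into a grand mean mu, a region main effect alpha_i, an age main effect beta_j, and an interaction term (alpha beta)_ij, with the usual sum-to-zero constraints so that the terms are identifiable. Subtracting West from East within a single age group shows that the age main effect cancels, so the East-West gap can vary across age groups only through the interaction terms:

```latex
\begin{aligned}
\mu_{ij} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} \\
\mu_{\mathrm{E},j} - \mu_{\mathrm{W},j} &= (\alpha_{\mathrm{E}} - \alpha_{\mathrm{W}}) + \bigl[(\alpha\beta)_{\mathrm{E},j} - (\alpha\beta)_{\mathrm{W},j}\bigr]
\end{aligned}
```

If every interaction term is zero, the model is additive and the East-West gap is the same for every age group.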
When you ignore interaction effects, you are assuming "additivity". Unfortunately, when it comes to statistics, one plus one usually does not add up to two! This point causes much confusion for non-statisticians. In statistics, "one" is not an exact quantity; there is a margin of error around it.
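Additivity can be checked with simple arithmetic on a two-by-two table. The sketch below uses made-up cell shares (not the Zeit values) and a hypothetical helper `additive_prediction`. It predicts the East 65-and-above cell by adding the East-West effect measured among 45 to 64 year olds to the West 65-and-above cell, which is what additivity assumes; the shortfall between observation and prediction is the interaction.

```python
# Hypothetical 2x2 table of shares (percent) calling cannabis "bad" or "very bad".
# Made-up numbers for illustration only; not the Zeit survey values.
cells = {
    ("East", "45-64"): 80.0,
    ("West", "45-64"): 65.0,
    ("East", "65+"): 88.0,
    ("West", "65+"): 84.0,
}


def additive_prediction(cells):
    """Predict the East/65+ cell assuming the East-West effect is identical in every age group."""
    east_effect = cells[("East", "45-64")] - cells[("West", "45-64")]
    return cells[("West", "65+")] + east_effect


predicted = additive_prediction(cells)
observed = cells[("East", "65+")]
# Zero exactly when the region and age effects simply add up.
interaction = observed - predicted

print(f"additive prediction: {predicted:.0f}, observed: {observed:.0f}, interaction: {interaction:+.0f}")
```

In real survey data every cell carries a margin of error, so a nonzero contrast like this one would need a significance test before being called an interaction; the sketch ignores sampling error entirely.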
As a second example, consider the following graph, which shows the East-West gap in attitudes toward working mothers.
The two bolded lines represent the average person in East and West Germany. Over 25 years, the East-West gap narrowed only marginally, from 28 to 25 percentage points.
Does this gap vary depending on whether the respondent is male or female? To see this, we split the responses by gender. In the charts below, men are shown in gray on the left and women in gray on the right.
The first thing to notice is that the two lines for men (East and West) are both below average, meaning that men are less likely to accept working mothers. Not surprisingly, the lines for women (on the right chart) show that they are more likely to accept working mothers. But these comments relate to the effect of gender on the working mothers issue. What we are interested in is whether the East-West difference is affected by gender.
This means we care about the gaps between the gray lines on both charts. With only a little effort, you can see that the gap is wider on the left chart than on the right. In other words, when it comes to working mothers, the gap between West and East German men is larger than the gap between West and East German women. Statisticians call this an interaction effect. When such an effect is significant, statisticians prefer to talk about the genders separately rather than combining them into one average.
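The comparison of the two gaps is a "difference of differences", and a minimal Python sketch makes it explicit. The shares below are invented to mimic the pattern described (men below average, women above, a wider East-West gap for men, and a pooled gap of 25 points); they are not read off the Zeit charts, and the equal weighting of men and women in each region is an assumption.

```python
# Hypothetical shares (percent) accepting working mothers, by gender and region.
# Invented for illustration; they only mimic the pattern described in the text.
shares = {
    "men": {"East": 72.0, "West": 42.0},
    "women": {"East": 85.0, "West": 65.0},
}

# East-West gap within each gender, in percentage points.
gaps = {sex: row["East"] - row["West"] for sex, row in shares.items()}

# Interaction contrast: how much the East-West gap changes with gender.
# Under pure additivity this difference of differences would be zero.
contrast = gaps["men"] - gaps["women"]

# Gap for the pooled population, assuming (hypothetically) equal numbers
# of men and women in each region, so it is just the mean of the two gaps.
pooled_gap = sum(gaps.values()) / len(gaps)

print("gap by gender:", gaps)
print("interaction contrast:", contrast)
print("pooled gap:", pooled_gap)
```

The pooled gap sits between the two gender-specific gaps and describes neither group exactly, which is why statisticians prefer to report the genders separately when the contrast is significant.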
Next time you run a regression, add some interactions and see if it makes a difference. I address this issue in Chapter 3 of Numbers Rule Your World.
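As a rough illustration of what "adding an interaction" to a regression means, here is a self-contained Python sketch with no third-party libraries. Everything in it is an assumption for illustration: the respondent-level data are synthetic (100 hypothetical respondents per cell, coded 1 for "bad"), the model is a linear probability model fitted by ordinary least squares through the normal equations, and the helpers `solve` and `ols` are written here, not taken from any library. The additive model reports one East effect for everyone; adding the `east * old` column lets the East effect differ by age group.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares coefficients via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data: (east, old) -> number of "bad" answers out of 100 hypothetical respondents.
# east = 1 for East Germans; old = 1 for the 65-and-above cohort, 0 for 45-64.
counts = {(0, 0): 65, (1, 0): 80, (0, 1): 84, (1, 1): 88}
X_add, X_int, y = [], [], []
for (east, old), k in counts.items():
    for i in range(100):
        y.append(1.0 if i < k else 0.0)
        X_add.append([1.0, east, old])               # intercept + main effects only
        X_int.append([1.0, east, old, east * old])   # ... plus the interaction column

b_add = ols(X_add, y)
b_int = ols(X_int, y)

print(f"additive model: a single East effect of {b_add[1]:.3f}")
print(f"with interaction: East effect {b_int[1]:.3f} for 45-64, "
      f"{b_int[1] + b_int[3]:.3f} for 65+ (interaction term {b_int[3]:.3f})")
```

With real data you would normally use a statistics package instead; for example, in the statsmodels formula interface, `y ~ east * old` expands to both main effects plus their interaction. For a yes/no outcome a logistic regression is the more common choice, and interactions on the log-odds scale need not match interactions on the percentage scale.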