
The rule governing which variable to put on which axis, served a la mode

When making a scatter plot, the two variables should not be placed arbitrarily. There is a rule governing this: the outcome variable should be shown on the vertical axis (also called y-axis), and the explanatory variable on the horizontal (or x-) axis.

This chart from the archives of the Economist has this reversed:


The title of the accompanying article is "Ice Cream and IQ"...

In a Trifecta Checkup, it's a Type DV chart. It's preposterous to claim that eating ice cream makes one smarter without more careful studies. The chart also commits the xyopia fallacy: by showing just two variables, it unwittingly leads readers to explain differences in "IQ" using differences in per-capita ice-cream consumption, when many other, stronger variables would explain any gaps in IQ.

In this post, I put aside my objections to the analysis, and focus on the issue of assigning variables to axes. Notice that this chart reverses the convention: the outcome variable (IQ) is shown on the horizontal, and the explanatory variable (ice cream) is shown on the vertical.

Here is a reconstruction of the above chart, showing only the dots that were labeled with country names. I fitted a straight regression line instead of a curve. (I don't understand why the red line in the original chart bends upwards when the data for Japan, South Korea, Singapore and Hong Kong should be dragging it down.)


Note that the interpretation of the regression line raises eyebrows because the presumed causality is reversed. For each 50-point increase in PISA score (IQ), this line says to expect ice cream consumption to rise by about 1-2 liters per person per year. So higher IQ makes people eat more ice cream.


If the convention is respected, then the following scatter plot results:


The first thing to note is that the regression analysis is different here from that shown in the previous chart. The blue regression line is not equivalent to the black regression line from the previous chart. You cannot reverse the roles of the x and y variables in a regression analysis, and so neither should you reverse the roles of the x and y variables in a scatter plot.
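To make the asymmetry concrete, here is a minimal sketch in Python. The data are synthetic and made up purely for illustration (they are not the Economist's figures), and the variable names are my own. It fits a least-squares line both ways and redraws the second line in the same plane as the first; the two slopes agree only when the correlation is perfect.

```python
import numpy as np

# Synthetic, made-up data for illustration only (not the chart's actual data).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)            # hypothetical explanatory variable
y = 2.0 * x + rng.normal(0, 5, size=200)    # hypothetical outcome, with noise

# Regress y on x: least squares minimizes the vertical distances to the line.
slope_y_on_x, intercept_y_on_x = np.polyfit(x, y, 1)

# Regress x on y: least squares now minimizes the horizontal distances.
slope_x_on_y, intercept_x_on_y = np.polyfit(y, x, 1)

# Redrawn with y on the vertical axis, the x-on-y line has slope 1 / slope_x_on_y.
implied_slope = 1.0 / slope_x_on_y

# The two fitted slopes multiply to r squared, so the lines coincide
# only when the correlation is exactly +1 or -1.
r = np.corrcoef(x, y)[0, 1]

print("slope of y regressed on x:", slope_y_on_x)
print("slope implied by x regressed on y:", implied_slope)
print("product of the two fitted slopes:", slope_y_on_x * slope_x_on_y)
print("r squared:", r ** 2)
```

With noisy data like this, the line implied by the reversed regression is steeper than the line from regressing y on x, which is why flipping the axes changes the story the trendline tells.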

The blue regression line can be interpreted as having two sections, roughly, for countries consuming more than or less than 6 liters of ice cream per person per year. In the less-ice-cream countries, the correlation between ice cream and IQ is stronger (I don't endorse the causal interpretation of this statement).


When you make a scatter plot, you have two variables whose correlation you want to analyze. In most cases, you are exploring a cause-effect relationship.

Higher-income households care more about politics.
Less educated citizens are more likely not to register to vote.
Companies with a more diverse workforce have better business performance.

Frequently, the reverse correlation does not admit a causal interpretation:

Caring more about politics does not make one richer.
Not registering to vote does not make one less educated.
Making more profits does not lead to more diversity in hiring.

In each of these examples, it's clear that one variable is the outcome and the other is the explanatory factor. Always put the outcome on the vertical axis, and the explanatory variable on the horizontal axis.

The justification is scientific. If you are going to add a regression line (what Excel calls a "trendline"), you must follow this convention; otherwise, your regression analysis will yield the wrong result, with an absurd interpretation!


[PS. 11/3/2019: The comments below contain different theories that link the two variables, including theories that treat PISA score ("IQ") as the explanatory variable and ice cream consumption as the outcome. Also, I elaborated that the rule does not dictate which variable is the outcome: the designer effectively signals to the reader which variable is regarded as the outcome by placing it on the vertical axis.]




I don’t understand why you say that the amount of ice cream must be on the x-axis. Why do you think of that as the explanatory variable? Maybe the chart maker set out to show that higher PISA scores cause people to eat more ice cream.

Both statements are preposterous, of course. The variables are correlated only because of a third, explanatory variable: money.

By the way, PISA scores do not stand in for IQ; don’t confuse the two.


Cris: Think about it this way. You can choose to explore whether higher "IQ" causes people to eat more ice-cream. If that is your agenda, then ice cream consumption takes the status of outcome variable, and it should be on the vertical (y) axis. As for the problems with the analysis, there are many but that would be a different post.


Any correlation either way is spurious but if pressed I’d be with the Economist that PISA is the independent variable. If we assume that academic attainment is a proxy for GDP per capita (quite an assumption) then wealthier countries can spend more on luxuries like ice cream.


John: Here is a quote from the article: "Though it may seem like an odd suggestion on a brisk early-April morning, year-round subsidised ice-cream for children could improve educational attainment." So the intention was the other direction. I agree that your explanation is more likely to be true.
btw, I notice that the Economist piece came out on 1 Apr 2016.


This is one of those difficult situations where neither answer is correct. You have the education level which leads to higher wealth, but of course higher wealth may lead to a better education system. For ice cream consumption, ignoring temperature, it is reasonable to expect that greater wealth will increase consumption. Epidemiologists (and probably economists) would construct a path diagram to explain this. In the end causality is very difficult.

Antonio Rinaldi

This is one of those cases where the discussion in the comments is as good as the discussion in the post. It's a sin that Typepad doesn't allow subscribing to comments by email.


AR: I did the next best thing and referred to these comments in the post. Thanks for the hint.
