


So, in a year in which union membership declined a lot, one should expect to see middle-class incomes also drop substantially.

I wouldn't expect to see that at all. The connection wouldn't be nearly as intimate as a year.


Derek: that's why I said it is one test. You're welcome to test for multi-year effects and let us know what you find. One has to be cautious about doing too many tests (due to multiple comparisons) plus one should have a plausible theory for why the effect takes 3 years not 4 or 6.
Further, if we establish that there is a 3-year lag in (auto)correlation, what does that tell us? To me, that is a weak case to argue for causality. The longer one has to wait to see correlation, the more time there is for all kinds of other factors to influence the target variable.
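
For anyone who wants to try the multi-year test, here is a minimal sketch of correlating annual changes at a chosen lag. It is not the code behind the post: the function names (`pearson`, `annual_changes`, `lagged_change_correlation`) and the two short series are made up for illustration. The made-up target is built so that its change in a given year is exactly half the driver's change two years earlier, so the lag-2 correlation is 1 by construction.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def annual_changes(series):
    """Year-over-year differences of a series of annual levels."""
    return [b - a for a, b in zip(series, series[1:])]


def lagged_change_correlation(driver, target, lag):
    """Correlate the change in `driver` in year t with the change in
    `target` in year t + lag (lag = 0 is the same-year test)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    dx, dy = annual_changes(driver), annual_changes(target)
    if lag >= len(dx) - 1:
        raise ValueError("lag leaves fewer than two paired changes")
    if lag > 0:
        dx, dy = dx[:-lag], dy[lag:]  # pair year t with year t + lag
    return pearson(dx, dy)


# Made-up annual levels, NOT the union or income data from the post.
membership = [30, 29, 27, 26.5, 24, 23, 20, 19.5, 18, 16, 15.5]
income = [50, 49.5, 48.5, 48.0, 47.0, 46.75, 45.5, 45.0, 43.5, 43.25, 42.5]

for lag in range(4):
    r = lagged_change_correlation(membership, income, lag)
    print(f"lag {lag}: r = {r:.2f}")
```

Note that each extra year of lag drops one paired observation, so with a short series the longer lags rest on very few points.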


Does the lagged test actually demonstrate causation? I can see how it would demonstrate the causal direction (if there were one). But I don't see that this test rules out the existence of a third, hidden factor that causes union membership to decline first, then incomes at a lag. Say, for instance, the decline of the manufacturing sector, or the decline in manufacturing employment.


Gary: No, it can't prove causation. The idea is that you want testable hypotheses to come out of the *assumption* of causality. One test, or even a bunch of tests, cannot be conclusive, but if several such hypotheses fit the data, then we have stronger belief in the assumed causality.
In much the same way, if the first scatter plot above were to show little correlation, most people would conclude there cannot be causality but if one nitpicks, one can argue it only reduces the chance of causality.
The point I'm making is that the people behind the report have no right to make causal statements based on the chart of two lines.
Your standard of "ruling out the existence of causation" is much too stringent for any kind of statistical analysis.
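
To illustrate why a chart of two lines moving together is weak evidence, here is a small sketch with invented numbers. Two series both drift downward over 41 hypothetical years, but their year-to-year wiggles are constructed to be unrelated. The helper names and every number below are assumptions made for the illustration, not data from the report.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def annual_changes(series):
    """Year-over-year differences of a series of annual levels."""
    return [b - a for a, b in zip(series, series[1:])]


def levels_vs_changes(x, y):
    """Return (correlation of levels, correlation of annual changes)."""
    return pearson(x, y), pearson(annual_changes(x), annual_changes(y))


years = range(41)  # 41 hypothetical years

# Invented series: a steady downward trend plus a small wiggle.
# Series A wiggles on a 2-year cycle, series B on a 4-year cycle,
# chosen so that their annual changes are uncorrelated by construction.
series_a = [30 - 0.4 * t + (0.5 if t % 2 == 0 else -0.5) for t in years]
series_b = [50 - 0.3 * t + (0.5 if t % 4 in (0, 1) else -0.5) for t in years]

r_levels, r_changes = levels_vs_changes(series_a, series_b)
print(f"correlation of levels:         {r_levels:.2f}")
print(f"correlation of annual changes: {r_changes:.2f}")
```

The levels are strongly correlated only because both series share a time trend; once the trend is differenced away, nothing is left. That is the sense in which the same-year test is one check among several, not a proof in either direction.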

Cody L. Custis

I believe, if one were to play with the scales, that one could show the drop in percentage of children born to married mothers is causing the decline in middle class income, using the same set of fancy statistics and complete lack of science to justify causality.


Andrew Gelman


Just a small comment: I think the scatterplot would be improved if you connect the dots sequentially with light lines and then label the start and end points. This doesn't always work but given the steady trend, I think it will work well in this example.


I have to disagree with you entirely here on the use of the annual change. Income is a complicated thing and no one would suggest that it's entirely dependent on one factor, nor would you expect drops in union membership to result in income changes within a year, every year.

Since there is an obvious relationship between unions and salary, and that relationship ought to be in exactly the direction shown in the data, this looks like a pretty good argument for declining union membership playing a role in declining middle-class income.


I get your point about two things potentially being more correlated with time than with each other. But I would argue that your analysis also has a major flaw: it assumes that a change in union membership should produce a change in income in the same year. What is that assumption based on? Your gut? Well, that's no better than the other graph. In fact, it could very likely be worse.


John and Ryan: It appears that you haven't read the comments above and my response to them. The point of the blog is to show how serious analysts should seriously interrogate their data and avoid making causal statements based on a simple x-y correlation plot. This post is not a scholarly article proving or disproving the relationship. If you think the change in union membership takes X years to have an effect, you can replicate my analysis looking at the data with X years of lag. I encourage you to do that work and post your results here.

Generally, if you look at lags of 1, 2, 3, 4, 5,...,infinity years, it is not unusual to be able to find one lag that will "prove" a preconceived notion of correlation. Thus, a solid theory is needed to support any such analysis. Therefore, I also encourage you to post your theory as to how many years it takes for the effect to take hold, and why that many years are needed, and not more or less.
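
The risk of scanning many lags can be sketched with a small simulation. Everything here is an assumption made for illustration: the function name `false_hit_rates`, the use of independent Gaussian noise as a stand-in for two unrelated series of annual changes, and the cutoff of 0.31, which is roughly the conventional two-sided 5% threshold for a correlation computed on 40 points (longer lags use fewer points, so the cutoff is only approximate there).

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def false_hit_rates(n_trials=2000, n_years=40, max_lag=10,
                    threshold=0.31, seed=0):
    """Simulate pairs of UNRELATED noise series and return two shares:
    (trials where the lag-0 correlation exceeds the threshold,
     trials where at least one lag in 0..max_lag exceeds it)."""
    rng = random.Random(seed)
    lag0_hits = 0
    any_lag_hits = 0
    for _ in range(n_trials):
        dx = [rng.gauss(0, 1) for _ in range(n_years)]
        dy = [rng.gauss(0, 1) for _ in range(n_years)]
        # |r| between dx in year t and dy in year t + lag, for each lag
        rs = [abs(pearson(dx[:n_years - lag], dy[lag:]))
              for lag in range(max_lag + 1)]
        lag0_hits += rs[0] > threshold
        any_lag_hits += max(rs) > threshold
    return lag0_hits / n_trials, any_lag_hits / n_trials


single, scanned = false_hit_rates()
print(f"share of trials 'significant' at lag 0 only: {single:.3f}")
print(f"share 'significant' at some lag in 0..10:    {scanned:.3f}")
```

Because the scan includes lag 0, the second share can never be smaller than the first, and in practice it is much larger. This is the reason to pick the lag from theory before looking at the data rather than reporting whichever lag happens to look best.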


What an excellent dissection of this correlation! I'm curious as to how this correlation can be proved or falsified. For example, what if it can be shown that countries where unionization hasn't been battered the way it has in the US haven't seen the same rise in income disparity between average workers and the wealthy? I know correlation is not causation, but what if one can draw multiple cross-correlations in such a manner? Does that strengthen the case?





For the record, the reason I ask the above question is that I wouldn't expect the tit-for-tat sort of correlation that the scatter plot would require. There are many variables in economics. Between 1967 and 2007 we have booms, recessions, and other factors that can offset or bolster the effect unionization has on earnings (assuming it has one).


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.

Graphics design by Amanda Lee
