Tom West

If you want to see whether there's a link between vehicle miles driven per capita and labor force participation, then plot one against the other, not both against time!


There should be a website correlationisnotcausation.com that has lots of data series on it and can generate plots of any of them against any other (or, even better, you can pick one data series and it will list all the others that correlate well with it) and then the internets will be inundated with silly charts like http://imgur.com/1WZ6h and maybe we can finally drive this concept into people's thick skulls.


Tom, that doesn't fix the problem, which is time confounding. Two variables that are both changing in some pattern over time will appear correlated. Quite often the change in both is due to increased affluence. At the moment we have affluence going down in many parts of the world, so there is a notable peak, which makes the effect look even more pronounced.
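
To illustrate the time-confounding point, here is a minimal sketch using made-up data. The two series, their slopes, and their noise levels are all assumptions chosen for illustration; they are not the driving or labor force figures from the post. The series are generated independently and share nothing except an upward drift over time, yet their levels come out strongly correlated.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
years = list(range(50))

# Two hypothetical series: each is a straight-line trend plus its own
# independent noise. Neither depends on the other in any way.
series_a = [100 + 2.0 * t + rng.gauss(0, 5) for t in years]
series_b = [30 + 0.5 * t + rng.gauss(0, 2) for t in years]

r_levels = pearson(series_a, series_b)
print(f"correlation of levels: {r_levels:.2f}")
```

A scatter plot of `series_a` against `series_b` would show the same thing as this coefficient: a tidy upward band that reflects only the shared passage of time.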


@sigh - that's a fantastic idea.

@ken - it doesn't 'fix' the problem of how the correlation will be interpreted by some people, but it will better show - particularly with some regression analysis - if there is an actual correlation to begin with.


Great discussion. In general, Tom's advice is sound. When looking at two data sets, we should first look at a scatter plot, not two line charts.

That said, Ken's point is very true about two time series. In many cases, the time component is so strong that the scatter plot also tells us nothing useful. That is to say, if you look locally, you might see no correlation but if you look globally, the scatter of dots moves around as time shifts, making it look as if there is correlation.

The key is to first de-trend the data. Since this has become a topic, I'll post something about this soon. I started on that path but realized this issue needs its own post as it's not a one-step process.
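
As a rough sketch of what de-trending can look like, the example below removes a fitted straight line from each series and then correlates what is left. This is only one simple approach, assumed here for illustration; real series may need something else (differencing, seasonal adjustment, a non-linear trend), which is part of why this is not a one-step process. The data are made up, and the helper names `pearson` and `detrend_linear` are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def detrend_linear(ys):
    """Return residuals after subtracting a least-squares line in time."""
    n = len(ys)
    ts = list(range(n))
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
             / sum((t - mean_t) ** 2 for t in ts))
    intercept = mean_y - slope * mean_t
    return [y - (intercept + slope * t) for t, y in zip(ts, ys)]

rng = random.Random(1)  # fixed seed so the run is repeatable
n_periods = 200

# Hypothetical data: independent noise around two different upward trends.
series_a = [1.0 * t + rng.gauss(0, 5) for t in range(n_periods)]
series_b = [0.5 * t + rng.gauss(0, 3) for t in range(n_periods)]

r_raw = pearson(series_a, series_b)
r_detrended = pearson(detrend_linear(series_a), detrend_linear(series_b))

print(f"raw levels: r = {r_raw:.2f}")
print(f"detrended:  r = {r_detrended:.2f}")
```

With data built this way, the raw correlation is close to 1 while the detrended correlation is close to 0, since the only thing the two series had in common was the trend.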


All: If you wrote a comment and it didn't appear, it's because the Typepad spam filter is producing false positives. I was shocked to find my own comments (like the one above) detected as "spam".

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.