

Tom West

If you want to see whether there's a link between vehicle miles driven per capita and the labor force participation rate, then plot one against the other, not both against time!


There should be a website, correlationisnotcausation.com, that hosts lots of data series and can plot any of them against any other (or, even better, lets you pick one data series and lists all the others that correlate well with it). Then the internets would be inundated with silly charts like http://imgur.com/1WZ6h, and maybe we could finally drive this concept into people's thick skulls.
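The "pick one series and list the others that correlate with it" feature described above could be sketched in a few lines. This is a minimal sketch under stated assumptions, not an existing tool: `pearson`, `rank_by_correlation`, and the toy series are hypothetical, and the ranking uses plain Pearson correlation on raw levels, which is exactly the measure that trending data will inflate.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def rank_by_correlation(target, catalog):
    """Return (name, r) for every series in catalog, strongest |r| first."""
    scores = [(name, pearson(target, series)) for name, series in catalog.items()]
    # sorted() is stable, so ties keep the catalog's insertion order
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical toy catalog, for illustration only
target = [1, 2, 3, 4, 5]
catalog = {
    "rising": [2, 4, 6, 8, 10],
    "falling": [5, 4, 3, 2, 1],
    "noisy": [3, 1, 4, 1, 5],
}
for name, r in rank_by_correlation(target, catalog):
    print(f"{name:8s} r = {r:+.2f}")
```

Ranking by absolute value puts strong negative correlations near the top as well, which seems closer to what such a site would want to surface.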


Tom, that doesn't fix the problem, which is time confounding. Two variables that both change in some pattern over time will appear correlated. Quite often the change in both is due to increased affluence. At the moment affluence is falling in many parts of the world, so there is a notable peak, which makes the effect look even more pronounced.
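Ken's point about time confounding can be checked numerically. A minimal sketch with made-up series: two quantities that share nothing except a straight-line dependence on time, each with its own small wiggle. The names (`pearson`, `wiggle_a`, `wiggle_b`, `series_a`, `series_b`) and all numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


t = list(range(20))                  # 20 hypothetical time periods
wiggle_a = [(-1) ** k for k in t]    # +1, -1, +1, ...
wiggle_b = [(k % 3) - 1 for k in t]  # -1, 0, 1, -1, 0, 1, ...

# Each series = its own trend in time + its own wiggle; time is all they share.
series_a = [10 + 2 * k + w for k, w in zip(t, wiggle_a)]
series_b = [3 + 1 * k + w for k, w in zip(t, wiggle_b)]

print(f"wiggles alone:  r = {pearson(wiggle_a, wiggle_b):.2f}")
print(f"trended series: r = {pearson(series_a, series_b):.2f}")
```

The wiggles on their own are nearly uncorrelated (r of about -0.06), yet once each rides on a time trend the two series correlate at about 0.99, with no causal link between them.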


@sigh - that's a fantastic idea.

@ken - it doesn't 'fix' the problem of how some people will interpret the correlation, but it will better show, particularly with some regression analysis, whether there is an actual correlation to begin with.
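Plotting one variable against the other and running a regression amounts to fitting a line through the scatter. A minimal ordinary-least-squares sketch in plain Python; `fit_line` and the five (x, y) pairs are hypothetical. Note that R² here is computed on raw levels, so two series that merely trend together over time will still score high, which is Ken's objection.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x; also returns R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # square of the Pearson correlation
    return slope, intercept, r_squared


# Hypothetical scatter points, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} x, R^2 = {r2:.3f}")
```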


Great discussion. In general, Tom's advice is sound. When looking at two data sets, we should first look at a scatter plot, not two line charts.

That said, Ken's point about two time series is well taken. In many cases, the time component is so strong that the scatter plot also tells us nothing useful. That is to say, if you look locally, you might see no correlation, but if you look globally, the cloud of dots drifts as time shifts, making it look as if there is a correlation.

The key is to de-trend the data first. Since this has become a topic, I'll post something about it soon. I started down that path but realized the issue needs its own post, as it's not a one-step process.
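One simple form of de-trending is to subtract a least-squares straight line from each series and then correlate what is left. A minimal sketch, assuming the trend is linear in time; `pearson`, `linear_detrend`, and the series are hypothetical. A trend with a peak, like the affluence pattern Ken describes, would need more than a straight line (for example differencing or a curved trend), which is one reason this is not a one-step process.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def linear_detrend(ys):
    """Subtract the least-squares straight line fitted against the index 0..n-1."""
    n = len(ys)
    mt = (n - 1) / 2  # mean of 0..n-1
    my = sum(ys) / n
    stt = sum((t - mt) ** 2 for t in range(n))
    slope = sum((t - mt) * (y - my) for t, y in enumerate(ys)) / stt
    intercept = my - slope * mt
    return [y - (intercept + slope * t) for t, y in enumerate(ys)]


# Hypothetical setup: a shared time trend plus independent wiggles.
t = list(range(20))
series_a = [10 + 2 * k + (-1) ** k for k in t]
series_b = [3 + 1 * k + (k % 3) - 1 for k in t]

raw_r = pearson(series_a, series_b)
detrended_r = pearson(linear_detrend(series_a), linear_detrend(series_b))
print(f"raw levels:               r = {raw_r:.2f}")
print(f"after linear de-trending: r = {detrended_r:.2f}")
```

On these made-up series the raw correlation is about 0.99, while the de-trended residuals correlate at close to zero.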


All: If you wrote a comment and it didn't appear, it's because the Typepad spam filter is producing false positives. I was shocked to find my own comments (like the one above) detected as "spam".

Marketing analytics and data visualization expert. Author and speaker. Currently at Columbia.