Superimposing time series is the biggest source of silly theories

Business Insider (link) published the following chart and declared "the end of the car age in one chart". The chart superimposed the monthly motor vehicle miles driven per capita and the labor force participation rate.

[Chart: Business Insider, vehicle miles driven per capita vs. labor force participation rate]

This is the conclusion of the post:

There's a logical connection between the two. Not in the workforce? You're less inclined to drive.

It's strange that they chose to show a time series going back to the 1970s, because the conclusion holds only for the last five years of the data. Look back another decade, to the previous recession (2001), and one finds the exact opposite: as the labor force participation rate fell, per-capita miles driven went up.

The other problem is causation creep, about which I have written on the sister blog (link). This chart merely shows correlation (and that is questionable). The conclusion of cause and effect is purely theory. Another theory would be the rise in telecommuting and work-from-home situations. A counter-theory would be that the unemployed may have more free time to drive. Another theory is that gas prices have gone up:

[Chart: US fuel prices, long-run series, dated 2-19-2013]

Any time series you can find that has a peak during the 2000s can be similarly interpreted as having caused people to stop driving. Here's a chart of real house prices from Calculated Risk.

[Chart: real house prices through December 2012, from Calculated Risk]

Falling house prices cause people to stop driving. Or perhaps falling house prices cause people to lose jobs.
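The point about peaks is easy to check with made-up data. Here is a minimal sketch in Python, not derived from any of the charts above: two synthetic series that share nothing except a rise to a peak late in the window, each with its own independent noise. The function names (`pearson`, `peaked_series`) and every number in it are illustrative assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def peaked_series(n, peak_at, height, noise_sd, rng):
    """Rise linearly to a peak, then fall back, plus independent noise."""
    out = []
    for t in range(n):
        if t <= peak_at:
            level = height * t / peak_at
        else:
            level = height * (n - 1 - t) / (n - 1 - peak_at)
        out.append(level + rng.gauss(0, noise_sd))
    return out

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 480                 # hypothetical: 40 years of monthly points

# Two unrelated quantities on different scales, peaking at similar times.
a = peaked_series(n, peak_at=400, height=100.0, noise_sd=5.0, rng=rng)
b = peaked_series(n, peak_at=410, height=3.0, noise_sd=0.2, rng=rng)

r = pearson(a, b)
print(f"correlation between the two series: {r:.2f}")
```

The correlation comes out strongly positive even though neither series was built from the other. The shared shape in time does all the work.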

Comments


Tom West

If you want to see if there's a link between vehicle miles driven per capita and the labor force participation, then plot one against the other, not both against time!

sigh

There should be a website correlationisnotcausation.com that has lots of data series on it and can generate plots of any of them against any other (or, even better, you can pick one data series and it will list all the others that correlate well with it) and then the internets will be inundated with silly charts like http://imgur.com/1WZ6h and maybe we can finally drive this concept into people's thick skulls.

Ken

Tom, that doesn't fix the problem, which is time confounding. Two variables that are both changing in some pattern over time will appear correlated. Quite often the change in both is due to increasing affluence. At the moment we have affluence going down in many parts of the world, so there is a notable peak which makes the effect look even more pronounced.

jlbriggs

@sigh - that's a fantastic idea.

@ken - it doesn't 'fix' the problem of how the correlation will be interpreted by some people, but it will better show - particularly with some regression analysis - if there is an actual correlation to begin with.

Kaiser

Great discussion. In general, Tom's advice is sound. When looking at two data sets, we should first look at a scatter plot, not two line charts.

That said, Ken's point is very true about two time series. In many cases, the time component is so strong that the scatter plot also tells us nothing useful. That is to say, if you look locally, you might see no correlation but if you look globally, the scatter of dots moves around as time shifts, making it look as if there is correlation.

The key is to first de-trend the data. Since this has become a topic, I'll post something about this soon. I started on that path but realized this issue needs its own post as it's not a one-step process.
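As a rough illustration of why de-trending matters, here is a sketch in Python using first differences, which is only one simple way to remove a trend and is an assumption made for this example, not the fuller multi-step treatment. It builds two independent drifting series from synthetic data and compares the correlation of the raw levels with the correlation of the period-to-period changes. The helper names (`pearson`, `drifting_walk`, `first_differences`) and all parameters are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drifting_walk(n, drift, noise_sd, rng):
    """Random walk with drift: each step adds drift plus independent noise."""
    level, out = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0, noise_sd)
        out.append(level)
    return out

def first_differences(xs):
    """Change from one period to the next; removes a steady upward drift."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Two series generated independently; they share only an upward drift.
x = drifting_walk(400, drift=1.0, noise_sd=1.0, rng=rng)
y = drifting_walk(400, drift=1.0, noise_sd=1.0, rng=rng)

r_levels = pearson(x, y)
r_diffs = pearson(first_differences(x), first_differences(y))
print(f"levels: {r_levels:.2f}, differences: {r_diffs:.2f}")
```

The levels look almost perfectly correlated, while the differenced series show essentially none. Real data usually needs more care than this (seasonality, changing trends), which is why one step of differencing should be read as a starting point only.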

Kaiser

All: If you wrote a comment and it didn't appear, it's because the Typepad spam filter is producing false positives. I was shocked to find my own comments (like the one above) detected as "spam".
