## Visualization as an analysis tool

##### Dec 19, 2012

Visualizing data has many uses. We often explore how charts can be used to convey data insights and tell stories. We talk less on this blog about how slicing and dicing data helps us form impressions about the structure of the data sets we're analyzing.

I have been digging through some payroll employment data recently. (You can find the data at the Bureau of Labor Statistics website.) I find the following two charts quite instructive.

The first one surfaces one type of recurring pattern: there is a seasonal pattern running from January to December that repeats every year. I use a small-multiples setup, with each chartlet indexed by year.

The second chart shows a different kind of regularity: there is a cyclical pattern running from 2002 to 2012, no matter which month we're looking at. Again, we have a small-multiples setup, this time with each chartlet indexed by month of the year.

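This second chart is a simple form of "seasonal adjustment". The data used in this plot are unadjusted, but because each chartlet holds the month fixed, we only ever compare January with January, February with February, and so on, so the seasonal effect drops out. What remains is a larger cyclical pattern over 2002-2012 that affects every month of the year.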
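For readers who want to try the two views themselves, here is a minimal sketch of the reshaping behind them. It does not use the BLS data: the `synthetic_employment` function, its numbers, and the `profile` helper are all made up for illustration, built as a slow cycle plus a seasonal swing. The point is only to show how the same monthly series gets sliced once by year and once by month.

```python
import math

# Synthetic stand-in for the payroll series (NOT real BLS data):
# a slow multi-year cycle plus a swing that repeats every year.
def synthetic_employment(year, month):
    cycle = 3.0 * math.sin(2 * math.pi * (year - 2002) / 10)
    season = 1.0 * math.cos(2 * math.pi * (month - 1) / 12)
    return 130.0 + cycle + season

years = range(2002, 2013)
months = range(1, 13)
series = {(y, m): synthetic_employment(y, m) for y in years for m in months}

# First chart: one chartlet per year, each a January-to-December line.
by_year = {y: [series[(y, m)] for m in months] for y in years}

# Second chart: one chartlet per month, each a 2002-to-2012 line.
by_month = {m: [series[(y, m)] for y in years] for m in months}

def profile(values):
    """Subtract the chartlet's own mean so only the shape of the line remains."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]
```

In this toy series the seasonal and cyclical parts simply add up, so `profile(by_year[y])` is the same for every year and `profile(by_month[m])` is the same for every month. Real data will not be this tidy, which is exactly what the small multiples let you check by eye.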

I already hear grumbling about using a line chart when there is no continuity from one dot to the next. In this chart, in fact, time runs left to right, top to bottom, then starts again at the first chartlet, and so on. This is a profile chart. As the name suggests, we should be focused on the shape of the line. It doesn't have to have physical meaning; we are only looking for regularity.

***

Statisticians love to find this kind of regular pattern because it is easy to describe. Of course, most data are much messier.

I can't see how you can complain about a line chart here. The two variables are both continuous (time and unemployment). At a time midway between two points, the unemployment will be (roughly) midway between the unemployment at those two points. So it's perfectly sensible.

It seems to me the more effective version of both these plots would be to have, in every multiple, a line representing median values for the whole set. The shapes of the curves are too similar for my eyes to judge differences between them.

This reminds me of a conversation I had with Bill Cleveland once. He showed me some set of graphs similar to those above, and I suggested he pull out means and plot residuals. He replied that he had formerly been a big fan of such decompositions but in his recent experience had felt that, as much as possible, he preferred to plot the raw data, that much could be learned from a careful set of graphs that displayed every point in the data just as it was.

I don't know if Cleveland's idea would work so well with discrete data, but it's an interesting thought.

Andrew and Andy: I did all kinds of plots that I didn't show of the type you're thinking about. And I agree that what's hidden in this small-multiples view is the small variations from year to year. The charts here present a good illustration of stronger versus weaker effects.

In particular, a good complement to the charts above is one that collapses all the separate multiples and uses year as an overlay. That one allows readers to observe both the stability of the seasonal pattern and the small year-to-year shifts. However, I think it's harder for people to grasp.

Andrew: one thing I like doing is to plot the raw data after the modeling phase. Use the model to figure out which variables are important, then do plots of the raw data focused on those variables. It's a nice model checking device to make sure that the data actually behave the way the model says they should.


This is a good illustration of the difference between seasonality and cyclicality. Superficially they look similar here (due to the number of years), although they are conceptually much different.
