
English donuts rival Spanish donuts

During my holiday travel, I found a disguised donut chart in the Delta Sky Magazine (Dec 2012), talking about manufacturing jobs in the U.S. Then, flipping through the Spanish section at the back of the same magazine, I found the translated article, plus a translated chart. To my surprise, they look different:


Surprise No. 1: the sizes of the cog wheels are different. Even though the color is still mapped to year in the same way, somehow one of these authors decided to take liberties with the relative size. The suspect is the Spanish author, who decided to make 2009 much larger and Jan to Sep 2012 much smaller.

Surprise No. 2: the use of commas within a number, and the format of dates, differ by culture. That explains why the Spanish author removed the commas from the numbers, making them harder for me (an English speaker) to comprehend. Also, the swap from "01/12-09/12" to "Sep. 2012" suggests that Spanish speakers don't like the month/year formatting of dates. It also suggests that the Spanish readers have no trouble inferring that the "Sep. 2012" data point refers to "Jan. 2012 to Sep. 2012".

Surprise No. 3: The Spanish author improved the chart in one way. He grouped the annual data together via overlapping, leaving the 2012 partial-year data point by itself.


There are some problems with both charts. The most serious is the failure to project the 2012 jobs number. The chart seems to indicate that 2012 is a lackluster year, at best level with the previous years. In fact, the number of jobs in three quarters has already exceeded the full-year count of 2011, 2010 and 2009. Unless the fourth quarter is a particularly bad quarter for manufacturing jobs, it would seem that the message should be that 2012 is a great year of recovery. You can't tell from these charts: in particular, the Spanish author decided to shrink the 2012 cog wheel into insignificance.

The issue here is providing context for comparison. Even if the projected 2012 full-year number is provided, that may not be enough to judge whether manufacturing is healthy. Other useful context can be the growth rate of manufacturing versus other sectors of the economy; and the growth rate of jobs in relation to the population/work force growth rate.

As usual, a simple line chart displays the time-series data more clearly. (I simply linearly extrapolated the 2012 full-year number, which is probably an over-estimate. In practice, you can look up the data and figure out the ratio of Jan-Sept jobs to full-year jobs on average and inflate the number that way.)
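For readers who want to see the difference between the two projections, here is a minimal sketch in Python. The job counts and the function names are made up for illustration; they are not the numbers from the magazine chart, and a real version would pull the historical Jan-Sept and full-year totals from the actual data.

```python
def linear_full_year(partial_total, months_observed):
    """Scale a partial-year total to 12 months, assuming every month contributes equally."""
    return partial_total * 12 / months_observed


def ratio_full_year(partial_total, history):
    """Inflate a Jan-Sept total by the average historical Jan-Sept share of the full year.

    history is a list of (jan_sept_total, full_year_total) pairs from earlier years.
    """
    shares = [jan_sept / full_year for jan_sept, full_year in history]
    average_share = sum(shares) / len(shares)
    return partial_total / average_share


# Hypothetical numbers, for illustration only.
jan_sept_2012 = 90_000
past_years = [(80_000, 100_000), (64_000, 80_000)]  # Jan-Sept is 80% of each full year

linear_estimate = linear_full_year(jan_sept_2012, months_observed=9)
ratio_estimate = ratio_full_year(jan_sept_2012, past_years)
print(linear_estimate, ratio_estimate)
```

In this toy setup, the first nine months historically account for more than nine-twelfths of the year, so the ratio-based projection comes out lower than the linear one. That is the sense in which the linear number can be an over-estimate.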





Visualization as an analysis tool

Visualizing data has many uses. We often explore how charts can be used to convey data insights and tell stories. We talk less on this blog about how slicing and dicing data helps us form impressions about the structure of the data sets we're analyzing.

I have been digging around some payroll employment data recently. (You can find the data at the Bureau of Labor Statistics website.) I find the following two charts quite instructive.

The first one surfaces one type of recurring pattern: there is a seasonal pattern running from January to December that repeats every year. I use a small-multiples setup, with each chartlet indexed by year.

[Chart: seasonal factor by month, one panel per year group]

The second chart shows a different kind of regularity: there is a cyclical pattern running from 2002 to 2012, no matter which month we're looking at. Again, we have a small-multiples setup, this time with each chartlet indexed by month of the year.

[Chart: unadjusted year-to-year trend, one panel per month]

This second chart is a simple form of "seasonal adjustment". The data used in this plot are unadjusted. The chart shows that there is a larger cyclical pattern during the period of 2002-2012 that affects every month of the year.

I already hear grumbling about using a line chart when there is no continuity from one dot to the next. In this chart, in fact, time runs left to right, top to bottom, then starts again at the first chartlet, and so on. This is a profile chart. As the name suggests, we should be focused on the shape of the line. It doesn't have to have physical meaning; we are only looking for regularity.
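The two slicings behind these charts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the series below is a toy one I made up (a yearly level plus a monthly offset), not the BLS payroll data, and the function names are hypothetical.

```python
from collections import defaultdict


def slice_by_year(records):
    """One panel per year; values within a panel run from January to December."""
    panels = defaultdict(list)
    for year, month, value in sorted(records):
        panels[year].append(value)
    return dict(panels)


def slice_by_month(records):
    """One panel per month; values within a panel run from the first year to the last."""
    panels = defaultdict(list)
    for year, month, value in sorted(records):
        panels[month].append(value)
    return dict(panels)


def shape(values):
    """Profile of a panel: every value relative to the panel's first value."""
    return [v - values[0] for v in values]


# Toy data (made up): a yearly level plus a monthly seasonal offset.
cycle = {2002: 130, 2003: 128, 2004: 131}
season = [-3, -2, -1, 0, 1, 2, 1, 1, 0, 1, 1, 2]
records = [(y, m, cycle[y] + season[m - 1]) for y in cycle for m in range(1, 13)]

by_year = slice_by_year(records)    # seasonal pattern, repeated in every year panel
by_month = slice_by_month(records)  # cyclical pattern, repeated in every month panel

print(shape(by_year[2002]))
print(shape(by_month[1]))
```

With a perfectly regular toy series like this one, every year panel has the same profile and every month panel has the same profile. Real data will only approximate that, which is the point of looking at the shapes side by side.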


Statisticians love to find this kind of regular pattern because it is easy to describe. Of course, most data are much messier.

Bracket as target

The MLB found an innovative way to present the play-off matchup and results:


I took this photo at the MLB Fan Cave in Manhattan. This was a marketing gimmick in which a bunch of guys were placed into this "fishbowl" and watched every game in the past baseball season. Excuse the scaffolding that was blocking the view.

I like the metaphor of hitting the bull's eye target, and the smooth progression from outside circles to inside. The design also accommodated the wild-card round well.


By contrast, here is the usual bracket presentation:


(This image came from the MLB Fan Cave website.)

There's life without animation

Just as we don't always need a map to do justice to geographic data, we don't always need animation to convey time-varying data.

Some examples of good visualizations of time-series information without moving parts include the "horse-race" charts on the Presidential election (link), and the NY Times plot of Olympic race times (link).

Reader Jeff Cole did some nice Web charts (link) that show NFL matches as horse races. The look of these charts is exceptionally clean and easy to understand. They tell us which matches were blow-outs, and which ones were close, and which ones were tales of two halves.


This Green Bay-New England match looked like a thriller, with five lead changes, and a final score gap of only 4 points. Green Bay scored pretty much regularly through the four periods while New England had two relatively long droughts in scoring.

I'm not sure if the Margin of Victory section is worthwhile. It seems redundant to me. The labeling could be better: it should show that New England leads are above the zero reference line while Green Bay leads are below it. I'd consider making this a cumulative amount of time in the lead up to a specific point of the match. That would give an extra piece of information that is difficult to grasp from the Game Scoring chart.


This was an exciting match for a different reason. Dallas was always ahead and eventually won by three points. But it was a shootout in which each team scored regularly. Dallas had a scoring drought for most of the second half, which allowed Washington to get even, but then scored a field goal before the final whistle to come out on top. I didn't watch the game but this chart tells me all of that.

On the Game Scoring chart, I'd add vertical tickmarks to help read the intermediate scores. Also maybe put dots to highlight the crossover points, which are where lead changes occur.
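Both the crossover points and the cumulative time in the lead fall out of the running score margin. Here is a rough Python sketch of that calculation. The scoring events, the team labels "A" and "B", the 60-minute game length and the function names are all assumptions for illustration; they are not taken from Jeff's charts or code.

```python
def margin_timeline(events, home):
    """Running margin (home minus away) after each scoring event.

    events is a list of (minute, team, points) tuples sorted by minute.
    """
    timeline = [(0.0, 0)]
    margin = 0
    for minute, team, points in events:
        margin += points if team == home else -points
        timeline.append((minute, margin))
    return timeline


def lead_changes(timeline):
    """Count how often the lead passes from one team to the other (ties are skipped)."""
    changes = 0
    last_leader = 0  # +1 home, -1 away, 0 nobody has led yet
    for _, margin in timeline:
        leader = (margin > 0) - (margin < 0)
        if leader != 0:
            if last_leader != 0 and leader != last_leader:
                changes += 1
            last_leader = leader
    return changes


def time_in_lead(timeline, game_length=60.0):
    """Minutes spent with the home team ahead, the away team ahead, and tied."""
    totals = {1: 0.0, -1: 0.0, 0: 0.0}
    ends = timeline[1:] + [(game_length, 0)]
    for (start, margin), (end, _) in zip(timeline, ends):
        leader = (margin > 0) - (margin < 0)
        totals[leader] += end - start
    return totals[1], totals[-1], totals[0]


# Hypothetical scoring events: (game minute, team, points).
events = [(5.0, "A", 7), (12.0, "B", 3), (20.0, "B", 7),
          (33.0, "A", 3), (41.0, "A", 7), (58.0, "B", 3)]

timeline = margin_timeline(events, home="A")
print(lead_changes(timeline))
print(time_in_lead(timeline))
```

To get the cumulative time in the lead up to a specific point of the match, pass that minute as game_length and drop the events that happen after it.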

NFL football is a complex game that is difficult to fully capture in a simple chart like this. It would be nice to add some extra event indicators, such as interceptions, fumbles, and other turning points. That's easier said than done, especially when trying to automate the graph production.

Four numbers say little, even on a busy chart

Reader Robert J. calls this a "really bad" chart (link). The data-ink ratio, he notes, is horrible.

The message of the chart can be stated in one or two sentences. And it's not clear what the other items are buying us. I usually love text annotations but three for this simple chart are too many.

The biggest issue I have is with the axes. What is left unsaid is whether the inability to perceive outside the zone of human ability is inconvenient or not. What's missing is a histogram of the stimuli. I'd guess that the distribution is uneven, and there is a concentration inside the humanly perceptible zone. It would be helpful to include what types of stimuli exist at different frequency bands to illustrate what we are missing.

A different comparison that helps interpretation is other mammals. What is the range of perception of dogs, pigs, cows, etc.?

We should also think carefully before putting these two independent quantities onto the same chart. The rectangular region is merely a construct of the chart designer. In fact, the size of the rectangle is arbitrary as the scales on either axis can be made however large we want.