## The meaning of pretty pictures and the case of 15 scales

##### Jul 09, 2011

When we call something a "pretty picture", what do we mean?

Based on the evidence out there, it would seem like "pretty" means one or more of the following:

• unusual: not your Grandma's bar chart or line chart
• visually appealing: say, irregular shapes, lots of colors, curved lines and so on
• complex: if you don't get the point right away, the chart must be smart, and must contain a lot of information
• data-rich: a variant of complex

***

I pondered that question while staring at this chart, reprinted in the NYT Magazine to pitch a new book by Craig Robinson called "Flip Flop Fly Ball". According to the editors, the book is a "beautiful, number-crunched (sic) combination of statistical and graphic-design geekery". So here's Exhibit A:

This chart is supposed to tell us whether big payroll equals success in Major League Baseball, and success is measured variously by making the playoffs, making the championship series or winning the championship. It nicely uses a relatively long time horizon of 15 years.

The problem: how are we supposed to learn the answer to the question?

To learn it, we have to go through these steps:

Read the fine print under the title that tells us the vertical scale is the rank by payroll, so within each season, the top spender is at the top, and the bottom spender at the bottom. (Strictly speaking, there are 15 different scales, see discussion below.)

Figure out that the black row has all of the championship teams aligned at the same vertical level.

Realize that the more teams that are listed below the black line, the bigger the payroll of the championship team in that season.

Alternatively, the more teams that are found above the black line, the smaller the payroll of the winning team that year.

From that, we see that for almost every season in the last 15 years, the winner comes from a relatively free-spending team. Florida in 2003 is a big outlier.

***

Maybe that isn't too bad. Now, try to interpret the blue boxes, which label all the playoff teams in every season. Is it that playoff teams also are bigger spenders than non-playoff teams?

To learn this, try the following step:

Ignore the relative height of the columns from season to season, and focus only on the relative positions of the blue slots within each column.

Are these blue slots more likely to be crowded towards the top of the column than the bottom?

The answer should be obvious, so why does it feel so hard?

***

You may be confused by the vertical scale. Is it the case that in 2003, the entire league decided to splurge on spending? Does the protruding tower in 2003 indicate especially high payrolls?

No, it doesn't. It turns out there are really 15 separate vertical scales on this one chart; each column has to be viewed separately. There is a ranking within each column, but the relative height from one column to the next means nothing. Each column is hinged to the black row, which marks the payroll rank of the championship team in that season.
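To make the point about separate scales concrete, here is a minimal Python sketch. The season labels, the champion ranks and the function names are all made up for illustration; none of them are read off the chart. In the anchored layout, a team's row is its payroll rank minus the champion's rank, so the zero point shifts from season to season. In a common-scale layout, the row is simply the rank.

```python
# Hypothetical payroll rank of the champion in two seasons
# (1 = biggest payroll). Invented numbers, not taken from the chart.
champion_rank = {"season_a": 2, "season_b": 25}


def anchored_position(rank, season):
    """Row of a team when each column is hinged on the champion's row.

    Row 0 is the black (champion) row; negative rows sit above it.
    """
    return rank - champion_rank[season]


def common_position(rank, season):
    """Row of a team when every column shares one scale: row = rank."""
    return rank


# Under the anchored layout, the top spender (rank 1) lands on a
# different row in each season...
top_rows_anchored = [anchored_position(1, s) for s in champion_rank]
# ...while under the common scale it always sits on the same row.
top_rows_common = [common_position(1, s) for s in champion_rank]

print(top_rows_anchored)  # [-1, -24]
print(top_rows_common)    # [1, 1]
```

With the anchored rows, comparing heights across columns compares nothing but the champions' ranks, which is why a low-payroll champion produces a tower.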

The decision to anchor the columns in this way is what dooms this chart. In the junkart version below, I reversed this decision and ended up with a much clearer picture:

It's now clear that almost all the playoff teams come from the top quartile or top third of the table in terms of payroll. In more recent years, the correlation between spending and success seems less assured; perhaps that is partly a result of the analytics revolution, as nicely portrayed in Moneyball. It is still true that any team in the bottom third of the payroll scale has little chance of making the playoffs. However, once a smaller-payroll team makes the playoffs, it appears to do well: in three of the last four seasons, a small-payroll team has made the finals.
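The "top third" reading can also be checked with a count rather than by eye. The sketch below is only an illustration: the records are invented (a real table would have about 30 rows per season), and `top_third_share` is a hypothetical helper name. It computes the fraction of playoff teams whose payroll rank falls in the top third of their own season.

```python
# Hypothetical records: (season, payroll_rank, made_playoffs).
# Invented for illustration; not the data behind the chart.
records = [
    ("season_a", 1, True), ("season_a", 2, True), ("season_a", 3, False),
    ("season_a", 4, False), ("season_a", 5, True), ("season_a", 6, False),
    ("season_b", 1, True), ("season_b", 2, False), ("season_b", 3, True),
    ("season_b", 4, False), ("season_b", 5, False), ("season_b", 6, True),
]


def top_third_share(rows):
    """Fraction of playoff teams whose payroll rank is in the top third.

    Assumes ranks run 1..n within each season, so the largest rank
    seen in a season equals the number of teams that season.
    """
    teams_per_season = {}
    for season, rank, _ in rows:
        teams_per_season[season] = max(teams_per_season.get(season, 0), rank)
    playoff = [(s, r) for s, r, made in rows if made]
    in_top = [1 for s, r in playoff if r <= teams_per_season[s] / 3]
    return len(in_top) / len(playoff)


share = top_third_share(records)
print(share)  # 0.5
```

Because the cutoff is computed per season, the same function would handle seasons with different numbers of teams.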

Note that I grayed out the four cells at the bottom left. There were only 28 teams before 1998. I also removed the names of the teams that didn't make the playoffs, which serve no purpose in a chart like this.

***

That's the descriptive statistics. It's really hard to draw robust conclusions from such data. You can say it's harder for small-payroll teams to have consistently great performance in the regular season but easier in a short playoff series - so in a sense, we are looking at luck, not skill.

But could it be that those small-payroll teams, given that they made the playoffs, must have had some unusual success in that season, perhaps because they discovered some young talent that cost next to nothing? If so, the fact that they made the playoffs despite the smaller payroll would be a good predictor that they would do well in the playoffs.

The other important issue to realize is that by plotting the rank of payroll, rather than true payroll, the scale of payroll differences has been taken out of the picture. The team listed at the median rank most likely spent much less than half of what the team at the top of the table spent. If you grab the actual payroll amounts, there is much more you can do to display this data.
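A small sketch shows what ranking throws away. The team names and payroll figures below are invented for illustration, in millions of dollars; they are not actual MLB payrolls. Ranks are always one step apart, whereas payroll expressed relative to the top spender keeps the size of the gaps.

```python
# Hypothetical payrolls in millions of dollars, invented for illustration.
payrolls = {"Team A": 200, "Team B": 120, "Team C": 90, "Team D": 80, "Team E": 40}

# Rank by payroll: 1 = biggest spender. Gaps between ranks are always 1.
ordered = sorted(payrolls, key=payrolls.get, reverse=True)
rank = {team: i + 1 for i, team in enumerate(ordered)}

# Payroll relative to the top spender preserves the size of the gaps.
top = payrolls[ordered[0]]
relative = {team: payrolls[team] / top for team in ordered}

# The team at the median rank sits mid-table by rank, but in this
# made-up example it spends less than half of what the top team spends.
median_team = ordered[len(ordered) // 2]
print(rank[median_team], relative[median_team])  # 3 0.45
```

Plotting `relative` (or the raw dollar amounts) on the vertical axis instead of `rank` is one way to put the scale of the differences back into the picture.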

Amazing how the 2003 outlier distorts the entire chart.

In the cleaned-up version, the vertical scale should be consistent dollars, instead of just ranking. Did the World Series winner out-spend the loser by \$1 or by \$100 million?

I think that "pretty" and "elegant", at least when it comes to statistics, should start with an image that is instantly understandable. You're exactly right here in that the original is bizarrely difficult to parse.

I agree that those are not pretty pictures! For me, a pretty picture when it comes to data visualization is something that is both aesthetically pleasing and allows for straightforward information discovery. I recently blogged on a visual I would consider to be a very pretty picture - more here: http://www.storytellingwithdata.com/2011/07/breathtaking-data-art.html

I enjoy your blog!

I think you might actually find the data surprisingly difficult to work with. One of my first dashboard projects attempted the same thing using the data that can be found here:

(it's basically my permanent dummy data for all my learnin')

Here's something that bugs me about this data. The biggest outlier there is the Yankees. They outspend the next highest team by quite a bit. If you're talking about highest salary and best performing teams, to me, that feature has to be visible on any chart.

I completely agree -- that NYT chart is rubbish! I'm glad I stumbled on this post to see somebody correct that visual insanity.

I could have saved myself a few long seconds of quizzical/befuddled staring by simply skipping to your version -- it's much better.
