## Self-sufficient charts

##### Jun 05, 2010

The New York Times recently printed a good example of a chart that fails the self-sufficiency test I often speak about here. First, the doctored chart (with the data removed):

And for comparison, the chart as originally printed (it appeared only in the print edition, not online):

There is little doubt that the second version, with the data -- all four numbers -- printed on the chart, is much more effective, and that is why the designer thought to include them.

This shows that readers are gravitating to the data rather than the graphical constructs, and thus I consider these types of charts not self-sufficient. The graphical constructs can't stand on their own.

***

The choice of pie charts in a small-multiples arrangement is a mistake for this data set. While in theory the winning percentage could range from 0 to 100%, in practice winning percentages are rather narrowly dispersed (the exception being the NFL, which has a 16-game regular season).

Just quickly looking up the 2009 regular seasons: MLB teams ranged from 36% (Nationals) to 65% (Yankees); NHL ranged from 32% (Islanders) to 65% (Bruins); NBA from 21% (Sacramento) to 81% (Cleveland).

In order to judge whether 60% or 52% is a large or small number, readers need to have a sense of how teams are dispersed around those averages. A side-by-side boxplot brings this out pretty well (the data is for 2009 seasons).

The "box" in a boxplot contains the middle 50% of the teams in each league while the line inside the box depicts the median team (in terms of winning percentage).
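As a rough sketch of what the box encodes, here is a minimal Python example that computes the numbers a boxplot draws. The two leagues and their winning percentages are made up for illustration (they are not the actual 2009 standings), and `box_summary` is a hypothetical helper name.

```python
from statistics import quantiles

def box_summary(win_pcts):
    """Return the numbers a boxplot draws: min, Q1, median, Q3, max."""
    # The three quartile cut points split the teams into four equal groups.
    q1, median, q3 = quantiles(win_pcts, n=4, method="inclusive")
    return {
        "min": min(win_pcts),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(win_pcts),
        "iqr": q3 - q1,  # height of the box: spread of the middle 50% of teams
    }

# Hypothetical winning percentages, NOT the real 2009 standings.
tight_league = [40, 44, 47, 49, 50, 51, 53, 56, 60]
wide_league = [21, 30, 38, 45, 50, 55, 62, 70, 81]

tight = box_summary(tight_league)
wide = box_summary(wide_league)

print("tight league:", tight)
print("wide league: ", wide)
```

Both made-up leagues have the same median, so an average alone would not tell them apart. The height of the box (the interquartile range) is what tells the reader whether a gap of a few percentage points is large relative to the spread of teams.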

The NBA teams showed much higher variability in winning percentages than NHL or MLB teams. Given this fact, a difference in average winning percentage of, say, 2 or 5 percentage points from one league to the next is not remarkable.

(The original article did not really pertain to such a comparison so the reason for this chart is not clear.)


So the actual interesting story is in the variance, not the mean, i.e. why does the NBA have such a wide range of home advantages compared with the MLB and NHL?

Note that the NBA also has a wider range of overall winning percentages; I think you would need to compare the variance in the home winning percentages to the overall winning percentages across the sports to see if anything special is happening with home advantage in the NBA.

It would be neat to see the 2000-2009 data plotted to expose whether there's any variation in the variation over time.

I don't get your point. If I understand correctly you say that the Nationals won 36% of their matches, the Yankees 65%, etc. But if that winning percentage was always the same, independently of playing at home or away, there would be no home bias when averaging over all the matches. In the other extreme, suppose the local team always wins: the winning percentage for the home team would be 100% but the winning percentage for each team would be 50%. The variability that you show is interesting, but it's not clear to me what is the relation with the home bias displayed in the original chart.
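The two extremes described in this comment can be checked with a small Python sketch. The three teams, the home-and-away round-robin schedule, and the `summarize` helper are all hypothetical, chosen only to make the arithmetic visible.

```python
from itertools import permutations

def summarize(games):
    """games: list of (home, away, home_won) tuples.

    Returns (home-team winning percentage, overall winning percentage per team).
    """
    wins, played = {}, {}
    home_wins = 0
    for home, away, home_won in games:
        winner = home if home_won else away
        home_wins += home_won  # True counts as 1, False as 0
        for team in (home, away):
            played[team] = played.get(team, 0) + 1
            wins[team] = wins.get(team, 0) + (team == winner)
    home_pct = 100 * home_wins / len(games)
    overall = {team: 100 * wins[team] / played[team] for team in played}
    return home_pct, overall

teams = ["A", "B", "C"]  # hypothetical teams, listed strongest first
schedule = list(permutations(teams, 2))  # every pair meets once at each venue

# Extreme 1: the stronger team always wins, wherever the game is played.
no_home_effect = [(h, a, teams.index(h) < teams.index(a)) for h, a in schedule]

# Extreme 2: the home team always wins.
home_always_wins = [(h, a, True) for h, a in schedule]

print(summarize(no_home_effect))
print(summarize(home_always_wins))
```

In the first extreme the overall winning percentages are widely spread while the home-team winning percentage sits at 50%. In the second every team wins half its games while the home team wins every time. Under these assumptions, the spread of overall winning percentages and the size of the home bias are separate quantities.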

Carlitos: First, your comment makes me realize that I don't have home-team winning percentages; what I have are overall winning percentages. However, my point about needing to know the variability to understand averages still stands.

What the graph seems to be doing is to contrast the home-team winning percentages for different sports. We observe a difference in average WP of about 5 to 10%. As a reader, I'd like to understand whether a 5% difference is a "big deal" or not. And the variability gives us the context. By contrast, the pie chart construct gives us the wrong context because it draws attention to the 0-100% range, which is mostly irrelevant.

Doesn't including a sentence on how to read the chart mean that you've just failed the self-sufficiency test as well?

I do not understand. If you realize that you "don't have home-team winning percentages" but only "overall winning percentages", how is it possible that what "the graph seems to be doing is to contrast the home-team winning percentages for different sports"?
IMHO the larger variability for NBA (in overall winning percentages which your box-plot graph shows) is simply due to a larger variability in team strengths: the difference between the strongest team and the weakest team is larger than in MLB and NHL.


