## Guess which day I made this chart

##### Mar 17, 2012

ESPN Magazine issued a special analytics edition to ride the Moneyball bandwagon. In an article talking about the disappearing midrange jump shot from college basketball, they put out this chart:

In the caption of the chart, the key conclusion is: "As you can see, threes reign outside the lane." Well, we must be blind, since that conclusion is very difficult to draw from what we see. A number of reasons contribute to this failure:

• In a chart like this, the reader is cued to the length of the arcs. But the arcs related to three-pointers are all medium length -- they don't stand out, exactly the opposite of what the caption is saying.
• It's impossible to interpret the scale of the chart. Compare the blue line on the left (Missouri around-the-basket attempts) and the yellow line in the middle (Kentucky three-point attempts). They both say 357 but the lines are clearly of different lengths.
• The analyst is attempting to make a general statement about "college hoops" while the data being presented are from six specific teams. This means that readers are spending time digesting the variability between schools rather than understanding the commonality across schools.

The problem with this type of "racetrack graph" has been discussed here before (see here or here). By using ellipses rather than circles, this chart makes things worse. Now, we can't even imagine where the center of the circle is to judge the angles.

The following line chart has a few revelations:

• The six schools are not all the same in terms of their shot selection. In particular, California is the exception to the rule. Also, Missouri and to some extent Syracuse are extreme examples whose players attempt about the same number of three-pointers as around-the-basket shots. In our Trifecta checkup (explanation), this means the data used on the chart is out of sync with the key question being addressed. No amount of graphical wizardry can fix this problem.
• The new chart uses much more sensible units, attempts per game. The original chart shows total attempts for matches up to the day the chart was prepared. To make matters worse, the designer did not disclose anywhere what day that was, or how many games were included. By looking at season-end statistics (34 total games), it appears to me that the data being plotted are the total attempts in the first 22 games (up to the end of January). No reader can interpret total attempts in the first 22 games. I just divided each number by 22, and for anyone who follows basketball, this unit is much more interpretable.
• What determined the order of the six schools being plotted? Your guess is as good as mine. In our version, I sorted the schools by the ratio of three-pointers to midrange jump shots. So Missouri and Syracuse came out top because they focus so heavily on three-pointers at the expense of midrange shots. At the other extreme, California uses both types of shots in about equal proportions.
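The arithmetic behind the last two points is simple enough to sketch in a few lines of Python. This is only an illustration, not necessarily how the chart above was produced: the function names are made up for the example, the 22-game count is the estimate described above, and the "School A" and "School B" totals are placeholders rather than real data. The only real figure is the 357 attempts quoted from the original chart.

```python
from typing import Dict, List

GAMES_PLAYED = 22  # estimated number of games covered by the original chart


def per_game(total_attempts: float, games: int = GAMES_PLAYED) -> float:
    """Convert a season-to-date total into attempts per game."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total_attempts / games


def sort_by_three_to_midrange(teams: Dict[str, Dict[str, float]]) -> List[str]:
    """Order schools from highest to lowest ratio of three-pointers to midrange shots."""
    return sorted(
        teams,
        key=lambda name: teams[name]["three"] / teams[name]["midrange"],
        reverse=True,
    )


# The one total quoted from the original chart: 357 attempts.
print(round(per_game(357), 1))  # 16.2

# Placeholder totals for illustration only; these are NOT the real team numbers.
placeholder = {
    "School B": {"three": 200, "midrange": 200},
    "School A": {"three": 300, "midrange": 100},
}
print(sort_by_three_to_midrange(placeholder))  # ['School A', 'School B']
```

A total of 357 attempts becomes roughly 16 attempts per game, which is a number a basketball fan can judge at a glance.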

Regarding the new proposed chart, it really is easier to interpret and the data makes a lot more sense. My only problem with it is "connecting" the schools; following the lines doesn't mean anything, so the data should be unconnected dots.

Jeruza: You won't be the only reader to feel this way. Over the years, I have had complaints from readers about lines connecting categorical data every time I put up such a chart. Here's my reasoning: follow your eyes as you read a dot plot, you are visually tracing the lines that I have drawn, why not just draw the lines?

I think the reason to not connect the lines is that, given the charts most of us see every day, we expect the x-axis on a line chart to be time. Despite knowing - very clearly - that it wasn't time, I still found myself thinking, "Why did they come together later on?" [Disclosure: I know nothing about basketball, so maybe this was just due to my ignorance.] I say: don't fight it! If people expect one thing and you're giving them something else, it's just a little bit harder for them to understand your message, which isn't what you want. That little difference with a dot plot not drawing the lines is just right: you don't automatically think, 'time series', but you can still follow along.

To add to what others have said, I would not necessarily expect the x-axis to be time, but I would expect it to display a metric variable, preferably one which can be usefully interpreted as independent.

I tried an alternative approach here. The first and second highest and lowest are shown as a box and whisker plot. I know nothing about basketball, so I assume "the lane" is around the basket, and 3 point jump shots are more common than midrange.

While I'm with Kaiser on the complete acceptability of lines in category graphs (check out parallel coordinates for a graph type where this is not only okay, but vitally necessary), I am smug that a box and whisker chart doesn't have to worry about the controversy :-)

Kaiser, thanks for the reply. I can see your reasoning, but I'd argue that I don't actually "follow" the dots, I compare them. MAYBE I would be less whiny if there were big dots and subtle lines connecting them, hehe.

Still, I agree with Lemmus: if there are lines I expect x to mean something on its own.

But I promise to not fiercely fight about this on every single chart you ever post. Just on some. =]

I also liked Derek's suggestion, but for few data points I prefer Dot Plots to Box Plots. Like this: http://postimage.org/image/ne931c9hd/
(with a better label, of course; that's just me being lazy, sorry.)

The problem with either one is comparing schools. If this were an academic paper it probably wouldn't be a big deal, but newspaper folk will need a far-from-the-chart label or names directly on the chart (Derek's approach).

Ahhh....the world wide leader screws something else up......

Kaiser:
"my reasoning: follow your eyes as you read a dot plot, you are visually tracing the lines that I have drawn, why not just draw the lines?"

I think this is very poor reasoning.

A dot plot allows you to follow whatever path of comparison you choose, and does not imply a pattern that does not exist.

The line chart very explicitly instructs the user to follow the pattern of the line, which in cases like this is completely without meaning of any kind.
It precludes the kind of comparisons that should be made with this data - how do the differences within each category compare, and how does this data point compare to the same data point in the other categories - not easy to do with this chart.

You are led to conclude that 3-pointers have really declined! :)

The argument of a parallel coordinates chart is not valid to justify a line chart like this either - parallel coordinates charts are very specific and use lines for very distinct purposes, where the pattern formed *is* meaningful.

derek:
"I am smug that a box and whisker chart doesn't have to worry about the controversy"

Sorry, but I have to disagree with this as well :)
It depends on the goal of the visualization, of course, but a boxplot doesn't give us much meaning here either. It's great for portraying the distribution of each type of shot, but what do we learn from that in this particular application?

--

A dot plot or a set of three bar charts would make the most sense for displaying this data in an easily comparable manner.

A couple of quick examples:
Dot Plot 1: http://jsfiddle.net/jlbriggs/FXaju/embedded/result/
Dot Plot 2: http://jsfiddle.net/jlbriggs/2T2ey/embedded/result/
Small Multiples: http://jsfiddle.net/jlbriggs/Mmd8N/embedded/result/

I believe all three examples better facilitate relevant comparisons between values and categories.

