Sorting out the data, and creating the head-shake manual

Involuntary head-shaking is probably not an intended consequence of data visualization

This chart is in the Sept/Oct edition of Harvard Magazine:

[Chart: NAEP gain scores, from Harvard Magazine]

Pretty standard fare. It is even Tufte-esque in its sparing use of axes, labels, and other non-data ink.

Does it bug you how much work you need to do to understand this chart?

Here is the junkchart version:


In the accompanying article, the journalist declared that student progress on NAEP tests came to a virtual standstill. This version highlights the drop in performance between the two periods, as measured by these "gain scores."

The clarity comes from proximity as well as from the slopes.

The column chart form has a number of deficiencies when used to illustrate this data. It requires too many colors. It induces involuntary head-shaking.

Most unforgivably, it leaves us with a puzzle: does the absence of a column mean no progress, or that the value is unknown?


PS. The inclusion of 2009 in both time periods is probably an editorial oversight.






I think you have interpreted the original graph incorrectly. The axis is "gain in score." A gain near zero means "no gain," or, as originally described, a "virtual standstill." There are only two decreases: white and black 8th grade math. The absence of a column means the score changed by 0, which means no progress, although all the other 2009-2015 increases are so small that they amount to approximately no progress as well.

I agree the first graph isn't the best representation, but at least I can interpret it. Your re-do makes it appear that scores dropped, which is not the case.


On further reflection, the redone graph should have 3 data points on the X axis (2000, 2009, and 2015), and the Y axis should be the score relative to 2000. Most go up by 5-20 between 2000 and 2009, and then stay nearly flat from 2009 to 2015. The repetition of 2009 is not an editorial oversight. I think the scores were measured in 2000, 2009, and 2015, so to show the change in score, 2009 needs to be used both times, at least for their chosen display.

Andrew Gelman


I think it would be better to plot trends in score rather than gain scores. Looking at gain scores requires a higher level of abstraction and to me just seems to add unnecessary confusion.
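The two suggestions in this thread (three time points on the X axis, and trends in score rather than gain scores) come down to one small computation: because the 2009-2015 gain starts where the 2000-2009 gain ends, the score relative to 2000 is a running sum of the gains. A minimal Python sketch, assuming exactly that chained structure; the function name `levels_from_gains` is hypothetical, and the numbers are toy values, not NAEP results. It also keeps an unknown gain (`None`) separate from a gain of zero, which is the ambiguity the column chart leaves open.

```python
from typing import List, Optional, Sequence


def levels_from_gains(gains: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Chain consecutive gain scores into scores relative to the first year.

    gains[i] is the change over period i, and consecutive periods share an
    endpoint (e.g. 2000-2009 and 2009-2015 share 2009). None marks a gain
    that is unknown, which is kept distinct from a gain of 0.
    """
    levels: List[Optional[float]] = [0.0]  # the baseline year is 0 by definition
    for gain in gains:
        previous = levels[-1]
        if previous is None or gain is None:
            # an unknown gain makes this level, and every later one, unknown
            levels.append(None)
        else:
            levels.append(previous + gain)
    return levels


# Toy numbers for illustration only, not NAEP results:
print(levels_from_gains([10, 1]))     # [0.0, 10.0, 11.0]  nearly flat after 2009
print(levels_from_gains([10, 0]))     # [0.0, 10.0, 10.0]  a true zero gain
print(levels_from_gains([10, None]))  # [0.0, 10.0, None]  a gap, not a zero
```

Plotting these levels against 2000, 2009, and 2015 gives the trend view, where a standstill shows up as a flat line rather than a short or missing column.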


I have been trying to plot these types of charts in R. Do you perhaps have some example code I could use?


mankoff & Andrew: Look out for today's post. Short answer is I fetched the raw data and made more charts.

Kate: I didn't make those in R, but when I do make charts like these in R, I tend to build them from scratch. If you follow this path, you will need to pick up these key things:
- setting up panels of charts using par(mfrow=c(x,y)) or similar
- writing a function that creates a single chart with the three lines so that you can run it three times to get a panel of three charts
- within that line chart function, first suppress the default axes, then draw each axis separately with custom labeling, tick marks, etc., then draw in the box. Add gridlines as you please.
- specify line colors and styles to your liking (this requires you to look up conversion tables of numbers to colors, and numbers to line styles... they lied when they said this is easy!)
- configuring the right spacing between charts by manipulating whitespace using par(mar=c(m,n,l,k)) or similar
- use the text() function to place other annotations, such as line labels
- you may encounter more annoying little tasks, like rotating the axis labels to the right orientation, or reducing the font size so text fits within the plot region
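The recipe above is for base R. For readers working in Python instead, the same structure maps roughly onto matplotlib; this is a swapped-in analogue, not the code behind the Junk Charts version. It is a minimal sketch assuming matplotlib is installed: `draw_panel` and `draw_panels` are hypothetical names, the R correspondences in the comments are approximate, and the numbers are placeholders, not NAEP data.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def draw_panel(ax, title, years, series):
    """Draw one line chart with one line per group (the single-chart function)."""
    for label, values in series.items():
        (line,) = ax.plot(years, values, linewidth=2)
        # label each line at its right end instead of using a legend (R: text())
        ax.text(years[-1], values[-1], " " + label, va="center",
                color=line.get_color(), fontsize=8)
    # hide the default top/right frame (R: axes = FALSE, then axis() per side)
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    # custom tick marks and labels on the X axis
    ax.set_xticks(years)
    ax.set_xticklabels([str(y) for y in years])
    ax.grid(axis="y", color="0.9")  # light horizontal gridlines
    ax.set_title(title, fontsize=10)


def draw_panels(panels, years):
    """Lay out one row of panels (R: par(mfrow = c(1, n)))."""
    n = len(panels)
    # squeeze=False keeps `axes` a 2-D array even when n == 1
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 3), sharey=True, squeeze=False)
    for ax, (title, series) in zip(axes[0], panels.items()):
        draw_panel(ax, title, years, series)
    # white space between and around panels (R: par(mar = ...))
    fig.subplots_adjust(wspace=0.4, left=0.08, right=0.9)
    return fig


# Placeholder numbers for illustration only:
years = [2000, 2009, 2015]
panels = {
    "Panel A": {"Group 1": [0, 10, 11], "Group 2": [0, 6, 5]},
    "Panel B": {"Group 1": [0, 8, 9], "Group 2": [0, 4, 4]},
    "Panel C": {"Group 1": [0, 12, 12], "Group 2": [0, 7, 8]},
}
fig = draw_panels(panels, years)
fig.savefig("panels.png", dpi=150)
```

As in the R recipe, putting all the styling inside the single-panel function means each panel is drawn the same way and the layout code only handles spacing.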

Maybe one of our readers can whip up some sample code for you.


Kate: P.S. You can also use a package like ggplot2. You won't be able to make the chart look exactly like mine but close enough. Hadley recently said he added some functionality to ggplot2, such as the ability to place a chart title.


Thanks very much for the instructions. Will give it a go with ggplot2.
