Don't pick your tool before having your design

My talk at Parsons seemed like a success, based on the conversation it generated, and the fact that people stuck around till the end. One of my talking points is that one should not pick a tool before having a design.

Then, last night on Twitter, I found an example to illustrate this. Jim Fonseca tweeted about this chart from Business Insider: (link)


The style is clean and crisp, which I credit them for. Jim was not happy about the length of the columns. It seems that no matter how many times we repeat the start-at-zero rule, people continue to ignore it.

So here we go again. The 2015 column is about double the height of the 2013 column but 730 is nowhere near double the value of 617.
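A quick arithmetic check of the distortion, as a minimal sketch. The two counts come from the chart; the helper `height_ratio` and the reading of "about double" as exactly 2 are my own assumptions for illustration.

```r
# Penalty counts quoted in the post (first three weeks of each season)
pen_2013 <- 617
pen_2015 <- 730

# True ratio of the two values
value_ratio <- pen_2015 / pen_2013

# Ratio of column heights when the axis starts at `baseline` instead of zero
# (hypothetical helper, not taken from the original chart)
height_ratio <- function(baseline) {
  (pen_2015 - baseline) / (pen_2013 - baseline)
}

# Baseline at which the 2015 column is exactly twice as tall as the 2013 one
# (assumption: "about double" is treated as exactly 2)
implied_baseline <- 2 * pen_2013 - pen_2015

round(value_ratio, 2)            # 1.18
implied_baseline                 # 504
height_ratio(implied_baseline)   # 2
```

Under that assumption, the columns behave as if the axis started near 500, which turns an 18% difference into a doubling.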

The standard remedy for this is to switch to a line chart, or a dot plot. Something like this can be quickly produced in any software:


Is this the best we can do?

Not if we are willing to free ourselves from the tool. Think about the message: NFL referees have been calling more penalties this year. Compared to what?

I want to leave readers no doubt as to what my message is. So I sketched this version:


This version cannot be produced directly from a tool (without contorting your body in various painful locations).

The lesson is: Make your design, then find a way to execute it.




For some definition of the word "directly" at least. I doubt any R person would consider this contorting:


library(ggplot2)

# Penalties called in the first three weeks of each season
dat <- data.frame(year=2010:2015,
                  penalties=c(627, 625, 653, 617, 661, 730))

# Average of the five prior seasons; the `last` column (the 2015 value) is
# reconstructed from its use as `yend=last` in the segment below
avg <- data.frame(val=mean(head(dat$penalties, -1)),
                  last=tail(dat$penalties, 1))

gg <- ggplot(dat, aes(x=year, y=penalties))
gg <- gg + geom_point()
gg <- gg + scale_x_continuous(breaks=c(2010, 2014, 2015))
gg <- gg + scale_y_continuous(breaks=c(600, 650, 700, 750),
                              limits=c(599, 751), expand=c(0,0))
# Dashed line at the five-season average
gg <- gg + geom_segment(data=avg, aes(x=2010, xend=2015, y=val, yend=val), linetype="dashed")
# Vertical segment from the average up to the 2015 value
gg <- gg + geom_segment(data=avg, aes(x=2015, xend=2015, y=val, yend=last), color="steelblue")
gg <- gg + geom_point(data=avg, aes(x=2015, y=val), shape=4)
gg <- gg + geom_point(data=avg, aes(x=2015, y=700), shape=17, color="steelblue")
gg <- gg + labs(x=NULL, y="Number of Penalties",
                title="NFL Penalties Jumped 15% in the\nFirst 3 Weeks of the 2015 Season\n")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.grid.major.x=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg


"NFL Penalties Jumped 15% in the First 3 Weeks of the 2015 Season"

I really don't like that sentence. It tells me that there were 635 penalties in the last three weeks of the 2014 season, which is probably not true. I would prefer:

"Penalties were 15% higher in the 2015 NFL Season's First 3 Weeks than in the average of the previous five seasons' first 3 weeks"

Picky but more accurate, IMO.
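The numbers behind both readings of the title can be checked in a few lines. This is only a sketch using the counts from the R code earlier in this thread; the variable names are mine.

```r
# First-three-weeks penalty counts for 2010 to 2015, as used earlier in the thread
pen <- c(627, 625, 653, 617, 661, 730)

pen_2015  <- tail(pen, 1)
pen_2014  <- pen[5]
prior_avg <- mean(head(pen, -1))   # average of 2010 to 2014

# Increase over the five-season average (what the chart actually shows)
pct_vs_avg <- 100 * (pen_2015 / prior_avg - 1)

# Increase over 2014 alone (one way to misread the title)
pct_vs_2014 <- 100 * (pen_2015 / pen_2014 - 1)

# Base count that a literal "jumped 15%" would imply
implied_base <- pen_2015 / 1.15

prior_avg                 # 636.6
round(pct_vs_avg, 1)      # 14.7
round(pct_vs_2014, 1)     # 10.4
round(implied_base)       # 635
```

So the 15% only holds against the five-season average; against 2014 alone the rise is closer to 10%.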


Or you can use base R. Shorter code...
Here's the result, also with my idea for the title.

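# Penalties in the first three weeks of each season, 2010 to 2015
pen <- c(627, 625, 653, 617, 661, 730)
# Average of the five seasons before 2015
pen_av <- mean(head(pen,-1))
plot(2010:2015, pen, ylim=c(600,750), las=1, bty="n",
     ylab="number of penalties", xlab="", pch=4, lwd=3, cex=1.3)
# Dashed reference line at the five-season average
abline(h=pen_av, lty=2)
# Arrow from the average up to just below the 2015 point
arrows(x0=2015, y0=pen_av, y1=tail(pen,1)-5, col="blue", lwd=2)
title(main="NFL Penalties jumped 15%\n in the first 3 weeks of 2015\ncompared to previous seasons")
text(2014, pen_av, "5 year average", adj=c(0,1.3))
# xpd=TRUE lets the label extend past the plot region
text(2015.1, 700, "15%\nincr.", adj=0, col="blue", xpd=TRUE)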


There is a problem with the link to Business Insider.

I find the whole argument rather dubious (the argument about the games, that is, not the graphs, which have their own problems), as all that is happening is probably random variation. If it matters, they should modify the rules so that more time is spent playing, so that the game is still good when penalties are higher. One relevant point is that truncating the y-axis distorts our perception of whether the variation is random. Starting the scale at zero would show that not much is happening.
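To put a rough number on the truncation point, here is a small sketch. It assumes the axis limits of 600 to 750 from the base R code above and measures the 2015 jump as the gap to the five-season average; `axis_share` is a hypothetical helper name.

```r
# First-three-weeks penalty counts for 2010 to 2015, as used earlier in the thread
pen <- c(627, 625, 653, 617, 661, 730)

# Gap between 2015 and the average of the five prior seasons
jump <- tail(pen, 1) - mean(head(pen, -1))

# Fraction of the plotted vertical range that the gap occupies
# (hypothetical helper for illustration)
axis_share <- function(ymin, ymax) jump / (ymax - ymin)

round(axis_share(600, 750), 2)   # truncated axis: 0.62
round(axis_share(0, 750), 2)     # zero-based axis: 0.12
```

The same gap fills about three fifths of the truncated axis but only about an eighth of a zero-based one.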

My other suggestion is to ignore the NFL and watch the rugby world cup.


Thanks all for the comments.

Ken: I certainly don't intend to provide credence for the underlying analysis.

R coders: R is one of the tools I considered for this. The contortions (for me) include placing the text, which also requires specifying the xlim, picking colors, picking symbols, etc. Of course, I can do it; I just know other tools that take less time. Thanks for the code samples though.


Hi Kaiser, I really enjoy reading your blog. Thank you.

What 'tool' do you use for the sketches you present in your posts?


Will: the first take was created in JMP's Graph Builder, which is great for sketching. Then I took it to a drawing program to customize labels, text, arrows, etc.
