A gift from the NY Times Graphics team
May 02, 2013
This post is long overdue. I have been meaning to write about this blog for a long time but never got around to it. It's like the email response you postpone because you want to think before you fire it off. But I received two mentions of it within the last few days, which reminded me that I have to get to work on this one.
One of the best blogs to read - that is similar in spirit to Junk Charts - is ChartNThings. This is the behind-the-scenes blog of the venerable New York Times graphics department. They talk about the considerations that go into making specific charts that subsequently showed up in the newspaper. You get to see their sketches. Kind of like my posts here, except with the graphics professional's perspective.
As Andrew Gelman said in his annotated blog roll (link), ChartNThings is "the ultimate graphics blog. The New York Times graphics team presents some great data visualizations along with the stories behind them. I love this sort of insider’s perspective."
The other mention is from a friend who reviewed something I wrote about fantasy football. He pointed me to this particular post from the ChartNThings blog that talks about luck and skill in the NFL.
They have a perfect illustration of how statistics can help make charts better.
Start with the following chart, which shows the value of drafted players, organized by the round in which they were picked.
Think of this as plotting the raw data. A pattern is already apparent: on average, the players picked in earlier rounds (on the left) have produced higher value for their clubs. However, there is quite a bit of noise on the page. One problem with dot plots is over-plotting when the density of points is high, as it is here. Our eyes cannot judge density properly, especially in the presence of over-plotting.
What the NYT team did next was to take the average value for all players picked in each round in each year, and plot those instead. This drastically reduces the number of dots per round, and cleans up the canvas a great deal.
It's amazing how much more powerful this chart is than the previous one. Instead of the average value, one can also try the median value, or plot percentiles to showcase the distribution. (They later offered a side-by-side box plot, which is also an excellent idea.)
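For readers who want to try this kind of summarizing themselves, here is a minimal sketch in Python. It is not the NYT team's code. The records, the field order (draft year, round, value) and the helper name summarize are all made up for illustration, and the numbers are not real draft data. It shows the same idea: collapse the raw points into one number per round per year, or into a median or quartiles per round.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical records: (draft_year, round, player_value).
# These numbers are invented for illustration, not real NFL data.
picks = [
    (2010, 1, 40), (2010, 1, 20), (2010, 2, 15), (2010, 2, 25),
    (2011, 1, 30), (2011, 1, 50), (2011, 2, 10), (2011, 2, 20),
]

def summarize(records, key, stat):
    """Group player values by key(record), then apply stat to each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    return {k: stat(v) for k, v in sorted(groups.items())}

# One dot per (round, year): the average value of the players picked.
mean_by_round_year = summarize(picks, key=lambda r: (r[1], r[0]), stat=mean)

# Alternatives: the median per round, or quartiles to show the spread.
median_by_round = summarize(picks, key=lambda r: r[1], stat=median)
quartiles_by_round = summarize(
    picks, key=lambda r: r[1], stat=lambda v: quantiles(v, n=4)
)

for (rnd, year), avg in mean_by_round_year.items():
    print(f"round {rnd}, {year}: average value {avg}")
```

Plotting mean_by_round_year gives one dot per round per year instead of one per player, which is where the reduction in clutter comes from. The quartiles are roughly the ingredients of the side-by-side box plot.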
The post then goes into exploring a paper by some economists who wanted to ignore the average and focus on the noise. I'll make some comments on that analysis on my other blog. (The post is now live.)
One behind-the-scenes thing I'd add about this behind-the-scenes blog is that the authors must have spent quite a bit of time organizing the materials and creating the streamlined stories for us to savor. Graphical creation involves a lot of sketching and exploration, so there are lots of dead ends, backtracking, stuff you throw away. There will be lots of charts with little flaws that you didn't care to correct because it's not your final version. There will be lots of charts which will only be intelligible to the creator since they are missing labels, scales, etc., again because those were supposed to be sketch work. There will even be charts that the creator can't make sense of because the train of thought has been lost by the end of the project.
So we should applaud what the team has done here for the graphics community.
I'm definitely in favour of the side-by-side boxplots for this.
Principally because the horizontal-scale variable "round" is discrete (ordered categorical if you wish). The takeaway is that there is a downward trend in mean/median, but an awful lot of variability (that decreases slightly as round # increases). This is exactly what a boxplot (or series of boxplots) shows, and it does so without clutter.
The top plot has introduced a spurious horizontal scale (unless there is meaning to left side vs. right side of the Round 1 box in the first plot, which is not mentioned). The second plot has those same issues (what is the horizontal scale? Is it pick # overall?).
If the statistical issue in question is "do players picked in an earlier round of the draft tend to have a higher value?", boxplots will answer that; your plots appear to go after the related but different question of "is player value related to draft pick number", with the round # of the draft being incidental to that. For that, a lowess curve on plot #1 would do the job for me.
Posted by: KenButler12 | May 02, 2013 at 05:17 PM
"Our eyes cannot judge density properly especially in the presence of over-plotting."
Make the dots transparent?
Posted by: Yep | May 02, 2013 at 08:46 PM