Reader Dave S. sent me to some very pretty pictures, published in Wired.
This chart, which shows the distribution of types of 311 calls in New York City by hour of day, is tops in aesthetics. Rarely have I seen a prettier chart.
The problem: no insights.
When you look at this chart, what message are you catching? Furthermore, what message are you catching that is informative, that is, not obvious?
The fact that there are few complaints in the wee hours is obvious.
The fact that "noise" complaints dominate in the night-time hours is obvious.
The fact that complaints about "street lights" happen during the day is obvious.
There are a few not-so-obvious features: that few people call about rodents is surprising; that "chlorofluorocarbon recovery" is a relatively frequent source of complaint is surprising (what is it anyway?); that people call to complain about "property taxes" is surprising; that few moan about taxi drivers is surprising.
But - in all these cases, there are no interesting intraday patterns, and so there is no need to show the time-of-day dimension. The message can be made more striking by doing away with the time-of-day dimension.
The challenge to the "artistic school" of charting is whether they can make clear charts look appetizing without adding extraneous details.
The New York Times printed an amazing set of graphs about the election results last Sunday. The full version can be found here.
The best thing about this set of charts is recognizing that the aggregate data may obscure differences between subgroups, something I address in Chapter 3 of Numbers Rule Your World. In some cases, the charts go two layers deep, for instance Women 18-29 and Men 18-29.
The organization of the various components of the small multiples effectively splits the groups won by each party. In fact, it's very easy to read off the individual chart titles to describe which subgroups leaned to which party, and by how much.
There is so much in these charts that you can spend an entire afternoon exploring and examining the details.
I wish the charts were simpler. It's very daunting to process the entire page. In terms of subgroups, what we really care about is the size of the drop-off from the previous election; on this chart, though, the right tail of every single chart looks about the same, so one wonders whether this was an across-the-board wipe-out for the Democrats, or whether the design decisions obscured the differences between subgroups.
Remove the historical time-series, and focus on the change from this election to the last election: the criss-crosses are very distracting.
Don't show disaggregated charts unless there is a point, and if so, include a note saying what readers should see.
I'm not sure the second split by gender adds much to the story here.
Remove the line for the opposite party on each side of the divide; the other line is a mirror image anyway.
Or better yet, try plotting the difference (margin) between the two parties instead of plotting two lines.
Remove the share of voters numbers: I appreciate that they're using unbolded font to indicate this data series; perhaps less is more.
Put a bold border around particular charts that readers should pay more attention to, the ones with interesting stories, e.g. the groups in which Democrats made gains irrespective of the overall trend (Liberals, those in a better financial situation).
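To make the margin suggestion concrete, here is a minimal sketch in Python. The vote shares are made up for illustration (they are not read off the Times charts), and the function names are my own.

```python
# Hypothetical two-party vote shares (percent) for one subgroup, oldest first.
# These numbers are invented for illustration, not taken from the Times chart.
shares = [
    ("earlier",  {"dem": 55.0, "rep": 43.0}),
    ("previous", {"dem": 56.0, "rep": 42.0}),
    ("current",  {"dem": 47.0, "rep": 51.0}),
]

def margins(shares):
    """Collapse the two party lines into one series: Democratic minus Republican share."""
    return {label: s["dem"] - s["rep"] for label, s in shares}

def swing(shares, before, after):
    """Change in the margin between two elections (negative = moved toward Republicans)."""
    m = margins(shares)
    return m[after] - m[before]

print(margins(shares))
print(swing(shares, "previous", "current"))
```

One number per subgroup (the swing) is what a simplified small multiple would need to show, and it can be sorted to put the most interesting subgroups first.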
What they have is an excellent chart; I think simplifying it a bit makes it even better.
Has the New York Times lost the plot? This was the thought in my head when I saw this scary thing last Friday, right in the center of the front page of the New York edition:
This Halloween scare is supposed to tell us who the biggest political donors are, and to what party. Here is a close-up view:
Well, this one looks slightly different from the one shown above but this is the one that they are currently carrying on the website.
I think the stacked squares, arranged in this particular way--staggered both horizontally and vertically--are supposed to represent something. I can't figure it out. Maybe a loudspeaker? An accordion?
This chart fails our self-sufficiency test. The only way one can appreciate the scale of the donations is to extract the data from the chart, in which case, this is the same as a data table.
The use of the unexplained red color for the Chamber of Commerce (the subject of the accompanying article) is also problematic. Based on the inset, it's pretty clear that the Chamber is "leaning Republican" so it should just be colored pink.
What is very hard to understand is why the amount of overlapping from one square to the next square is not the same.
We can visualize the problem as follows:
According to the data, the US Chamber of Commerce donated about the same amount as the Democratic Senatorial Campaign Committee ($21.1 v. $22). However, the visible areas on the chart are vastly different. There is a presumption that readers will look behind the squares to see the full square but that's pretty hard to do when they are stacked 15-deep!
What's the amount of distortion due to this design?
A lot. Much of this is due to the fact that the lowest ranked item plotted is an actual square which happens to have the largest unobstructed area.
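For readers who want to see the mechanism, here is a rough sketch in Python of how stacking hides area. The layout is hypothetical (three equal squares with unequal offsets on an integer grid); it is not the actual geometry of the Times chart, and `visible_areas` is my own helper.

```python
def visible_areas(squares):
    """Count visible grid cells per square.

    squares: list of (label, x, y, side) in integer grid units, listed
    back to front, so each later square is drawn on top of earlier ones.
    """
    owner = {}
    for label, x, y, side in squares:
        for i in range(x, x + side):
            for j in range(y, y + side):
                owner[(i, j)] = label  # a later square hides what is beneath
    counts = {label: 0 for label, _, _, _ in squares}
    for label in owner.values():
        counts[label] += 1
    return counts

# Hypothetical layout: three donors giving identical amounts (same side length),
# stacked with unequal offsets. Assumed for illustration only.
squares = [("A", 0, 0, 10), ("B", 2, 2, 10), ("C", 7, 7, 10)]
visible = visible_areas(squares)
for label, _, _, side in squares:
    full = side * side
    print(f"{label}: full area {full}, visible {visible[label]} ({visible[label] / full:.0%})")
```

In this toy layout the three squares encode the same amount, yet only the one on top shows its full area; the other two show 36 and 75 of their 100 cells, purely because the offsets differ.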
Here's a dot plot that conveys the essential information with minimal fuss:
Daniel L. points us to the visualization of the 2010 elections by the New York Times. These are pretty good, and do a good job highlighting the most important question: how many seats are up for grabs, and which party they are leaning toward at the moment. I wonder if these predictions are continuously being updated -- if so, some kind of time-series chart showing the state of competition in the toss-up states would be interesting to look at.
Chris P. takes the folks at Quantcast to task for an innocent looking typo. Lest you think we're nit-picking, the entire chart only contains 12 data points, so the error rate is almost 10%. And, by the way, they commit a much less forgivable error, which is to use different scales in a small multiples setting: the 17% growth on the Yearly chart is the same height as the 5.5% growth in the Quarterly chart. In any case, I'm not understanding why there are charts for monthly, quarterly as well as yearly data.
The headline writers at Business Insider continue to play fast and loose. Yes, the map of bubbles is classic chartjunk but how does this chart lead to their conclusion that "Americans are more caring than 99% of the world"? Pray tell.
For one thing, our neighbor has a bubble that is slightly larger than ours.
The chart itself is a shocker. Instead of boring you with a term paper, I just want to tell you the most counter-intuitive insight I gleaned from this chart: since the more countries there are in a continent, the more caring are its people, we should break the U.S. up into 50 entities tomorrow.
If you have submitted links to me in the past few months, you will see them posted in the next few weeks; I just spent some time looking at all the submissions.
Here are some links that are slightly off-topic (though still interesting), and others I don't intend to write full posts about:
Daniel L. sent us to Slate, where they posted this chart counting up the human cost of the Afghan War. Applying the Trifecta checkup, he gave this evaluation:
What is the practical question: I have no idea
What does the chart say: I have no idea
What does the data say: I have no idea
The time series thing coupled with poor use of color obscures whatever patterns you could pick up.
Daniel is right about the last point - by plotting the disaggregated data, readers are forced to stare at the variability of casualties over time, and the progress of the war, which distracts from the idea of "accounting for the dead".
Daniel also argues, and I agree, that this math is meaningless even if done properly.
Understanding Google PageRank - Nick calls this an infographic but it contains zero data. Not the kind of thing for this blog but it does a decent job explaining PageRank.
The part about circular links canceling each other out confuses me; it would seem like good blogs should be able to link to each other without being penalized.
The Ins and Outs of Assisted Living Homes - Ellen G. created this "infographic" explaining what "assisted living homes" are like. Again, not stuff for this blog, as the two bar charts are just tag-alongs that are not well integrated with the rest.
In terms of the charts, please remove 3-D, remove the colors, order the data from largest to smallest, consider a horizontal bar chart with data labels on the left, and title it "the top needs for assisted living residents".
The Facebook privacy chart that's been circulating widely (thanks to Eronarn): a FAQ on how to read the chart is sorely needed.
BBC to beam general election results on to Big Ben (thanks to Julien D.): London readers, did this happen?
Another example of an infographics poster (thanks to Daniel L.), this one concerning the use of cell phones by teenagers. Daniel said:
Check out the pie graphs under the sexting category. He's showing his percentage in color, but leaving the rest of the pie white. Awesome! Surefire way to get the data-ink ratio right where you want it to be.
Bernard L. sent in this chart a while ago, and with the looming British elections, it's a good time to show it, and ask readers how to spin this election. (Via Guardian)
In particular, could someone help me understand the tri-color spinner? Given that the change in seats for the three parties combined should be zero, I don't get how this can fit into a concentric-circles presentation. If you click on the link to the original chart, you can move the black dot around the circle.
In addition, I'm mystified why the constituencies can be depicted on graph paper, each one the same size as the others. This is not the first time I have seen the U.K. mapped in this way so there must be some reason behind this choice. (For reference, I have never seen the 50 states mapped in this fashion.)
Reference: "Election Map and Swingometer 2010", Guardian (UK), April 5 2010.
While the total responses were almost evenly split between the three choices, the bar chart drew our attention to the first bar, which is inapt.
If plotted as a pie chart, I thought, the reader would see three almost equal slices. This effect occurs because we are much less precise at judging the areas of slices than the lengths of bars. Wouldn't that turn our usual advice on its head?
How the Bar Chart is Saved
The one thing that the pie chart has as a default that this bar chart doesn't is the upper bound. Everything must add up to 100% in a circle but nothing forces the lengths of the bars to add up to anything.
We save the bar chart by making the horizontal axis stretch to 100% for each bar. This new scaling makes the three bars appear almost equal in length, which is as it should be.
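Here is a small sketch in Python of the rescaling, drawn as text bars. The three shares are hypothetical stand-ins for a nearly even split (the survey's actual numbers are not reproduced here), and `bar_lengths` and `render` are names I made up.

```python
def bar_lengths(shares, axis_max, width=100):
    """Bar length in characters when the axis runs from 0 to axis_max."""
    return {label: round(width * value / axis_max) for label, value in shares.items()}

def render(shares, axis_max, width=100):
    """Draw each bar with '#' and fill the unused part of the axis with '.'."""
    lengths = bar_lengths(shares, axis_max, width)
    lines = []
    for label, value in shares.items():
        n = lengths[label]
        lines.append(f"{label:<10}|{'#' * n}{'.' * (width - n)}| {value}%")
    return "\n".join(lines)

# Hypothetical shares for three nearly even choices (assumed, not the survey's data).
shares = {"Choice A": 36, "Choice B": 34, "Choice C": 30}

print(render(shares, axis_max=40))   # axis cropped just above the largest bar
print()
print(render(shares, axis_max=100))  # axis stretched to the 100% upper bound
```

The ratio between the bars does not change. What changes is that each bar is now read against a full 100% track, so the gap between the longest and shortest bar shrinks from 15 characters to 6 out of 100, which is closer to how the pie would read.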
Another Unforgivable Pie Chart
On the very next page, Luntz threw this pie at our faces:
Make sure you read the sentence at the bottom.
It appears that he removed the largest group of responses, and then reweighted the CEO and Companies responses to add to 100%.
This procedure is always ill-advised - responders responded to the full set of choices, and if they were only given these two responses, they very well might have answered differently.
It also elevated secondary responses while dispensing with the primary response.
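To see what that reweighting does to the numbers, here is a minimal sketch in Python. The shares below are invented (the chart's real figures are not reproduced here), and the procedure is my reading of what appears to have been done, not a documented method.

```python
def drop_largest_and_reweight(shares):
    """Remove the most common response, then rescale the rest to sum to 100%."""
    largest = max(shares, key=shares.get)
    kept = {k: v for k, v in shares.items() if k != largest}
    total = sum(kept.values())
    return {k: 100.0 * v / total for k, v in kept.items()}

# Hypothetical survey shares (percent); only the two secondary labels echo the post.
shares = {"Primary response": 50, "CEOs": 30, "Companies": 20}

print(drop_largest_and_reweight(shares))
```

With these made-up inputs, a response chosen by 30% of all respondents gets reported as 60%, which is the elevation of secondary responses described above.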