Chart cleanup
Mar 08, 2008
Anna E. submitted this great example from Yahoo! Green. It is a well-meaning chart, but one stuffed with redundancy.
Much appears to be going on, and yet the entire chart contains only 15 data points: Boston's ranks on each of 15 categories. The bar lengths convey the same information as the data labels. The legend provides a catchy name for different levels of ranks (0-10 = "leader"; 10-20 = "advances"; etc.). The colors merely reiterate the catchy titles. Similarly, the colored squares repeat the information in the bars.
In the name of green, we cleaned up this chart:
As a standalone graph, the categories should be ordered by Boston's ranks. Here, we assume that cross-referencing cities is needed so we leave the order unchanged.
I would think the ranks are somewhat arbitrary. Why would all three commuting and transportation elements be together? In the second graph, it is a little hard to decode which line goes with which label. If the categories were ordered by some rank, then you might have blank lines to group them together better, or maybe every fourth or sixth line could be half height and blank.
Posted by: Chris P | Mar 08, 2008 at 04:10 PM
Ranks against what? Other big cities? (It's hard to imagine a competition that would have Boston ranking first for local food and agriculture.)
Posted by: Rosie Redfield | Mar 08, 2008 at 05:41 PM
The vertical ticks would be a good choice if they were in a crowd of ticks, such that a fine line was needed to avoid frequent overlapping. But where each data symbol is guaranteed a row all to itself, I think it can be allowed to spread in two dimensions for visibility. A diamond shape would ensure that the exact position of the symbol was still established, without the danger of invisibility represented by the thin line.
Posted by: derek | Mar 10, 2008 at 06:35 AM
I searched in vain for more info on the methodology used, but the links seemed to lead back to the Sustainlane ad network site.
I'm not sure suboptimal graphics are the main problem with this "ranking". Looks like GIGO.
Posted by: ZBicyclist | Mar 10, 2008 at 05:57 PM
Sustainlane's account of its methodology is here, but it just amounts to "we did stuff, and trust us, it was all straight". It's lacking a table of the actual inputs that informed the rankings.
Someone with the patience to gather the rankings for all 50 cities could construct a parallel coordinates graph, and perhaps also use the colored labels to mimic a Spotfire-style "brushing" technique.
But, you know, rankings and not quantities. It takes all the interest out of such a project. Rankings are teh suck.
Posted by: derek | Mar 11, 2008 at 04:01 AM
There are two separate issues here. Regarding the first - the arbitrary and fuzzy nature of the data gathering methodology - I am in full agreement with previous comments. The value of the data itself seems quite suspect.
For the sake of argument, however, let's pretend the data is valid and interesting. I am more interested in the question of how to best represent data of this type. And here I find myself in slight disagreement with the post.
The revised Junk Charts version is clearly better, but I am not convinced that 'redundancy' is always bad in graphic representation (pace Tufte). For example, I'd like to be able to glance at the labels and quickly identify the couple of categories that are in the 'danger' zone without having to take the extra step of looking at each respective bar.
Also, I'd like to be able to get the value for any category without needing to look down at the horizontal axis - I like having the actual value label by the bar even if technically it is redundant with the length of the bar.
The bar lengths and the bar labels serve two very different purposes. The length allows me to intuitively process relative sizes (I can immediately perceive "roughly twice as much as" by looking at the relative lengths; comparing '42' and '17' requires that my conscious mind get involved). At the same time, once I've intuitively absorbed the big picture, I'd like to be able to drill down and look at a few select individual components. For that, I need the exact labels.
Here is a quick attempt at trying to achieve both of these goals:
http://www.flickr.com/photos/24579696@N05/2328117979/
Posted by: Zuil Serip | Mar 12, 2008 at 07:26 AM
I find it really difficult to track which tick marks go with which category in the revised chart.
Posted by: curioser | Mar 12, 2008 at 12:11 PM
Better link to chart mentioned above:
http://farm4.static.flickr.com/3263/2329410262_7726457637.jpg
Posted by: Zuil Serip | Mar 12, 2008 at 12:43 PM
I think Kaiser was right not to re-order the categories, since the Boston table is just one of 50 cities, and they won't all be in the same order.
I'd also like to see the bars be longest for the categories where Boston is ranked #1 and shortest where the city is ranked worst.
Posted by: derek | Mar 12, 2008 at 03:51 PM
Your chart is a little better, but the whole thing would be much better if the data were sorted.
-- Gary Klass
http://lilt.ilstu.edu/jpda/
Posted by: Gary Klass | Mar 13, 2008 at 03:23 PM
Good to see all the comments. It's clear everyone has opinions on how the ratings should have been done.
I'd echo Derek's comment, which is to bear in mind the bigger picture... that this one chart needs to be considered as part of a "small multiples" layout of 50 such charts. This challenges the usual advice of sorting, color labels, etc.
Posted by: Kaiser | Mar 17, 2008 at 12:35 AM
IMHO, you should not encourage people to "analyze" data by hacking interval-scaled measures up into categories & sticking arbitrary labels on the categories. Just show the rankings & let them speak for themselves.
Posted by: Georgia Sam | Mar 19, 2008 at 12:51 PM
Georgia Sam: Point taken. Thanks for bringing it up. Converting the original scale to ranks has already eliminated the differences between cities, and assigning bins based on the ranks just makes it worse.
Posted by: Kaiser | Mar 28, 2008 at 12:18 AM