

Listed below are links to weblogs that reference From light to heavy:

Comments

Daniel Pope

If the baseline is actually a site-wide average (as mentioned in their previous posts) then it's a more meaningful baseline, because it includes vastly more people with more varied profiles that by the central limit theorem will average out specific quirks. There could be better justification that certain categories of profile photo damage your chances of getting a reply.

It would be interesting if the median and quartiles were plotted to see how big these effects are against the population as a whole.

Hadley Wickham

I don't understand why you think a 0 baseline is useful in this case. There is no expectation that any strategy will have 0% success rate, so it makes much more sense to compare each strategy to the overall average. The height of the bar is not the absolute success rate, but how much better (or worse) that strategy is than the average. Sure you could tweak the plot by converting bars to points and rotating 90 degrees, but as it is the plots are perfect for the target audience.

Tom West

If you're going to compare with the average, then the plotted value (and axis) should be percentage increase/decrease versus the average. (So, if the average is 40%, and the value is 50%, you would plot 25%).
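
The arithmetic in this comment can be sketched as follows. This is only an illustration of the 40% and 50% example above; the function names are made up for the sketch and do not come from the original post or chart.

```python
def relative_change_pct(value_pct, average_pct):
    """Percentage increase/decrease of a value relative to the average."""
    return (value_pct - average_pct) / average_pct * 100


def point_difference(value_pct, average_pct):
    """Plain difference in percentage points (value minus average)."""
    return value_pct - average_pct


# Numbers from the comment: average reply rate 40%, strategy reply rate 50%.
print(relative_change_pct(50, 40))  # 25.0 (percent above the average)
print(point_difference(50, 40))     # 10 (percentage points above the average)
```

The same pair of rates gives 25 on one scale and 10 on the other, which is why the axis label has to say which one is being plotted.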

Hadley Wickham

@Tom Why? There's nothing wrong with the plot as it is, and making people multiply percentages in their head (i.e. a 50% decrease is not the same as a 50% increase) is confusing.

Ben

I fully agree with Hadley. Bars versus points seems like an entirely aesthetic decision (surely you don't advocate your "Increase in value from next lowest feature" graph as a replacement; you have to integrate in your head in order to read that!). The color dividing line should clearly NOT be between the second and third as you suggest -- the first three bars belong to the class of "improves your odds versus average chance" whereas the remaining set of bars belongs to the class of "hurts your odds versus average chance". Perhaps the colors should have been green and red instead of blue and grey, but clearly a big, bad bar is universally recognized as negative; I don't think that is very dizzying. Finally, charts don't have to exactly duplicate the text as you seem to suggest. By reading 13%, 10%, 3%, -2%, etc and seeing 40%, 37%, 30%, 26%, etc, the reader is given two perspectives on the data -- by reconciling them, the reader gains a better understanding of the results. I think the authors of this blog post did an amazing job of communicating data concepts which would otherwise be fairly difficult for most people to grasp, or which would have required a much longer written explanation.

Why not focus on graphs that are clearly misrepresentative, like this one?
http://money.cnn.com/news/specials/storysupplement/stimulus-tracker/index.html?hpt=C2

Perhaps you should change the name of this blog to "Charts That Aren't Exactly Like the Ones I Would Use".

Kaiser

Hadley: I think you are assuming that I am suggesting the second chart (which starts at zero) as an improvement. No, I use that chart to illustrate the two data series that are being mashed up in the original chart. If I were plotting this data, and if I think over/under versus the average is the right metric, I'd use the same plot (sideways) but align the axis labels to depict % over/under the average.

Ben: please refer to Cleveland to learn about why dots are often better than bars; there is empirical evidence to support this.

Further, the idiocy of always using the average as a dividing line is for all to see on the original website. Scroll down to the chart titled "Male Photo Contests: Clothes" and tell us that "normal clothes" should go with "no shirt" but not with "all dressed up".

You are welcome to disagree with my opinions - but I like the name of the blog.

Aleks

The problem with the original chart is that the salience of labels is too low and that the salience of arbitrary percentages is too high. Moreover, it's not clear where the average comes from.

Hadley Wickham

Kaiser: Citations would be useful when you claim results from Cleveland: I think you're arguing based on "The final issue is whether dot charts-that is, ordinary bar charts and not divided bar charts-are more effective than bar charts even when there is a meaningful baseline, as in Figure 8" (in "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging"). But there Cleveland is arguing from his beliefs, not data.

Using % under/over is confusing because then you have to explain to the reader if the percentage is multiplicative (i.e. 50% improvement = x * 1.5) or additive (50% improvement = x + 0.5). (Plus you lose the information about the average response rate). Interestingly, German has two different words for these two types of percentages. I wish English did too!
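
A minimal sketch of the two readings described in this comment, plus the asymmetry raised earlier in the thread (a 50% decrease is not undone by a 50% increase). The baseline rate of 0.30 and the function names are assumptions chosen for illustration; they are not figures from the original post.

```python
def multiplicative(rate, pct):
    """Read 'pct% improvement' as scaling the rate: x * (1 + pct/100)."""
    return rate * (1 + pct / 100)


def additive(rate, pct):
    """Read 'pct% improvement' as adding percentage points: x + pct/100."""
    return rate + pct / 100


base = 0.30  # hypothetical average reply rate, for illustration only

# The same "50% improvement" gives two very different reply rates.
print(round(multiplicative(base, 50), 4))
print(round(additive(base, 50), 4))

# Under the multiplicative reading, a 50% decrease followed by a 50% increase
# lands at 75% of the starting rate, not back at the start.
round_trip = multiplicative(multiplicative(base, -50), 50)
print(round(round_trip / base, 4))
```

With this assumed baseline the multiplicative reading lands near 0.45 and the additive one near 0.80, so a chart labelled only "% over/under" would need to state which reading applies.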

Aleks: Agreed that more info about average line would be useful. However these posts are part of a sequence and I'm pretty sure it's explained more earlier in the sequence.

Kaiser

Hadley: So I dug up the references.

In both the article you cited, and a related JASA article "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods" (with McGill), Cleveland argued that dot charts are preferred to bar charts. In the latter article, he explained the empirical evidence for his many recommendations but admitted that the preference of dots to bars is due to a different reason explained in the first article. The reason is as follows: "A reasonable principle for the design of graphs is to make the graphical elements representing the data as nearly equal in area as possible; this gives equal visual emphasis to all data values. On bar charts the areas of the elements representing the data - the bars - can be very unequal." His experimental results showed that perception of areas is worse than perception of lengths which is worse than perception of locations.

AdamV

I agree with the general point that bars either side of an average line are not entirely intuitive. One possible solution would be bars from zero with colouring to emphasise those which reach above average performance rather than below average, and with the average line displayed on the chart. In the context of the site, this is almost a "target" mark - everything you do needs to raise your successes above this mark to get the most out of the opportunities of the site / service.

@Hadley: In terms of vocabulary, I would distinguish between percent difference (your x1.5) and percentage *points* difference (+50), although I know to many lay readers (i.e. 90% of the audience of users of that site) this would not necessarily stand out.

My gut feeling would be again that a clear indication of the average line would help with identifying the first type. The average would be at value=1, so "doing X gives you twice as much chance of success as the average" would be fairly clear. To give some sense of absolute percentages, either the values themselves or the +/- deviation from average could be shown as data labels on the bars (although this adds some clutter and possibly ambiguity versus the axis values).

Hadley Wickham

Kaiser: That's a nice principle, but it doesn't sound like it's backed up with much evidence ;) And I think in this case, don't you want to draw the eye to the strategies that are most different to the average?

Adam: yes, that's the distinction made in German. But it's certainly not standard in English usage.



Marketing analytics and data visualization expert. Author and Speaker. Currently at Vimeo and NYU. See my full bio.
