The top dog among jealous dogs

Is data visualization worth paying for? In some quarters, this may be a controversial question.

If you are having doubts, just look at some examples of great visualization. This week, the NYT team brings us a wonderful example. The story is about whether dogs feel jealousy. Researchers had dog owners play with (a) a stuffed toy shaped like a dog, (b) a jack-o'-lantern, and (c) a book, and they measured several behaviors that are suggestive of jealousy, such as barking or pushing/touching the owner.

This is how the researchers presented their findings in PLOS:

And this is how the same chart showed up in NYT:


Same data. Same grouped column format. Completely different effect on the readers.

Let's see what the NYT team did to the original, roughly in order of impact:

  • Added a line above the legend, explaining that the colors represent different experimental conditions
  • Re-ordered the behaviors by their average prevalence from left to right
  • Added little cartoons to make the chart more fun to look at
  • Added colors and removed moire patterns (a Tufte pet peeve)
  • Changed the vertical scale from 0-1 (scientific) to 0-100
  • Reduced the number of tick marks on the vertical scale (this is smart because the researchers observed only about 30 dogs so only very large differences are of practical value)
  • Clarified certain category details, e.g. Snapping became "bite or snap at object"
  • Removed technical details of p-values, not important to NYT readers


Even simple charts illustrating simple data can be done well or done poorly.




Sumit Rahman

I agree that full details of p-values are not required for casual NYT readers, but would argue that something about statistical significance needs to be retained. I'd consider having the little cartoons in a light grey for those cases where there is insufficient evidence of a difference. Of course this would need explaining somewhere.


Sumit: Thanks for bringing this up. I didn't want to clutter up the original post with a comment on statistical significance. The post focuses only on the V corner of the Trifecta Checkup. The D corner is certainly worth investigating. The study is, as usual, tiny (I think n is 20 or 30) and nonrandom; however, the p-values are shockingly small because the signals are huge. I wouldn't trust the study unless it is replicated by other groups, with larger sample sizes and an improved sample selection.

Focusing just on the Visual representation of the p-values for the moment, I see this as a tradeoff I'm unwilling to make. There is a price to pay for putting an additional detail onto the chart. If this were a two-treatment experiment, then I'd agree with your elegant solution of using grayscale on the dog icons. However, with a three-way analysis, there are three possible pairwise comparisons, meaning there are 2^3=8 possible combinations of statistical significance for each set of bars. The increase in complexity is not worth it.
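To make the counting concrete, here is a minimal sketch that enumerates the pairwise comparisons among the three experimental conditions and every significant/not-significant pattern a single set of bars could show. The condition labels come from the post; the variable names are just illustrative, and nothing here uses the study's actual p-values.

```python
from itertools import combinations, product

# The three experimental conditions described in the post.
conditions = ["stuffed dog", "jack-o'-lantern", "book"]

# Every unordered pair of conditions is one possible comparison.
pairs = list(combinations(conditions, 2))

# Each comparison is either significant (True) or not (False),
# so the patterns are all True/False assignments over the pairs.
patterns = list(product([True, False], repeat=len(pairs)))

print(len(pairs))     # number of pairwise comparisons
print(len(patterns))  # number of significance patterns per set of bars

for pattern in patterns:
    # Pair each comparison with its significance flag for display.
    print(dict(zip(pairs, pattern)))
```

A two-condition experiment has one comparison and only two patterns, which is why a single grayscale cue would be enough there.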


Sumit: After I wrote that, I thought of Gelman's compromise. A good solution would be to hide the statistical significance information behind a mouseover/clickthrough.


P-values are worthless. Probability is not relative frequency. Just show the data that you found - for these 30 dogs, here's what they did. Here's what these dogs were like (age, breed, temperament). Based on this, we believe that dogs recognize the stuffed dogs as dog-like objects. But unless we put numbers to our belief, we haven't done science! Therefore, here are some useless numbers.
