
The top dog among jealous dogs

Is data visualization worth paying for? In some quarters, this may be a controversial question.

If you are having doubts, just look at some examples of great visualization. This week, the NYT team brings us a wonderful example. The story is about whether dogs feel jealousy. Researchers had dog owners play with (a) a stuffed toy shaped like a dog, (b) a jack-o'-lantern, and (c) a book, and they measured several behaviors suggestive of jealousy, such as barking or pushing/touching the owner.

This is how the researchers presented their findings in PLOS:

And this is how the same chart showed up in NYT:


Same data. Same grouped column format. Completely different effect on the readers.

Let's see what the NYT team did to the original, roughly in order of impact:

  • Added a line above the legend, explaining that the colors represent different experimental conditions
  • Re-ordered the behavior by their average prevalence from left to right
  • Added little cartoons to make the chart more fun to look at
  • Added colors and removed moiré patterns (a Tufte pet peeve)
  • Changed the vertical scale from 0 to 1 (the scientific convention for proportions) to 0 to 100 (percentages)
  • Reduced the number of tick marks on the vertical scale (this is smart because the researchers observed only about 30 dogs, so only very large differences are of practical value)
  • Clarified certain category details, e.g. "Snapping" became "bite or snap at object"
  • Removed technical details of p-values, which are not important to NYT readers

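The revisions above can be sketched in code. Here is a minimal matplotlib version, assuming hypothetical placeholder percentages (the study's actual numbers are not reproduced here), that applies several of the NYT edits: behaviors ordered left to right by average prevalence, a 0-to-100 percent scale, only a few tick marks, and solid colors instead of patterns.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

behaviors = ["Touch owner", "Push object", "Bark", "Bite/snap at object"]
# Percent of dogs showing each behavior under each condition.
# These numbers are illustrative placeholders, NOT the study's data.
conditions = {
    "Stuffed dog": [80, 75, 40, 25],
    "Jack-o'-lantern": [40, 30, 20, 10],
    "Book": [25, 15, 10, 5],
}

# Re-order behaviors by average prevalence, highest first (left to right)
avg = np.mean(list(conditions.values()), axis=0)
order = np.argsort(avg)[::-1]

x = np.arange(len(behaviors))
width = 0.25
fig, ax = plt.subplots()
for i, (label, vals) in enumerate(conditions.items()):
    # Offset each condition's bars within the group
    ax.bar(x + (i - 1) * width, np.array(vals)[order], width, label=label)

ax.set_xticks(x)
ax.set_xticklabels([behaviors[j] for j in order])
ax.set_ylim(0, 100)                # percent scale, not 0-to-1 proportions
ax.set_yticks([0, 50, 100])        # few ticks: with n ~ 30, only big gaps matter
ax.set_ylabel("Percent of dogs")
ax.legend(title="Owner played with:")  # legend explains the conditions
fig.savefig("jealous_dogs.png")
```

The cartoons are left out of the sketch, but the ordering step is the one with the biggest payoff: it lets readers scan the behaviors from most to least common without hunting.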

Even simple charts illustrating simple data can be done well or done poorly.



Sumit Rahman

I agree that full details of p-values are not required for casual NYT readers, but would argue that something about statistical significance needs to be retained. I'd consider having the little cartoons in a light grey for those cases where there is insufficient evidence of a difference. Of course this would need explaining somewhere.


Sumit: Thanks for bringing this up. I didn't want to clutter up the original post with a comment on statistical significance. The post focuses only on the V corner of the Trifecta Checkup. The D corner is certainly worth investigating. The study is, as usual, tiny (I think n is 20 or 30) and nonrandom; however, the p-values are shockingly small because the signals are huge. I wouldn't trust the study unless it is replicated by other groups, with larger sample sizes and improved sample selection.

Just focusing on the Visual representation of the p-values for the moment. I see this as a tradeoff I'm unwilling to make. There is a price to pay for putting an additional detail onto the chart. If this were a two-treatment experiment, then I'd agree with your elegant solution of using grayscale on the dog icons. However, with a three-way analysis, there are three possible comparisons, meaning there are 2^3=8 possible combinations of statistical significance for each set of bars. The increase in complexity is not worth it.
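The combinatorics in that last step can be checked directly: with three conditions there are three pairwise comparisons, each either significant or not, giving 2^3 = 8 possible significance patterns per group of bars. A quick sketch (the comparison labels are just shorthand for the study's three conditions):

```python
from itertools import product

# Three pairwise comparisons among the three experimental conditions
comparisons = ["dog vs. lantern", "dog vs. book", "lantern vs. book"]

# Each comparison is either significant (True) or not (False)
patterns = list(product([True, False], repeat=len(comparisons)))
print(len(patterns))  # 2**3 = 8 possible combinations per set of bars
```

Eight distinct visual states per bar group is far more than grayscale icons can encode legibly, which is the tradeoff being declined here.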


Sumit: After writing that, I thought of Gelman's compromise. A good solution would be to hide the statistical-significance information behind a mouseover/clickthrough.


P-values are worthless. Probability is not relative frequency. Just show the data that you found - for these 30 dogs, here's what they did. Here's what these dogs were like (age, breed, temperament). Based on this, we believe that dogs recognize the stuffed dogs as dog-like objects. But unless we put numbers to our belief, we haven't done science! Therefore, here are some useless numbers.
