The top dog among jealous dogs
Jul 25, 2014
Is data visualization worth paying for? In some quarters, this may be a controversial question.
If you are having doubts, just look at some examples of great visualization. This week, the NYT team brings us a wonderful example. The story is about whether dogs feel jealousy. Researchers had dog owners play with (a) a stuffed toy shaped like a dog, (b) a Jack-o-lantern, and (c) a book, and they measured several behaviors that are suggestive of jealousy, such as barking or pushing/touching the owner.
This is how the researchers presented their findings in PLOS:
And this is how the same chart showed up in NYT:
Same data. Same grouped column format. Completely different effect on the readers.
Let's see what the NYT team did to the original, roughly in order of impact:
- Added a line above the legend, explaining that the colors represent different experimental conditions
- Re-ordered the behaviors by their average prevalence from left to right
- Added little cartoons to make the chart more fun to look at
- Added colors and removed moire patterns (a Tufte pet peeve)
- Changed the vertical scale from proportions (0 to 1, the scientific convention) to percentages (0 to 100)
- Reduced the number of tick marks on the vertical scale (this is smart because the researchers observed only about 30 dogs so only very large differences are of practical value)
- Clarified certain category details, e.g. Snapping became "bite or snap at object"
- Removed technical details of p-values, not important to NYT readers
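For readers who want to try this on their own charts, here is a minimal sketch of three of the data-side steps above: relabeling categories, re-ordering by average prevalence, and converting proportions to percentages. The function name `prepare_for_chart`, the choice of descending order, and especially the numbers are my assumptions for illustration; the numbers are made up and are not the study's results.

```python
from statistics import mean


def prepare_for_chart(proportions, labels=None):
    """Return (label, {condition: percent}) pairs ready for a grouped column chart.

    proportions: dict mapping behavior -> {condition: proportion between 0 and 1}
    labels: optional dict mapping a terse behavior name to a reader-friendly one
    """
    labels = labels or {}
    # Order behaviors by their average prevalence across conditions.
    # Descending order is an assumption; the post only says "by average prevalence".
    ordered = sorted(
        proportions,
        key=lambda behavior: mean(proportions[behavior].values()),
        reverse=True,
    )
    # Convert 0-1 proportions to whole-number percentages and apply clearer labels.
    return [
        (
            labels.get(behavior, behavior),
            {cond: round(p * 100) for cond, p in proportions[behavior].items()},
        )
        for behavior in ordered
    ]


# Hypothetical proportions, invented for this example only.
made_up = {
    "Snapping": {"stuffed dog": 0.25, "jack-o-lantern": 0.0, "book": 0.0},
    "Pushing/touching owner": {"stuffed dog": 0.75, "jack-o-lantern": 0.5, "book": 0.25},
}
chart_rows = prepare_for_chart(made_up, labels={"Snapping": "Bite or snap at object"})
```

The cartoons, colors, and legend text are design decisions that no helper function will make for you, which is rather the point of this post.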
Even simple charts illustrating simple data can be done well or done poorly.
I agree that full details of p-values are not required for casual NYT readers, but would argue that something about statistical significance needs to be retained. I'd consider having the little cartoons in a light grey for those cases where there is insufficient evidence of a difference. Of course this would need explaining somewhere.
Posted by: Sumit Rahman | Jul 29, 2014 at 05:36 AM
Sumit: Thanks for bringing this up. I didn't want to clutter up the original post with a comment on statistical significance. The post focuses only on the V corner of the Trifecta Checkup. The D corner is certainly worth investigating. The study is, as usual, tiny (I think n is 20 or 30) and nonrandom; however, the p-values are shockingly small because the signals are huge. I wouldn't trust the study unless it is replicated by other groups, with larger sample sizes and an improved sample selection.
Just focusing on the Visual representation of the p-values for the moment. I see this as a tradeoff I'm unwilling to make. There is a price to pay for putting an additional detail onto the chart. If this were a two-treatment experiment, then I'd agree with your elegant solution of using grayscale on the dog icons. However, with a three-way analysis, there are three possible pairwise comparisons, each of which is either significant or not, meaning there are 2^3=8 possible combinations of statistical significance for each set of bars. The increase in complexity is not worth it.
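To make the counting concrete, here is a quick sketch that enumerates the pairwise comparisons and every significant/not-significant pattern a single set of bars could show. The condition names come from the post; the variable names are just for illustration.

```python
from itertools import combinations, product

conditions = ["stuffed dog", "jack-o-lantern", "book"]

# Three conditions give three pairwise comparisons.
pairs = list(combinations(conditions, 2))

# Each comparison is either significant (True) or not (False),
# so one set of bars can show any of 2**3 patterns.
patterns = list(product([True, False], repeat=len(pairs)))

print(len(pairs), "comparisons,", len(patterns), "patterns per set of bars")
```

A visual encoding would have to distinguish all of these patterns for every behavior on the chart.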
Posted by: junkcharts | Jul 29, 2014 at 09:40 AM
Sumit: After I wrote that, I thought of Gelman's compromise. A good solution would be to hide the statistical significance information behind a mouseover/clickthrough.
Posted by: junkcharts | Jul 29, 2014 at 09:41 AM
P-values are worthless. Probability is not relative frequency. Just show the data that you found - for these 30 dogs, here's what they did. Here's what these dogs were like (age, breed, temperament). Based on this, we believe that dogs recognize the stuffed dogs as dog-like objects. But unless we put numbers to our belief, we haven't done science! Therefore, here are some useless numbers.
Posted by: Nate | Oct 08, 2014 at 12:45 PM