Some chart types are not scalable
Mar 21, 2014
Peter Cock sent this Venn diagram to me via Twitter. (Original from this paper.)
For someone who doesn't know genetics, it is very hard to make sense of this chart. It seems like there are five characteristics that each unit of analysis can have (listed in the left column) and each unit possesses one or more of these characteristics.
There is one glaring problem with this visual display. The area of each subset is not proportional to the count it represents. Look at the two numbers in the middle of the chart, each accounting for a large chunk of the area of the green tree. One side says 5,724 while the other says 13, even though both sides have the same area.
In this respect, Venn diagrams are like maps. The area of a country or state on a map is not related to the data being plotted (unless it's a cartogram).
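The size of the mismatch is easy to quantify from the two central counts quoted above. A minimal sketch, assuming the only goal is area-proportional encoding: the region holding 5,724 would need roughly 440 times the area of the region holding 13, whereas the tree draws them at a ratio of 1.

```python
# The two central counts quoted in the post.
large, small = 5724, 13

# In an area-proportional chart, the ratio of the two areas would have to
# equal the ratio of the counts.
required_ratio = large / small

# The tree diagram draws both central regions with the same area.
drawn_ratio = 1.0

print(f"required area ratio: {required_ratio:.1f}")  # about 440.3
print(f"drawn area ratio:    {drawn_ratio:.1f}")
```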
If you know how to interpret the data, please leave a comment. I'm guessing some kind of heatmap will work well with this data.
The tree-shaped Venn diagram is likely inspired by another infamous genome paper figure, the six-set Venn diagram in the banana genome paper in Nature, DOI 10.1038/nature11241 (Figure 4).
That figure managed to keep one unique region for each set combination; the pine tree version does not, which is a subtle complication. Notice how many of the numbers are repeated? However, there are at least two typos (341 vs 79, and 27 vs 38), which I have emailed the authors about and which will hopefully be fixed in the final PDF of the article; see this annotated image.
Posted by: Peter Cock | Mar 21, 2014 at 09:46 AM
Are Venn diagrams ever truly useful for data display?
One this complex certainly never could be.
Most that I have seen make no attempt to make areas match values in any sense, and exist purely to show that items do in fact overlap.
Surely there are better ways to show complex relationships...
Posted by: jlbriggs | Mar 21, 2014 at 12:59 PM
Venn diagrams show in/out relationships: is each case inside or outside the set? So a single-circle Venn diagram shows 2 possibilities, a two-circle one four, a three-circle one eight... and a five-set Venn diagram counts the cases that fall into each of thirty-two areas. Venn diagrams can always be displayed as tables instead, and I'd suggest two (2x2)x(2x2) tables for this one. Obviously a table is two-dimensional, so you get the extra dimensions by nesting or, as in the biggest distinction in my suggested scheme, multiplicity.
You could enter the numbers into an Excel grid and print it out. It just wouldn't look as cool as the green tree diagram above.
Posted by: derek | Mar 21, 2014 at 04:26 PM
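derek's counting rule (each added set doubles the number of in/out regions, so n sets give 2^n regions, including the one outside every set) and his two-table layout can be checked with a short sketch. The helper name `venn_regions` is hypothetical, chosen for illustration only.

```python
from itertools import product


def venn_regions(n_sets):
    """Return every in/out membership pattern for n_sets sets.

    Each pattern is a tuple of booleans (True = inside that set), so an
    n-set Venn diagram has 2**n regions, including "outside all sets".
    """
    return list(product([False, True], repeat=n_sets))


for n in range(1, 6):
    print(n, "sets ->", len(venn_regions(n)), "regions")

# derek's layout: in each table, rows are the 4 in/out combinations of two
# sets and columns the 4 combinations of two other sets; the fifth set
# decides which of the two tables a cell belongs to.
cells_in_two_tables = 2 * (2 * 2) * (2 * 2)
print("cells in two 4x4 tables:", cells_in_two_tables)  # 32
```

The two tables have exactly as many cells as the five-set diagram has regions, so nothing is lost by switching to the grid.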
Just after I hit "post", I remembered we've been here before...
(man, 2007? what happened to the time?)
Posted by: derek | Mar 21, 2014 at 04:32 PM
The groups listed on the left are distinct groupings of plants. A given plant can only fall into one of them. I assume you know what conifers and mosses are, roughly, at least. Monocots include plants such as maize, while dicots is the group that includes flowers such as magnolia.
I'm a bit confused why they're still using dicots, to be honest; I thought dicots was no longer used as a grouping because it turned out to be polyphyletic (that is, simplifying a bit, that the group evolved from multiple ancestors, not one).
The numbers then show the number of gene families that are common to different combinations of these major plant groupings. Such diagrams are actually pretty common in genetics, but I've always found they shed as much heat as light, and this one's rather cute decision to use a tree-shaped cutout doesn't make it any easier to read.
Posted by: Jack | Mar 22, 2014 at 08:20 AM
Oh, and on the scaling thing: I don't see how it would be possible to display this data to scale - you'd need to be able to show 1 clearly and 5724 clearly.
Posted by: Jack | Mar 22, 2014 at 08:21 AM
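Jack's dynamic-range point can be put in numbers. A minimal sketch using only the two counts he mentions (1 and 5724): on a linear area scale the smallest region would be under two hundredths of a percent of the largest, while a log10 scale spans under four decades. The log scale is an assumption on my part, offered as one way the heatmap guessed at in the post might encode the counts, not something proposed in the thread.

```python
import math

# Smallest and largest counts mentioned in the discussion.
smallest, largest = 1, 5724

# Linear (area-proportional) encoding: the smallest region's area as a
# share of the largest region's area.
linear_share = smallest / largest
print(f"linear: smallest region is {linear_share:.4%} of the largest")  # 0.0175%

# Logarithmic encoding (assumed, e.g. a log-scaled colour map): how many
# powers of ten separate the two counts.
log_span = math.log10(largest) - math.log10(smallest)
print(f"log10 span: {log_span:.2f} decades")  # 3.76
```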
Jack, I think Kaiser means that Venn diagrams start out cute with two or three sets, but don't scale up to five sets well. Venn diagrams almost never attempt to show the area of the sets proportional to their populations.
Posted by: derek | Mar 22, 2014 at 10:23 AM
It's impossible for me to be sure because I can't see the whole paper, but I think they took the sequences of the loblolly pine genome, and asked "is this sequence common to one of the other pines named in the study, yes/no, at least one of the mosses yes/no, the basal example yes/no, and so on". The result is a 2x2x2x2x2 matrix of numbers of sequences fitting the criteria.
Posted by: derek | Mar 22, 2014 at 10:28 AM
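derek's reading can be sketched in a few lines: tally each sequence's five yes/no answers into a count per in/out pattern (the 2x2x2x2x2 matrix) and print it as the two (2x2)x(2x2) tables he suggested earlier. The set labels A to E and the four answer tuples are made-up placeholders for illustration, not data from the paper.

```python
from collections import Counter
from itertools import product

# Hypothetical input: one tuple of five yes/no answers per sequence,
# (in set A?, set B?, set C?, set D?, set E?). Toy data only.
answers = [
    (True, False, True, False, True),
    (True, True, True, True, True),
    (False, False, False, False, True),
    (True, True, True, True, True),
]

# The 2x2x2x2x2 "matrix": a count for every in/out pattern, zeros included.
counts = Counter(answers)
matrix = {pattern: counts.get(pattern, 0)
          for pattern in product([False, True], repeat=5)}


def yn(flag):
    """Render a membership flag as Y or N."""
    return "Y" if flag else "N"


# Print as two 4x4 tables: set E selects the table, sets A and B index
# the rows, sets C and D index the columns.
pairs = list(product([False, True], repeat=2))
for e in (False, True):
    print(f"Set E = {yn(e)}")
    print("A B | " + "  ".join(f"C{yn(c)}D{yn(d)}" for c, d in pairs))
    for a, b in pairs:
        row = [matrix[(a, b, c, d, e)] for c, d in pairs]
        print(f"{yn(a)} {yn(b)} | " + "  ".join(f"{n:4d}" for n in row))
    print()
```

Keeping the zero cells makes the empty combinations visible, which a Venn diagram tends to hide.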
@derek - I think everyone understands *what* a Venn Diagram is intended to do.
The question is, does it do so in any way that's effective or useful?
With this level of complexity, I think the answer is a very clear "No!"
Yes, if you spend long enough studying it, you can come away with some information.
But the same would be true if the data were written in pictograms in the dirt with a stick.
With two or three sets, it may be superficially useful as a visual aid, but still not in a way that provides any depth of understanding of the data.
As far as looking cool, I'd say this one looks like a group of children took turns with a Spirograph, one on top of the other :)
Posted by: jlbriggs | Mar 24, 2014 at 01:55 PM
"Cool" was meant to be sarcastic.
Posted by: derek | Mar 26, 2014 at 07:10 AM