The tree-shaped Venn diagram is likely inspired by another infamous genome paper figure, the six-set Venn diagram in the banana genome paper in Nature, DOI 10.1038/nature11241 (Figure 4).

That figure managed to keep one unique region for each set combination, which is a subtle complication in the pine tree version. Notice how many of the numbers are repeated? However, there are at least two typos (341 vs 79, and 27 vs 38), which I have emailed the authors about and which will hopefully be fixed for the final PDF of the article; see this annotated image.

Are Venn diagrams ever truly useful for data display?
One this complex certainly never could be.

Most that I have seen make no attempt to make areas match values in any sense, and exist purely to show that items do in fact overlap.

Surely there are better ways to show complex relationships...

Venn diagrams show in/out relationships: is an item inside or outside each set? So a single-circle Venn diagram shows two possibilities, a two-circle diagram four, a three-circle diagram eight... and a five-set Venn diagram counts the cases that fall into each of thirty-two regions. Venn diagrams can always be displayed as tables instead, and I'd suggest two (2x2)x(2x2) tables for this one. Obviously a table is two-dimensional, so you get the extra dimensions by nesting or, for the biggest distinction in my suggested scheme, by multiplicity (one table for each side of that distinction).
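To make the counting concrete, here is a minimal Python sketch. The function names are my own and purely illustrative. It enumerates the in/out combinations for any number of sets and arranges 32 counts as two 4x4 tables, which is one possible reading of the nested-table scheme suggested above. Which set picks the table, and which pairs of sets pick the rows and columns, is an arbitrary assumption.

```python
from itertools import product


def membership_patterns(n_sets):
    """Return every in/out combination for n_sets sets as tuples of booleans."""
    return list(product((False, True), repeat=n_sets))


def as_two_tables(counts):
    """Arrange 32 counts, keyed by 5-tuples of booleans, into two 4x4 tables.

    Assumed layout: set 1 picks the table, sets 2-3 pick the row,
    and sets 4-5 pick the column.
    """
    pairs = list(product((False, True), repeat=2))  # the 4 in/out pairs
    tables = []
    for first in (False, True):
        table = [[counts[(first, *row, *col)] for col in pairs] for row in pairs]
        tables.append(table)
    return tables


# The number of regions doubles with every set added.
for n in (1, 2, 3, 5):
    print(n, "sets ->", len(membership_patterns(n)), "regions")
```

Every one of the 32 regions lands in exactly one cell, so nothing is lost relative to the diagram; the table just gives up the picture.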

You could enter the numbers into an Excel grid and print it out. It just wouldn't look as cool as the green tree diagram above.

Just after I hit "post", I remembered we've been here before...

(man, 2007? what happened to the time?)

The groups listed on the left are distinct groupings of plants. A given plant can only fall in one of them. I assume you know what conifers and mosses are, roughly, at least. Monocots include plants such as maize, while dicots are the group that includes flowers such as magnolia.

I'm a bit confused why they're still using dicots, to be honest. I thought dicots was no longer used as a grouping because it turned out to be paraphyletic (that is, simplifying a bit, the group leaves out some descendants of its common ancestor, namely the monocots).

The numbers then show the number of gene families that are common to different combinations of these major plant groupings. Such diagrams are actually pretty common in genetics, but I've always found they shed as much heat as light, and this one's rather cute decision to use a tree-shaped cutout doesn't make it any easier to read.

Oh, and on the scaling thing: I don't see how it would be possible to display this data to scale. You'd need to be able to show 1 clearly and 5724 clearly.
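To put a rough number on that: assuming areas were drawn strictly proportional to the counts (which this figure does not attempt) and the regions had similar shapes, linear size would scale with the square root of the count. A quick sketch, with a function name of my own:

```python
import math


def linear_scale_ratio(small_count, large_count):
    """Ratio of linear sizes of two similar shapes whose areas are
    proportional to the two counts."""
    if small_count <= 0 or large_count <= 0:
        raise ValueError("counts must be positive")
    return math.sqrt(large_count / small_count)


ratio = linear_scale_ratio(1, 5724)
print(f"{ratio:.1f}")  # roughly 75.7
```

So the region holding 5724 would need to be about 76 times wider than the region holding 1, which leaves the small one too tiny to label on the same page.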

Jack, I think Kaiser means that Venn diagrams start out cute with two or three sets, but don't scale up to five sets well. Venn diagrams almost never attempt to show the area of the sets proportional to their populations.

It's impossible for me to be sure because I can't see the whole paper, but I think they took the sequences of the loblolly pine genome, and asked "is this sequence common to one of the other pines named in the study, yes/no, at least one of the mosses yes/no, the basal example yes/no, and so on". The result is a 2x2x2x2x2 matrix holding the number of sequences that fit each combination of answers.
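If that reading is right, the tally itself is simple. Here is a minimal sketch, assuming each sequence comes with the set of groups it was found in. The group labels and the toy data are invented for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical group labels; the paper's actual categories may differ.
GROUPS = ("other_pines", "mosses", "basal", "monocots", "dicots")


def tally_memberships(memberships):
    """Count items per yes/no pattern.

    memberships maps an item name to the set of groups it appears in.
    Each key of the result is a 5-tuple of booleans, one per entry in
    GROUPS, so the result is a sparse 2x2x2x2x2 matrix of counts.
    """
    counts = Counter()
    for present in memberships.values():
        key = tuple(group in present for group in GROUPS)
        counts[key] += 1
    return counts


# Made-up example data, only to show the shape of the result.
toy = {
    "seq1": {"mosses"},
    "seq2": {"mosses"},
    "seq3": set(),
    "seq4": {"other_pines", "dicots"},
}
for pattern, n in tally_memberships(toy).items():
    print(pattern, n)
```

Each distinct pattern corresponds to one region of the five-set diagram, and patterns that never occur simply have a count of zero.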

@derek - I think everyone understands *what* a Venn Diagram is intended to do.

The question is, does it do so in any way that's effective or useful?

With this level of complexity, I think the answer is a very clear "No!"

Yes, if you spend long enough studying it, you can come away with some information.

But the same would be true if the data were written in pictograms in the dirt with a stick.

With two or three sets, it may be superficially useful as a visual aid, but still not in a way that provides any depth of understanding of the data.

As far as looking cool, I'd say this one looks like a group of children took turns with a spirograph, one on top of the other :)

"Cool" was meant to be sarcastic.
