Use this chart at your own peril

Sep 04, 2013

On Twitter, Joe D. disliked the following chart on the Information is Beautiful blog:

***

The chart carries a long list of flaws.

The column labeled "%" is probably the most jarring. The meaning of these numbers changes with the color. When pink, they give the proportion of females; when blue, the proportion of males. As the stated purpose of the chart is to explore the male-female balance at different websites, it is a bad decision to fold two dimensions into one. While you're thinking about what I just said, what do you think the percentages in gray mean? Your guess is as good as mine.

Now, I appreciate that the designer (implicitly) allowed for a margin of error, and set these three sites apart as representing "equality", even though only one of them has an exact 50/50 split.

Wait, for Orkut (second row), it's 51 percent female, and for Foursquare, it's 52 percent male. The gender is coded in the figurines. You can check that with your magnifying glass.

It gets better.

The list of websites is ordered by increasing polarity but only within the three sections. Logically, the three "equality" sites should sit between the "matriarchy" and the "patriarchy".  Pinterest and Reddit, the two most polarized sites, should stand on the edges. On the diagram shown right, I simulated a reader who wants to scan through the list of websites from the most female-oriented (Pinterest) to the most male-oriented (Reddit). It's quite the obstacle course.

Let's get to Joe D.'s issue with the chart. How many people does each figurine represent? It's quite a mouthful. Each figurine represents one percent of the unique visitors at the specific website, but only the percentage points in excess of 50 percent are drawn. In effect, a Facebook figurine represents a huge number of people compared to a figurine for a less popular website like Tagged. The designer also did not explain the inclusion criteria for websites.
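To make the figurine definition concrete, here is a minimal sketch of the arithmetic as I read it off the chart. The two sites and their visitor counts are made-up numbers for illustration, not data from the chart, and the function names are my own.

```python
def figurines(majority_share_pct: int) -> int:
    """Figurines drawn for a site: percentage points above 50."""
    return majority_share_pct - 50


def people_per_figurine(unique_visitors_millions: float) -> float:
    """One figurine stands for 1 percent of the site's unique visitors."""
    return unique_visitors_millions / 100


# Hypothetical sites: (unique visitors in millions, majority share in percent).
# These values are invented to show the scale problem, nothing more.
sites = {
    "big_site": (1000.0, 58),
    "small_site": (10.0, 58),
}

for name, (visitors, share) in sites.items():
    print(
        f"{name}: {figurines(share)} figurines, "
        f"each worth {people_per_figurine(visitors)} million visitors"
    )
```

Both hypothetical sites get the same row of figurines, yet one figurine on the big site stands for a hundred times as many people as one on the small site.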

If you didn't get that definition, just ignore the figurines and think of this chart as a bar chart in which the bars start at 50 percent (rather than zero as it should). A standard population pyramid appears to do a better job - just add bars to the left of the diagram and properly align the male and female sections.

***

As I said before, read the fine print.

Here's the fine print:

If I am not mistaken, the designer applied the gender proportions to the traffic totals to obtain the rightmost column, labeled "million more monthly female or male visitors". The trouble is one number pertains to U.S. visitors while the other pertains to worldwide traffic. By multiplying them, the designer makes an assumption: that gender ratio is equivalent inside and outside the U.S., for every website.
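Here is a rough sketch of that computation as I have guessed it, together with what happens when the gender ratio is allowed to differ by region. The formula is my assumption about the designer's method, and every number below is hypothetical.

```python
def excess_visitors(total_millions: float, female_share: float) -> float:
    """Excess of female over male visitors, in millions.

    females - males = total * share - total * (1 - share)
                    = total * (2 * share - 1)
    A negative result means an excess of male visitors.
    """
    return total_millions * (2 * female_share - 1)


def excess_by_region(regions: list[tuple[float, float]]) -> float:
    """Sum the excess over (total_millions, female_share) pairs."""
    return sum(excess_visitors(total, share) for total, share in regions)


# Hypothetical inputs: a U.S. female share of 58 percent and
# 1,000 million worldwide visitors, of which 150 million are in the U.S.
us_share = 0.58
worldwide_total = 1000.0

# Assumed shortcut: apply the U.S. share to worldwide traffic.
shortcut = excess_visitors(worldwide_total, us_share)

# Alternative assumption: the rest of the world is split 50/50.
blended = excess_by_region([(150.0, us_share), (850.0, 0.50)])

print(f"shortcut: {shortcut:.0f} million, blended: {blended:.0f} million")
```

Under these invented numbers, the two assumptions give answers that differ several times over, which is why the mixing of geographies matters.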

Just to give you a sense of scale, according to this chart, Facebook has an excess of 155 million female visitors per month. According to Comscore, the key provider of such data, Facebook had about 145 million total U.S. visitors in June 2013. It's not a small deal to mix up the geographies.

This example illustrates what I call "use at your own peril". It's like the Surgeon General's warning posted in restaurants in the U.S.: we warn you that drinking alcohol while pregnant could lead to birth defects, but you are free to do whatever you want with this information.

***

As of this writing, the original chart has thousands of Facebook likes, hundreds of shares on LinkedIn and Pinterest, etc.

It appears that a lot of people are enjoying the chart more than Joe and I do.

***

Finally, here is a sketch of how I would plot this type of data. (U.S. traffic data from Comscore, various months of 2012, where I can find them. Comscore is a fee-based service so it is not easy to find data for the smaller sites unless you have a subscription.)

Excellent post. What jars me is to see an icon of a male or female represent percents rather than a number of males or females.

Thank you. I've disliked this chart for a while, and you've found me a few new reasons to do so. Your redo is much better, but I have a question about the horizontal scale. It looks like Pinterest is more heavily female than Digg is male, but you wouldn't know that without doing some mental arithmetic. Do you see this as an issue?

Yeah, I would be inclined to make the horizontal axis something like the percentage of males, that way it’s balanced on both sides.

Otherwise, I like the final layout.

On the horizontal axis on your redo, one might argue for a log scale in this case. I like the semantics - it means that the distance between 1 and 2 is the same as 1/2 and 1, which is "correct" in the sense that 1/2 is as different from 1 as 1 is from 2 if the numbers are ratios.

However, log axes are usually avoided because they're nonintuitive to most readers. But there's some evidence people aren't so bad at exponentials after all:
http://web.mit.edu/newsoffice/2012/thinking-logarithmically-1005.html

Greg's idea of using % men is pretty good too - if I were faced with choosing between % male and log of male/female ratio (and not concerned about scaring readers with a "log scale") then I think I'd base my decision on whether I thought it was more important how many women there were for every man on the site (ratio) vs. how close one was to exclusively male or female visitors (percent).

The difference between these is clear if you consider two comparisons, one between two sites, one of which is 50/50 and the other is 49/51, and the other comparison between two sites, one of which is 98/2 and the other 99/1. If you used % male, there would be little difference apparent in either case, but if you use log of ratio then the difference would be much greater in the second case.
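A quick sketch of those two comparisons, in case the arithmetic helps. The choice of base 2 for the logarithm and the function names are mine; any base gives the same picture up to a constant factor.

```python
import math


def log2_ratio(pct_male: float) -> float:
    """Log (base 2) of the male/female ratio for a given percent male."""
    return math.log2(pct_male / (100 - pct_male))


def gaps(pct_a: float, pct_b: float) -> tuple[float, float]:
    """Return (gap in percentage points, gap in log2 ratio) between two sites."""
    return pct_b - pct_a, log2_ratio(pct_b) - log2_ratio(pct_a)


# The two comparisons from the text: 50/50 vs 51/49, and 98/2 vs 99/1.
for a, b in [(50, 51), (98, 99)]:
    pct_gap, log_gap = gaps(a, b)
    print(f"{a}% vs {b}%: {pct_gap} point(s), log2-ratio gap {log_gap:.3f}")

# Symmetry of the log scale: a 2:1 ratio sits as far from 1:1 as 1:2 does.
assert math.log2(2) - math.log2(1) == math.log2(1) - math.log2(0.5)
```

On the percent scale both pairs are one point apart. On the log-ratio scale the first gap is a small fraction of a unit while the second is about one full unit, since going from 98/2 to 99/1 roughly doubles the ratio.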

Final thought - log of ratio beats either ratio or %male in that it doesn't emphasize either gender.

The fine print on the left is as good an indicator that something is wrong as the fine print on the right :)

Back when I first saw this I actually created a graph along the same idea as the original (blue for male / pink for female), but instead of normalizing by percent I made each bar encode the total number of users. Reddit was all abuzz about how many males there were on the site, but when you see the data all together you can barely even see the Reddit bar compared to the bigger sites. If I remember correctly, Facebook was bigger than the next 3-5 *combined*.

I never ended up getting it 100% completed, but it was a whole lot more "beautiful" (and certainly a whole lot more "information") than the original.
