Game over, Tetris

This is the image in my head at the moment (link here):


That's when @yawnathan pointed me to the following infographics via Twitter:


Here are some issues with the choice of graphics:

  • Not sure how Tetris-shaped pieces are better than a standard stacked bar chart, or a line chart
  • Adding a one-liner for each analysis summarizing the key insight is essential, and much more engaging than dry titles like "by gender"
  • Ordering each section of the poster in a sensible way would help bring out the message; maintaining the same order in all four sections has little benefit but adds to the confusion
  • Many of the corporate logos are not well known enough to be recognized, and they do not resemble their company names enough to elicit free association

But the chart also fails to ask the right question. In thinking about "who uses which sites?", it would be much more informative to cut the data in a different way: tell us, among males, what proportion uses Digg vs. StumbleUpon vs. Facebook, etc. The problem with the current graphic is that it offers no information about scale. For example, Ning may have 1/1000th of Facebook's total traffic (I made this up) but you wouldn't know, since everything is expressed as a proportion of each site's own user base.
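To make the scale point concrete, here is a minimal Python sketch of the two ways of cutting the same table. Every count in it is invented for illustration (like the Ning figure above, it is not from Google Ad Planner), and the function names are hypothetical. It also assumes, for simplicity, that the counts can be added across sites, which ignores people who use more than one site.

```python
# Hypothetical user counts (in millions) by site and gender.
# These numbers are made up purely to illustrate the two views.
users = {
    "Facebook": {"male": 45.0, "female": 55.0},
    "Digg": {"male": 6.0, "female": 4.0},
    "Ning": {"male": 0.04, "female": 0.06},
}


def within_site_share(counts):
    """Split each site's own user base by gender (the view the poster uses)."""
    return {
        site: {g: n / sum(by_gender.values()) for g, n in by_gender.items()}
        for site, by_gender in counts.items()
    }


def within_group_share(counts):
    """For each gender, the share of that gender's site usage going to each site."""
    totals = {}
    for by_gender in counts.values():
        for g, n in by_gender.items():
            totals[g] = totals.get(g, 0.0) + n
    return {
        g: {site: by_gender[g] / totals[g] for site, by_gender in counts.items()}
        for g in totals
    }


if __name__ == "__main__":
    by_site = within_site_share(users)
    by_group = within_group_share(users)
    for site in users:
        print(
            f"{site:9s} female share of site: {by_site[site]['female']:.1%}   "
            f"share of all male usage: {by_group['male'][site]:.2%}"
        )
```

In the first view, the made-up Ning and Facebook rows look like peers, each a full bar split by gender. In the second view the difference in scale is visible at once, because each site is expressed as a share of the same total.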

Besides, what is the objective behind asking the question, who uses which sites? Are readers asked to draw conclusions about the relative viability of the business models of these companies? Is there some significance associated with an elderly skew or female skew?

Finally, the chart hits the trifecta! It also fails from the data collection perspective. While it discloses the source of the data as "Google Ad Planner", it is impossible for readers to make sense of the data. How reliable is this data? Did the income levels come from surveys of users (self-reported and probably biased)? Or from users associated with a specific advertising campaign? Did they come from matching users' IP addresses to Census data? If so, how much actual household-level data was used? Or perhaps a statistical model was built to predict income levels? Of which period is the data representative? Does that period generalize to other periods? Were there any (or many) missing values? Were these values imputed or set to the average? If a sample was used, how do we know that it is unbiased?


In this form, the infographics poster is nothing more than a done-up data dump.



Rick Wicklin

The lack of standardization seems the biggest problem. I would have preferred to see the data as a "percentage of all users of social media," which would have enabled comparison across different sites (as you point out).

Like you, I wonder why the various sites are displayed horizontally in 3 graphics, but vertically in the Site vs. Age graphic? I critiqued a similar chart of social media usage by age last year:

Jörgen Abrahamsson

There should have been ordering to show the patterns in the data. That is the main problem.

The Tetris look is a bad solution, but there is a case for visualizing the actual data (60%, 40%) instead of just the proportionality of those numbers (6:4, or rather 3:2). That is what a standard bar chart (stacked or not) does. When practical, I think it is better to show the actual numbers.

I'm not sure you can say that relative numbers are the wrong choice. Relative vs. absolute numbers is always an issue. I find the relative question just as interesting.
