
Budding graphics connoisseurs from Down Under

A reader, Stephen M., who's a high school Information Technology teacher in Australia, assigned the following chart to his class as a Junk Charts-style assignment. (link to original here)

Behance_donation

We have seen racetrack charts before (e.g. here or here), and we have dual racetracks here.

Stephen's class identified the following problems with the chart:

- The group agreed this should be better called a data visualisation than an infographic

- The purpose of the 'infographic' seems to be more on the design/form, than the function of conveying an understanding of the data

- There seems to be a bit of an optical illusion, with the upper circle for the US appearing larger than the lower one (we checked, it isn't)

- There are no clear labels to assist. We had to assume that, because population sits above donations in the heading and the figures, the lines follow the same order. The class agreed that country labels to the left of each line start would help.

- No scale on the lines, and where do you measure from/to (especially as the US line is a single line for a proportion of the way)?

- It's too abstract and the spatial separation of the curves makes comparison difficult.

***

Wow, that's great critique from the 16-year-olds. They are working on ways to re-make this graphic. One good idea is to collapse the two dimensions into one: per-capita donations.
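The per-capita idea can be sketched in a few lines of Python. The country names and figures below are placeholders, not the values from the original chart; the point is only that dividing donations by population yields one comparable number per country, which can then be sorted and drawn as a plain bar chart.

```python
def per_capita(donations, population):
    """Return (country, donations per person) pairs, largest rate first."""
    rates = {country: donations[country] / population[country] for country in donations}
    return sorted(rates.items(), key=lambda pair: pair[1], reverse=True)


# Placeholder figures for illustration only (NOT taken from the chart).
donations = {"Country A": 500.0, "Country B": 120.0, "Country C": 90.0}
population = {"Country A": 250.0, "Country B": 30.0, "Country C": 60.0}

ranked = per_capita(donations, population)
for country, rate in ranked:
    print(f"{country}: {rate:.2f} per person")
```

Note how the country with the largest total (Country A) is not the one with the largest per-capita rate, which is exactly the comparison the dual racetracks make hard to see.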

Another issue with this chart is that the countries are sorted in different ways from one chart to the next. It's really difficult to compare one country to another.

It is also instructive to discuss what the key message is in this data. Why those six countries? What kinds of donations are being counted? Does the counting methodology differ by country? How comparable is the data?

Finally, is this art or is this science?

P.S. [12/2/2012] Stephen noted that another deficiency identified by the students is the lack of sourcing. Indeed, where did the data come from? They think it's the CIA Factbook.


The electoral map sans the map

Xan G. has a must-read post comparing different ways of showing the electoral map. See here.

The key learning is something I often point out on this blog: geographical data can have a greater impact when it is unshackled from the map.

Xan pointed to a series of ideas that are improvements upon the map.

Here's an attempt to portray the election night as a horse race. This borrows an idea from the sports world where a baseball game can be portrayed with such a chart.

Xg_election4

I love this sort of presentation. Similar to a baseball game, someone can look at this chart after the fact and experience the ups and downs of an Obama/Romney supporter without actually being there.

Then Xan spoils some of the fun by transforming the above into the following chart, which portrays Obama's win as a rout. All the suspense is gone!

Xg_election5

As Xan explains it, he took Nathan Silver's predictions of "sure wins" and plotted those first. Thus, Obama started the night at almost 200 while Romney started with about 170.

While indeed the fun is gone, this is a more accurate view of this just-concluded election. I was a spoilsport myself that night as I kept telling my friends that the only reason why Romney seemed close at the start was that the Red States generally have smaller populations, and thus took less time to count their votes. In addition, the Red States also tend to favor Republican candidates by very large margins so that the winner could be called early without counting most of the votes.

***

I have other thoughts on the state of reporting on polls, which I'll cover in a later post.


How to fail three tests in one chart

The November issue of Bloomberg Markets published the following pair of pyramid charts:

Bb_pyramids

This chart fails a number of tests:

Tufte's data-ink ratio test

There are a total of six data points in the entire graphic. A mathematician would say only four data points, since the "no opinion" category is just the remainder. The designer lavishes this tiny data set with a variety of effects: colors, triangles, fonts of different tints, fonts of different sizes, solid and striped backgrounds, and legends, making something that is simple much more complex than necessary. The extra stuff impedes rather than improves understanding. In fact, there were so many parts that the designer even forgot to add little squares on the right panel beside the category labels.

Junk Charts's Self-sufficiency test

The data are encoded in the heights of the pyramids, not the areas. The shapes of the areas are inconsistent, which also makes it impossible to decipher. The way it is set up, one must compare the green, striped triangle with two trapezoids. This is when a designer realizes that he/she must print the data labels onto the chart as well. That's when self-sufficiency is violated. Cover up the data labels, and the graphical elements themselves no longer convey the data to the readers. More posts about self-sufficiency here.

Junk Charts's Trifecta checkup

The juxtaposition of two candidates' positions on two entirely different issues does not yield much insight. One is an economic issue, one is military in nature. Is this a commentary on the general credibility of the candidates? Or their credibility on specific issues? Or the investors' attitude toward the issues? Once the pertinent question is clarified, the journalist needs to find the right data to address the question. More posts about the Trifecta checkup here.

Minimum Reporting Requirements for polls

Any pollster who doesn't report the sample size and/or the margin of error is not to be taken seriously. In addition, we should want to know how the sample was selected. What is meant by "global investors"? Did the journalist randomly sample some investors? Did investors happen to fill out a survey that is served up somehow?
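To see why the sample size matters, here is a rough sketch of the textbook margin of error for a polled proportion. It assumes a simple random sample, which is precisely what we cannot verify for this poll, and the sample sizes in the loop are hypothetical since the magazine did not report one.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p.

    Assumes a simple random sample of size n; p=0.5 gives the widest margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes; the poll's actual sample size is not given.
for n in (100, 400, 1000):
    print(f"n={n}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

Quadrupling the sample only halves the margin, so a poll of a hundred or so self-selected respondents cannot support fine distinctions between candidates.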

***

The following bar charts, while not innovative, speak louder.

Redo_pyramid1
Redo_pyramid2


Guest blog: Popcorn infographics

Note: This post is by Aleksey Nozdryn-Plotnicki, who blogs at ThinkDataVis.

***

On my way to Crete recently, I was flipping through the in-flight magazine when I stumbled upon this treat. This full-page piece was about Claire Cock-Starkey’s upcoming (at the time) book, Seeing the Bigger Picture.

Thinkdatavis1

The book sells itself as “Global Infographics” and the article says it is “swapping dry words for colourful illustrated visuals”. The baby and the iPhone are pure decoration, but there are also some information graphics here at the top and the bottom which bear a closer look.

Thinkdatavis2

Above we have what at first looks innovative, but is actually a disguised bar chart. That’s fine, but:

  • Bars have been arched, challenging our ability to compare them
  • Outer bars actually have further to go as the radius and therefore circumference increases. So while Japan has the lowest percentage, its bar appears to be as long as that of Norway, the largest. In fact, since the values are sorted, for the most part all bars are the same length and size.
  • The legend is far larger than the chart itself, and is what really delivers the information at all. Using that space for a larger chart and labelling the bars directly (like in a usual bar chart) might be better.
  • There is no axis with any ticks or labels
  • The chart has too many categorical colours, so knowing what any colour represents requires looking it up in the legend where the raw data is anyway.
  • Why this circular shape? I suspect it was a clock-face for time, but the decoration, presumably informing our sense of “leisure activity” has removed the clock hands, so the metaphor is weak.
  • Why does the Norway bar go only 90 degrees around? This seems equivalent to not properly scaling the Y-axis on a bar chart and leaving copious empty space above. Maybe this is meant to indicate that even the most leisurely Norwegians only have time for gardening, being a kite, and drinking at a table.
  • Consolation points, however, for taking the time to clearly state what leisure time was defined as in this data.
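The point about outer bars having further to go follows from the arc-length formula (length = radius × angle in radians). Below is a minimal sketch, assuming the chart maps each value to an angle and that the largest value spans 90 degrees as noted above; the radii and values are made up for illustration.

```python
import math


def arc_length(radius, value, max_value, max_angle_deg=90.0):
    """Length of an arc whose angle is proportional to value."""
    angle = math.radians(max_angle_deg * value / max_value)
    return radius * angle


# Hypothetical bars: the outer one encodes half the value at twice the radius.
inner = arc_length(radius=1.0, value=30, max_value=30)
outer = arc_length(radius=2.0, value=15, max_value=30)
print(f"inner: {inner:.3f}, outer: {outer:.3f}")
```

Both arcs come out the same length even though one value is half the other, which is how a sorted set of arched bars can end up looking nearly identical.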

Thinkdatavis3b

At first this looks more like a traditional bar chart, until you realise that:

  • Larger data is at the top and smaller at the bottom, so the data is tied to the blue lines on the left, rather than the visually-weighty bars on the right. Or maybe the height of the pyramid is meant to be tied to age at marriage?
  • Bars are artificially grouped and forced to be the same length, i.e. Sweden 34.3 and Germany 33.7. This leads to a “lie factor”.
  • In any event the data is so loosely encoded that it can hardly be considered encoded at all. The lines and the data are both sorted.
  • It has a non-zero baseline at roughly 20 or so, a “sin” in bar charts, though you could argue for a non-zero baseline of around 18 for marriage since you would never expect to see values below that
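For readers unfamiliar with the term, Tufte's lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data. Here is a small sketch applied to the Sweden/Germany pair; the drawn bar lengths are in arbitrary units, assumed equal only because the two bars are forced to the same length.

```python
def effect_size(first, second):
    """Relative change from first to second."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Tufte's lie factor: effect shown in the graphic / effect in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)


# Germany 33.7 vs Sweden 34.3, drawn as bars of equal (arbitrary) length.
print(lie_factor(33.7, 34.3, drawn_first=1.0, drawn_second=1.0))
```

A faithful chart scores 1. Equal-length bars score 0 here, meaning the graphic hides the difference in the data entirely.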

Ultimately, what I think we have here belongs in a genre of its own, perhaps “popcorn infographics”. At the time of writing the one review on amazon.co.uk reads “Bought this for my 14 yr old - absolutely loves it and showed friends who were also suitably impressed. Thank you” which says a lot, and not all negative. Perhaps there is room for popcorn infographics in this world or perhaps it’s just junk.

***

Aleksey Nozdryn-Plotnicki is an analyst/consultant and data visualisation blogger at ThinkDataVis.com. He is @alekseynp on Twitter.

 


Purists stay clear; many rules to be broken

This chart, from Internet Retailer (March 2012, p. 26), is okay; at least they didn't use pie charts. But it could have been much more effective.

Onlneshp_sm

To make it better, we have to break all the rules:

  • Use lines instead of columns. The following reproduces the right side of the chart above which deals with shopping behavior by income groups. One of the issues with this chart is that the gray partition between the gender section and the income section is too incognito.

Redo_onlineshop1

  • Use rounded percentages instead of two decimal places. If we need to see two decimal places in order to tell two categories apart, then the difference is surely insignificant.
  • Round up group boundaries. Notice that I committed the sin of imprecision by not specifying whether $60,000 belongs to $30-60K or $60-100K. Is that level of precision necessary? Is it worth the ugliness of printing a number like $59,999? If there is information in whether $60,000 belongs to $30-60K or $60-100K, what does that say about your analysis?
  • Dare to throw away information. Look at this final version of the above charts:

Redo_onlineshop2

I have collapsed the five income groups into two. That's because the bottom three income groups are more or less the same in terms of online/offline shopping behavior, and the top two groups are more or less the same. Bear in mind that any analysis of this type has a margin of error so differences of a few percentage points are not worth representing. It is possible that even the 10 percent or so differences between the two remaining income groups are not meaningful--we won't know unless we know the margin of error of their methodology.

(Note that you have to consult the Census to get the relative proportions of each income group in order to average the data shown in the original chart.)
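Collapsing groups this way is a weighted average, with each group's share of the population as its weight. A rough sketch follows; the percentages and weights are placeholders, not the figures from the chart or from the Census.

```python
def weighted_average(values, weights):
    """Average of values, each weighted by its group's relative size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Placeholder numbers for three income groups being merged into one.
online_share = [40.0, 42.0, 44.0]  # percent shopping online, per group (hypothetical)
group_size = [20.0, 30.0, 50.0]    # relative size of each group (hypothetical)

combined = weighted_average(online_share, group_size)
print(f"combined group: {combined:.1f}%")
```

A plain average of the three groups would give 42.0 percent; the weighted figure leans toward the largest group, which is why the relative proportions are needed.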

In any case, the key message of the article becomes a lot clearer than in the original chart.

  • Use black and white instead of color. Having simplified the presentation--but retaining all the important information--we no longer need colors.

Start breaking some rules today, and you'll make better charts.