
The cross-hairs of religions

Long-time reader Nick B. found this attractive flow chart.


The chart was produced by the Internet Monk blog. The data was culled from this report (PDF) by the Pew Forum.

The cross-hairs trumpet excitement, but the reader is left with little. One can tell that the unaffiliated proportion (red) has more than doubled, mostly at the expense of Catholics (green); that most religions retain the vast majority of their faithful (at least by internal proportions); and that people from each faith move to other faiths.

Yet none of these high-level insights requires a chart containing data on the movement between every pair of religions.

One smart thing about this chart is the inclusion of "unaffiliated / no religion", which completes the picture; otherwise, some previously faithful people would drop off the chart (literally).

The other smart thing is its self-sufficiency: none of the data is printed on the chart, and I doubt readers miss it.


Here, I attempt an alternative, which is a variant of the Web of Debt chart discussed here.


Note the economy of colors, lines, etc. I have chosen to use the number of people with a particular childhood faith as the base for all the percentages; other bases can be selected. For example, the unaffiliated group has grown by 144% of its childhood base, with about half of that growth coming from former Protestants; meanwhile, there has been an exodus of Catholics. (PS. Because the data for other faiths are incomplete in the aforementioned report, I made up some of the data to finish the chart.)
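To make the choice of base concrete, here is a minimal Python sketch, assuming the switching data are stored as counts in a nested dictionary `flows[childhood_faith][current_faith]`. The counts and the helper names (`childhood_base`, `current_total`, `net_change_pct`, `inflow_pct`) are hypothetical, made up for illustration; they are not the Pew figures, and this is not necessarily how the chart above was built.

```python
# flows[childhood_faith][current_faith] = number of people.
# HYPOTHETICAL counts for illustration only; not the Pew Forum data.
flows = {
    "Protestant":   {"Protestant": 80, "Catholic": 2, "Unaffiliated": 8},
    "Catholic":     {"Protestant": 6, "Catholic": 30, "Unaffiliated": 5},
    "Unaffiliated": {"Protestant": 3, "Catholic": 1, "Unaffiliated": 6},
}


def childhood_base(faith):
    """People raised in `faith` (row total): the base for every percentage."""
    return sum(flows[faith].values())


def current_total(faith):
    """People who belong to `faith` today (column total)."""
    return sum(row.get(faith, 0) for row in flows.values())


def net_change_pct(faith):
    """Net growth of `faith` as a percentage of its childhood base."""
    base = childhood_base(faith)
    return 100 * (current_total(faith) - base) / base


def inflow_pct(source, target):
    """People moving from `source` into `target`, as a % of target's childhood base."""
    return 100 * flows[source].get(target, 0) / childhood_base(target)


print(net_change_pct("Unaffiliated"))            # 90.0
print(inflow_pct("Protestant", "Unaffiliated"))  # 80.0
```

With these made-up counts, the unaffiliated group grows by 90% of its childhood base. The inflows from Protestants (80%) and Catholics (50%) add up to more than that because 40% of the base left for a religion. Choosing a different base would change every percentage, though not the underlying flows.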

Making the line thickness proportional to the percentages would eliminate the need to print all those numbers on the chart.
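A minimal sketch of that mapping, assuming each flow is keyed by a (from, to) pair and the widest line is drawn at a hypothetical 8 points. `line_widths` is an illustrative helper, not part of any plotting library, and the percentages are made up.

```python
def line_widths(percentages, max_width=8.0):
    """Map each flow's percentage to a stroke width proportional to it.

    The largest percentage gets `max_width` (a hypothetical choice, in points);
    all other widths are scaled by the same factor, so width ratios equal
    percentage ratios.
    """
    largest = max(percentages.values())
    if largest <= 0:
        raise ValueError("need at least one positive percentage")
    return {flow: max_width * pct / largest for flow, pct in percentages.items()}


# HYPOTHETICAL percentages keyed by (from_faith, to_faith).
widths = line_widths({
    ("Protestant", "Unaffiliated"): 80.0,
    ("Catholic", "Unaffiliated"): 50.0,
    ("Unaffiliated", "Protestant"): 30.0,
})
print(widths)
```

Scaling linearly keeps the comparison honest: a flow twice as large is drawn twice as thick.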

Weekend reading

Lots of ideas from readers have been gathering dust in my mailbox. Here are a bunch of links, with a few comments of mine.


This first link I'm not sure what to make of. I think the architects and graphic designers amongst you may be able to make sense of it. Not me. It came with this description: "dr. dr. crash and dr. trash of m-a-u-s-e-r analyzed worlds most junk magazines and visualized their data." For the intrepid (and I claim no liability):

    "Jetistics: The Analysis of Junk. The Junk of Analysis?"


This one is yet another instance of the trend of infographics infiltrating PR releases.

It is also another example of a map adding little or no value to the data. The presence of geographic data is not an excuse to give a lesson on maps.

It would be one thing if the geographic location helped readers understand the data, but in most such charts, the map merely says "Reader, I presume you are map illiterate, so let me tell you that South Africa is at the southern tip of the African continent..."

Also notice that the bar charts are sorted by the average size of invoices, which is definitely less meaningful than the total amount invoiced. This, I suspect, reflects a failure to ask the pertinent question, which sits at the top of the Trifecta checkup.
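A toy example of why the sort key matters, with hypothetical countries and invoice figures (not the FreshBooks data). The total invoiced is the average invoice multiplied by the number of invoices, so a country with a few large invoices can top the ranking by average while coming last by total.

```python
# HYPOTHETICAL figures, not from the FreshBooks release:
# country -> (average invoice amount, number of invoices)
invoices = {
    "Country A": (900.0, 10),
    "Country B": (400.0, 100),
    "Country C": (600.0, 50),
}


def total_invoiced(country):
    """Total amount invoiced = average invoice x number of invoices."""
    average, count = invoices[country]
    return average * count


by_average = sorted(invoices, key=lambda c: invoices[c][0], reverse=True)
by_total = sorted(invoices, key=total_invoiced, reverse=True)

print(by_average)  # ['Country A', 'Country C', 'Country B']
print(by_total)    # ['Country B', 'Country C', 'Country A']
```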



"19 Must-See Biotech Infographics", according to Kelly Davis of the BioBlogging Project.

#2 on this list is a chart (with rather old data) on GM food, an issue of concern to me. In terms of the Trifecta checkup, it addresses an important question and displays very relevant data, but it uses a poor chart: too many colors, colors that carry no meaning, and hard-to-read labels.

Of the other links, these are more interesting: #10, #12, #17, #19, #8, #9.




Yellow, green and polka dots

Reader Joran recalled our feature of Tour de France bumps charts, made at the time by Kraig, and decided to make his own for this year's tour. (He typically blogs about Nordic skiing.)

Here are some highlights:

You'd notice a similar pattern in 2010 as in 2007. The yellow jersey pretty much stays at the front of the pack throughout; the green jersey (sprints) eventually fades away, while the polka-dot jersey (mountains) improves as the tour continues.

From the design perspective, one decision concerns whether the colored lines should track the jersey or the current owner of the jersey. Over the course of the tour, jerseys change owners, possibly multiple times. What to do?




Notice that the top of the chart slopes downward; this is due to riders withdrawing during the course of the race.

In the second chart, Joran brings this out by tracking each withdrawn rider up to the stage at which they dropped out, so we can see their ranks at the point they faltered.

This shows good use of foreground and background to bring out aspects of the data. In the original post, mousing over a red dot displays a label with the rider's name.



In this next chart, a small multiples format is adopted: the riders from each team are plotted together, with each team in a separate plot. This makes it easy to see relative performance. Joran tried using a single plot with many colors and, not surprisingly, discovered that the resulting chart was unreadable. The small multiples format solves this problem.

As someone not too familiar with the race, I find the high variance of ranks within each team unexpected, and I can't explain it. In particular, even when a team (Saxobank) has a highly ranked cyclist, the other members of the team are much lower ranked. I thought team members tried to cluster together and protect the team leader. Well, you may be able to make more sense of this than I can.

I think these charts are ordered alphabetically by team name; I'd order them by the rank of each team's leading cyclist.
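A sketch of that panel order, assuming each team's final ranks are available as a list; the team names and ranks are hypothetical. The best-placed rider has the smallest rank number, so sorting teams by their minimum rank puts the team with the leading cyclist first.

```python
# HYPOTHETICAL final ranks of each team's riders.
final_ranks = {
    "Team A": [45, 12, 88],
    "Team B": [3, 60, 110],
    "Team C": [20, 25, 30],
}


def panel_order(ranks_by_team):
    """Order teams by their best (lowest) rank; break ties by team name."""
    return sorted(ranks_by_team, key=lambda team: (min(ranks_by_team[team]), team))


print(panel_order(final_ranks))  # ['Team B', 'Team A', 'Team C']
```

The tie-break on team name is an arbitrary choice that simply makes the order deterministic.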


Another improvement is to label the stages as Mountain vs. Sprint. This can be done by coloring the column for each stage, much like economic charts that shade periods of recession. This helps explain what we are seeing: why some riders gain (or lose) many places over certain stages.
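Shading the stages requires knowing where each run of mountain (or sprint) stages begins and ends. Here is a small Python sketch that finds those runs, assuming a hypothetical list of stage types in stage order; the classification shown is made up, and `stage_spans` is an illustrative helper.

```python
# HYPOTHETICAL stage classification, one entry per stage in order.
stage_types = ["sprint", "sprint", "mountain", "mountain", "sprint", "mountain"]


def stage_spans(types, kind):
    """Return runs of consecutive stages of `kind` as (first, last) stage numbers, 1-based."""
    spans = []
    start = None
    for stage, t in enumerate(types, start=1):
        if t == kind and start is None:
            start = stage                      # a run begins here
        elif t != kind and start is not None:
            spans.append((start, stage - 1))   # the run ended at the previous stage
            start = None
    if start is not None:                      # a run extends to the final stage
        spans.append((start, len(types)))
    return spans


print(stage_spans(stage_types, "mountain"))  # [(3, 4), (6, 6)]
```

With matplotlib, assuming stages are plotted at integer x positions, each (first, last) pair could be drawn as `ax.axvspan(first - 0.5, last + 0.5, alpha=0.2)` so the band covers the full width of those stage columns.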


What is clear is that domain knowledge is an important asset in making good charts. Research is key. Joran realized this too, and his commentary on the issues of interpreting the data, recognizing typos, etc. is worth reading.

Cleanup job for dirty oil

Note: Posting will be slow in the next few weeks due to holidays.

I'm going on vacation to Rome soon. If you live in the city, please send me an email. Perhaps we can meet.


Reader Daniel L. submitted this chart a while back, and it's an instructive one.

This Chicago Tribune chart accompanies an analysis by Greentopia of the "green-ness" of 10 oil companies. Greentopia concocted a ranking based on a composite of 6 factors (e.g. emissions, efficiency).

The bar charts have a couple of unusual, but unconvincing, features:

1. Typically, in a bar chart, longer is better, but when bars are used to depict ranks, shorter is better. In charting, it is usually safer to meet readers' expectations than to risk being misinterpreted.

2. For rank n, the length of the bar is n + 2. The chart designer decided that the piece containing the actual rank should be three times the size of each of the other highlighted pieces, so n - 1 single pieces plus one triple piece add up to n + 2. There are a total of 12 strips in each bar.

Why is it a bad thing to add 2 units to every bar? Consider rank 1 vs. rank 3. If the bar length equals the rank n, the ratio of bar lengths is 3 to 1. If the length is n + 2, the ratio is 5 to 3, which is not the same! Thus, once the extra units are added to each bar, the comparative lengths mean something different.
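The arithmetic can be checked with exact fractions. In this sketch, `offset` is a hypothetical parameter for the extra units added to every bar: 0 for a plain rank bar, 2 for the Tribune chart.

```python
from fractions import Fraction


def bar_length(rank, offset):
    """Length of the bar for `rank`; the Tribune chart corresponds to offset=2."""
    return rank + offset


def length_ratio(rank_a, rank_b, offset):
    """Ratio of the two bar lengths, as an exact fraction."""
    return Fraction(bar_length(rank_a, offset), bar_length(rank_b, offset))


print(length_ratio(3, 1, offset=0))  # 3
print(length_ratio(3, 1, offset=2))  # 5/3
```

Any positive offset k shrinks the apparent gap between ranks: the ratio (3 + k)/(1 + k) falls toward 1 as k grows.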


One thing is left unexplained: how is the overall ranking derived from the factor rankings? Hess, with three #1 ranks, would seem to be in contention for the overall #1.


Daniel thinks a bumps chart would work nicely, and it certainly would for any kind of ranking data. A slight variation of the Tribune chart would also work nicely: think of the bar as consisting of ten little lights, and for each rank, the appropriate light is switched on. (This is in essence a dot plot.)

In both these variations, the charts are self-sufficient -- there is no need to print all the ranks on the chart as shown above.
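The "ten little lights" idea is easy to mock up in text. A minimal sketch, assuming ranks run from 1 to 10; the factor names and ranks below are hypothetical, and `light_row` is an illustrative helper.

```python
def light_row(rank, n=10, on="●", off="○"):
    """Render a rank as a row of n 'lights' with only the light at `rank` switched on."""
    if not 1 <= rank <= n:
        raise ValueError(f"rank must be between 1 and {n}")
    return "".join(on if position == rank else off for position in range(1, n + 1))


# HYPOTHETICAL factor ranks for one company.
for factor, rank in {"Emissions": 1, "Efficiency": 4}.items():
    print(f"{factor:<10} {light_row(rank)}")
```

Because every row has the same length and only the position of the lit dot varies, there is no bar length to misread.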


Daniel also commented that it is difficult to incorporate the stock price data into a bumps chart. Why bother? What is the point of including the stock price data anyway? If it is included, readers have to be given tools to interpret such data; in particular, some explanation ought to be provided for large jumps or slips. In addition, the scales should be tailored to allow comparisons of relative value, rather than sticking to equal scales for each stock. (Dona Wong covers this topic well in her book.)
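One common way to compare stocks trading at very different price levels is to index each series to 100 at its first observation, so the axis reads as relative change; a log scale is another option. This is a minimal sketch of the indexing approach only, with made-up prices and a hypothetical helper `index_to_base`; it is not necessarily the method described in Dona Wong's book.

```python
def index_to_base(prices, base=100.0):
    """Rescale a price series so that its first value equals `base`."""
    first = prices[0]
    if first == 0:
        raise ValueError("first price must be non-zero")
    return [base * p / first for p in prices]


# HYPOTHETICAL prices for two stocks trading at very different levels.
stock_x = [50.0, 55.0, 45.0]
stock_y = [200.0, 220.0, 180.0]
print(index_to_base(stock_x))  # [100.0, 110.0, 90.0]
print(index_to_base(stock_y))  # [100.0, 110.0, 90.0]
```

On equal dollar scales, stock_y's swings would look four times larger; once indexed, the two identical 10% moves look identical.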