Cousin misfit
Feb 25, 2010
Stef, who had a hand in the inkblot charts that many loved, sent in the following chart, with the note that he hasn't seen these line/area charts before.
This chart is interesting indeed. The objective of the chart is to compare the state of drinking water in different regions of the developing world. It tries to emphasize the amount of improvement attained between 1990 and 2006.
I can't quite figure out how the regions are ordered. It's not by any of the proportions depicted in the chart, nor by position on the map.
Next, with the areas catching much attention, I wanted to figure out what the areas mean.
To help in this exercise, I computed the key piece of information, i.e. the increase or decrease in the proportion of each water source, and placed it in each piece of area, as shown below.
Based on this evidence, one has to conclude that the area has nothing to do with the change in proportions over time. The brown areas (unimproved sources) are negative changes while the blue and light blue areas are positive changes. Negative area is not a visually depictable concept, unfortunately.
Also, note the dark blue areas of Latin America vs Western Asia. The Western Asia one is a bit larger than the Latin America one, but the change in proportions is exactly the reverse, 7% against 23%.
Is this a new type of chart? It took me a few days to figure it out.
How is the following chart related to the above chart?
The original chart is a cousin misfit of the above chart, as we can see below.
The key piece of data is embedded in the slopes of the connecting lines, and this cousin of the column chart with connecting lines draws our attention away from those lines and to the areas. The colored areas are in no way proportional to the slopes of the connecting lines and so the information has been distorted.
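To make the mismatch concrete, here is a minimal Python sketch, assuming each colored band is a trapezoid spanning the 1990 and 2006 columns. The regions "A" and "B" and their values are hypothetical, chosen only to mimic a 7-point versus 23-point change; they are not read off the chart. The function names are likewise illustrative.

```python
# Hypothetical illustration: regions "A" and "B" and their values are made up,
# not taken from the original chart.

def band_area(p_start, p_end, width=1.0):
    """Area of a band whose height moves linearly from p_start to p_end.

    This is a trapezoid, so the area is the width times the average height.
    """
    return width * (p_start + p_end) / 2.0


def change(p_start, p_end):
    """Change in proportion between the two dates (the slope times the width)."""
    return p_end - p_start


regions = {
    "A": (60.0, 67.0),  # high level, small change (+7 points)
    "B": (40.0, 63.0),  # lower level, large change (+23 points)
}

for name, (p1990, p2006) in regions.items():
    print(name, "area:", band_area(p1990, p2006), "change:", change(p1990, p2006))
```

Under these assumptions, region A gets the larger colored area even though region B has the larger change: the area tracks the average level of the two endpoints, while the slope tracks their difference.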
Nice-looking chart but needs rethinking.
PS. Some commentators seem to think that I suggested that the paired column charts would be a better alternative than the original. No -- I am using charts to analyze charts. An improved chart would be like the following, in which the areas are de-emphasized in favor of the lines. (Please imagine the vertical axis.)
Hmm, not really sure if I understand this right.
"The brown areas (unimproved sources) are negative changes while the blue and light blue areas are positive changes. Negative area is not a visually depictable concept, unfortunately." In a strict mathematical sense, it is indeed a negative change (first value 65, second value 54, so -11). But in my interpretation it's a positive one: in Sub-Saharan Africa, 65% of the population was without access to safe drinking water in 1990, but "only" 54% in 2006, so the situation improved. Having "less" brown area towards 2006 makes perfect sense to me.
"Also, note the dark blue areas of Latin America vs Western Asia. The Western Asia one is a bit larger than the Latin America but the change in proportions is exactly the reverse, 7% against 23%." Change in proportion, or change in percentages? Does it make a difference? Again, I fail to see what is false here.
As I am developing a series of graphics for a presentation based on the MDGs (Millennium Development Goals), I got inspired by that graph but refrained from using the area. You can find a perhaps more easily understandable version here, which, in principle, you could consider a line chart. It does not, however, add up to 100 as in the graphs below, which is why using area didn't make sense anyway.
Posted by: Stef | Feb 25, 2010 at 01:23 AM
Stef: not surprisingly, I like the Bumps chart much better. You should go with that type.
Posted by: Kaiser | Feb 25, 2010 at 01:43 AM
The columns are in fact ordered: decreasing by % unimproved water source in 2006.
Posted by: Tessa | Feb 25, 2010 at 02:50 AM
Maybe not the best chart but I think it meets the intent: shows a) In 2006, which regions have the highest percentage access to improved drinking water (the order) and b) where the improvement came from, i.e. piped or improved drinking sources.
Clearly Sub-Saharan Africa was the worst in 1990 and remains so in 2006. You can also see that Eastern Asia went from 2nd worst to best in the 16-year time frame, predominantly from piped drinking water.
What's also interesting is that Northern Africa made significant improvement in Piped drinking water but that improvement seems to be mainly for people that already had access to improved drinking water from another source.
Posted by: Jim | Feb 25, 2010 at 03:18 AM
I redid the chart as a simple line chart. See here.
I think it better shows the interesting points Jim talks about.
Posted by: Tessa | Feb 25, 2010 at 06:53 AM
I hate to say it but I disagree: I like the original quite a bit. I think separating the bars somewhat would improve its readability slightly, but otherwise it gives a sense of the relative proportions of sources of water over time, and the slope is actually effective (for me) at indicating the rate of change. It took me 10 seconds to figure out, far less than the line chart, and it is more elegant than the two-column version.
Posted by: gary | Feb 25, 2010 at 10:19 AM
I think the original chart is excellent. It effectively shows two different and relevant dimensions of the data, and for me it was very intuitive. I would like to do such a chart with R and ggplot2 :-)
Posted by: Andreas | Feb 26, 2010 at 07:13 AM
Tessa - well spotted on the sequencing. It's not at all obvious. I think the basic chart is OK. I'm not sure about the large headline at the top, "Access to improved drinking water sources is predominantly a rural problem." Why is it there? Is it a conclusion that we are invited to derive from inspecting the chart? In which case I can't see how one could derive that conclusion from this data. Or is it a given that we are meant to just accept?
Also, does the chart only show data for "rural" areas within each region, or does the data cover the whole region, whether rural, urban or other? The article doesn't seem to include a link to the source; perhaps there is more explanation of this in the original source.
Posted by: Gerald Higgins | Feb 26, 2010 at 11:06 AM
I also like the original chart better, and I don't think it's a new type of chart. It's just a stacked area chart (even if there are only two points defined on the continuous time axis, I don't think a couple of stacked bars would be easier to understand).
Andreas, this type of chart can easily be generated with ggplot:
Posted by: carlitos | Feb 27, 2010 at 04:09 PM
So your only issue with the original chart is that the three regions defined by the two lines (and the 0% and 100% limits) are painted? I'd say the colors are useful: they make explicit that we're looking at the partition of the rural households into three groups: brown if water quality is bad and blue (bluer) when the water quality is good (better).
The meaning of the area is obvious (to me, at least): the (time-averaged) "proportion of rural households using piped water, etc". Of course the area has nothing to do with the change in proportions over time. That change is given by the time-dependent frontier between regions; the same lines that appear in the chart you propose, but in the original chart this information appears surrounded by a context that makes the meaning clearer.
Posted by: carlitos | Mar 01, 2010 at 03:59 PM
Carlitos: if the point of the chart is to illustrate the change in proportions over time, then coloring the areas draws attention to the wrong metric, which, as you pointed out, is a time-averaged proportion. A time-averaged proportion, by the way, is not a function of time, and this chart misleads us into thinking it is.
In fact, if the change over time is the issue, the entire time series should be plotted, not just the end points.
Posted by: Kaiser | Mar 01, 2010 at 07:04 PM
Carlitos: thanks for the code.
Posted by: Kaiser | Mar 01, 2010 at 07:05 PM