Hmm, not really sure if I understand this right.

"The brown areas (unimproved sources) are negative changes while the blue and light blue areas are positive changes. Negative area is not a visually depictable concept, unfortunately." In a strict mathematical sense, it is indeed a negative change (first value 65, second value 54, so 54 - 65 = -11). But in my interpretation it's a positive one: in Sub-Saharan Africa 65% of the population had no access to safe drinking water in 1990, but "only" 54% in 2006, so the situation improved. Having "less" brown area towards 2006 makes perfect sense to me.

"Also, note the dark blue areas of Latin America vs Western Asia. The Western Asia one is a bit larger than the Latin America but the change in proportions is exactly the reverse, 7% against 23%." Change in proportion, or change in percentages? Does it make a difference? Again, I fail to see any falseness here.

As I am developing a series of graphics for a presentation based on the MDGs (Millennium Development Goals), I got inspired by that graph but refrained from using the area. You can find a perhaps more easily understandable version here, which, in principle, you could consider a line chart. It does not, however, add up to 100 as the graphs below do, which is why using area didn't make sense anyway.
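
On the "change in proportion, or change in percentages" question: it can make a difference, depending on whether one means percentage points or change relative to the 1990 level. A minimal sketch in R, assuming the piped-water figures from the ggplot code later in this thread (25 to 48 for Latin America, 50 to 57 for Western Asia); the column names are made up for illustration:

```r
# Piped-water shares (% of rural households), taken from the data in this thread
piped <- data.frame(
  region = c("Latin America", "Western Asia"),
  y1990  = c(25, 50),
  y2006  = c(48, 57)
)

# Change in percentage points: the 23 vs 7 quoted from the post
piped$point_change <- piped$y2006 - piped$y1990

# Change relative to the 1990 share, in percent
piped$relative_change <- 100 * piped$point_change / piped$y1990

print(piped)
```

Either way Latin America changes more than Western Asia, so the comparison quoted from the post holds under both readings; only the size of the gap differs (23 vs 7 points, 92% vs 14% relative).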

Stef: not surprisingly, I like the Bumps chart much better. You should go with that type.

The columns are in fact ordered: decreasing by % unimproved water source in 2006.

Maybe not the best chart, but I think it meets the intent: it shows a) which regions have the highest percentage access to improved drinking water in 2006 (the order) and b) where the improvement came from, i.e. piped or other improved drinking sources.

Clearly Sub-Saharan Africa was the worst in 1990 and remains so in 2006. You can also see that Eastern Asia went from 2nd worst to best in the 16-year time frame, predominantly through piped drinking water.
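
These rankings can be checked directly. A quick sketch, assuming the unimproved shares from the ggplot code later in this thread, and assuming the left-to-right column order in the chart matches the order of `regions` there:

```r
# Share of rural households using an unimproved source (%), per region
unimproved <- data.frame(
  region = c("Sub-Saharan Africa", "Latin America", "Western Asia",
             "South-Eastern Asia", "Southern Asia", "Northern Africa",
             "Eastern Asia"),
  y1990  = c(65, 39, 30, 36, 32, 18, 45),
  y2006  = c(54, 27, 20, 19, 16, 13, 9)
)

# Rank 1 = worst (largest share without an improved source)
unimproved$rank_1990 <- rank(-unimproved$y1990)
unimproved$rank_2006 <- rank(-unimproved$y2006)

print(unimproved)
```

The 2006 ranks come out as 1 through 7 in row order, which is the "decreasing by % unimproved in 2006" ordering, and Eastern Asia moves from rank 2 to rank 7.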

What's also interesting is that Northern Africa made significant improvement in Piped drinking water but that improvement seems to be mainly for people that already had access to improved drinking water from another source.
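
A rough way to see this, again assuming the Northern Africa figures from the ggplot code later in this thread. Note these are net changes in shares, so they only suggest, and cannot prove, which households switched source:

```r
# Northern Africa, share of rural households by water source (%)
northern_africa <- data.frame(
  source = c("Piped", "Other improved", "Unimproved"),
  y1990  = c(34, 48, 18),
  y2006  = c(63, 24, 13)
)

# Change in percentage points per source; the three changes sum to zero
northern_africa$change <- northern_africa$y2006 - northern_africa$y1990

print(northern_africa)
```

Piped gains 29 points while other improved sources lose 24 and unimproved sources lose only 5.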

I redid the chart as a simple line chart. See here.
I think it shows the interesting points Jim talks about better.

I hate to say it but I disagree: I like the original quite a bit. I think separating the bars somewhat would improve its readability slightly, but otherwise it gives a sense of the relative proportions of sources of water over time, and the slope is actually effective (for me) at indicating the rate of change. It took me 10 seconds to figure out, far less than the line chart, and it is more elegant than the two-column version.

I think the original chart is excellent. It effectively shows two different and relevant dimensions of the data, and for me it was very intuitive. I would like to do such a chart with R and ggplot2 :-)

Tessa - well spotted on the sequencing. It's not at all obvious. I think the basic chart is OK. I'm not sure about the large headline at the top, "Access to improved drinking water sources is predominantly a rural problem." Why is it there? Is it a conclusion that we are invited to derive from inspecting the chart? In that case I can't see how one could derive that conclusion from this data. Or is it a given that we are meant to just accept?

Also, does the chart only show data for "rural" areas within each region, or does the data cover the whole region, whether rural, urban or other? The article doesn't seem to include a link to the source; perhaps there is more explanation of this in the original source.

I also like the original chart better, and I don't think it's a new type of chart; it's just a stacked area chart (even if only two points are defined on the continuous time axis, I don't think a couple of stacked bars would be easier to understand).

Andreas, this type of chart can easily be generated with ggplot:

```r
library(ggplot2)

regions <- c("Sub-Saharan Africa", "Latin America", "Western Asia",
             "South-Eastern Asia", "Southern Asia", "Northern Africa",
             "Eastern Asia")
years <- c(1990, 2006)
sources <- c("Piped", "Other improved", "Unimproved")

# One row per region x year x source; each triple of shares sums to 100
water <- data.frame(
  region = rep(regions, each = 6),
  year   = rep(rep(years, each = 3), 7),
  source = rep(sources, 14),
  share  = c(4,31,65, 5,41,54, 25,36,39, 48,25,27,
             50,20,30, 57,23,20, 4,60,36, 14,67,19,
             8,60,32, 10,74,16, 34,48,18, 63,24,13,
             37,18,45, 72,19,9)) # not 62,19,9 !!!

# Ordered factors fix the stacking order and the left-to-right panel order
water$source <- ordered(water$source, levels = sources)
water$region <- ordered(water$region, levels = regions)

# One stacked-area panel per region, with rotated labels and no panel chrome
ggplot(water, aes(x = year, y = share, fill = source)) +
  geom_area() +
  facet_wrap(~region, nrow = 1) +
  scale_y_continuous(name = "") +
  scale_x_continuous(name = "", breaks = years, limits = c(1988, 2008)) +
  scale_fill_manual(name = "Water source",
                    values = c("Piped" = "steelblue",
                               "Other improved" = "lightsteelblue",
                               "Unimproved" = "darkkhaki")) +
  theme(strip.text.x = element_text(angle = 90),
        axis.text.x = element_text(angle = 90),
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        strip.background = element_blank())
```

So your only issue with the original chart is that the three regions defined by the two lines (and the 0% and 100% limits) are painted? I'd say the colors are useful: they make explicit that we're looking at the partition of the rural households into three groups: brown if the water quality is bad and blue (bluer) when the water quality is good (better).

The meaning of the area is obvious (to me, at least): the (time-averaged) "proportion of rural households using piped water, etc". Of course the area has nothing to do with the change in proportions over time. That change is given by the time-dependent frontier between regions; the same lines that appear in the chart you propose, but in the original chart this information appears surrounded by a context that makes the meaning clearer.
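
To make the time-averaging concrete, here is a small sketch. It assumes each colored band is a trapezoid between the two end years, so its area is the number of years times the mean of the two end-point shares. `band_area` is a name made up here, and the `flat` series is hypothetical, invented only for comparison:

```r
# Area of one colored band between two years (a trapezoid):
# width in years times the mean of the two end-point shares
band_area <- function(years, share) diff(years) * mean(share)

years  <- c(1990, 2006)
actual <- c(65, 54)      # Sub-Saharan Africa, unimproved (from the data in this thread)
flat   <- c(59.5, 59.5)  # hypothetical series with the same time-averaged share

band_area(years, actual)  # same area ...
band_area(years, flat)

diff(actual)              # ... but a different change over time
diff(flat)
```

Both series give the same area, yet one drops by 11 points and the other does not change at all, which is the sense in which the area carries the time-averaged share and the boundary line carries the change.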

Carlitos: if the point of the chart is to illustrate the change in proportions over time, then coloring the areas draws attention to the wrong metric, which, as you pointed out, is a time-averaged proportion. A time-averaged proportion, by the way, is not a function of time, and this chart misleads us into thinking it is.
In fact, if the change over time is the issue, the entire time series should be plotted, not just the end points.

Carlitos: thanks for the code.
