Feb 12, 2007
Small multiples can work wonders when data are replicated, as in this case. The chart accompanied an Economist article on pollution levels in several European cities, as indicated by the concentration of nitrogen dioxide and particulates.
In the junkart version, I plotted the data series side by side, rather than one over the other. Further, the cities are ordered by decreasing levels of NO2, which seemed to be the worse pollutant. All gridlines are removed except the 30 line, which works pretty well to separate out the highly polluted cities.
An odd pattern has now surfaced. Namely, there is some degree of negative correlation between the concentration of the two pollutants. Environmental scientists may be able to tell us why.
Reference: "The Big Smoke", Economist, Feb 3 2007.
I think this is a case for exploratory data analysis. I'd like to know whether a similar pattern is visible when the graphs are plotted in order of maximum particulate pollution as well as maximum NO2 pollution, and when the minimum and midpoint of each are plotted instead. If only one of the six combinations showed any visible pattern, and that one was what was published, I'd cry "Cherry picking!"
Also, scatter graphs would be the medium for displaying correlations, surely? Haven't you railed in the past about plotting two time series side by side and claiming a correlation by virtue of the series looking similar when the scales are adjusted? This is just the same thing, but with a category (city) in place of time. My alternative, then, would be to plot a scatter graph of max and min pollution, and look for those lines to run south-east instead of north-east.
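A rough sketch of that kind of check, in Python: for each way of collapsing a range to a single number (minimum, midpoint, maximum), compute the correlation between the two pollutants across cities and see whether the sign holds up. The city names and ranges below are made-up placeholders, not the Economist's figures, and the names `ranges`, `SUMMARIES`, `pearson` and `correlations` are invented for illustration only.

```python
from math import sqrt

# Placeholder (low, high) ranges per city. These are NOT the published
# values; substitute the real ranges before drawing any conclusion.
ranges = {
    "City A": {"no2": (40, 70), "pm": (15, 25)},
    "City B": {"no2": (35, 55), "pm": (20, 35)},
    "City C": {"no2": (25, 45), "pm": (25, 40)},
    "City D": {"no2": (20, 30), "pm": (30, 50)},
}

# Three ways to reduce a (low, high) range to one number.
SUMMARIES = {
    "min": lambda lo, hi: lo,
    "mid": lambda lo, hi: (lo + hi) / 2,
    "max": lambda lo, hi: hi,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlations(city_ranges):
    """Correlation of NO2 against particulates for each summary statistic."""
    result = {}
    for name, pick in SUMMARIES.items():
        no2 = [pick(*r["no2"]) for r in city_ranges.values()]
        pm = [pick(*r["pm"]) for r in city_ranges.values()]
        result[name] = pearson(no2, pm)
    return result


if __name__ == "__main__":
    for name, r in correlations(ranges).items():
        print(f"{name}: r = {r:+.2f}")
```

If the sign of r flips or collapses towards zero depending on which statistic is picked, the pattern in the chart probably owes more to the choice of summary than to the pollutants. With only a handful of cities, any single coefficient deserves little weight either way.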
Posted by: derek | Feb 12, 2007 at 04:03 AM
Right on, derek!
If scatterplots are too difficult, it would be a big improvement to see box-and-whisker diagrams, rather than just range bars. Any time I see only one statistic used, e.g. mean, median, or range, I immediately ask "What are they not telling us?" Certainly the European Environmental Agency has data for a boxplot.
Posted by: TheRandomTexan | Feb 12, 2007 at 05:56 AM
"plotting two time series side by side"
Sorry, I should have said "superimposed", not "side by side". The articles I had in mind are probably these two:
Dissecting two axes
The Crossover Law of Petropolitics
Posted by: derek | Feb 12, 2007 at 06:02 AM
At first I was going to do a scatter plot. But then I'd have to pick one of max, min, median, etc. I absolutely agree with the EDA comment. It's very unclear to me what their "ranges" mean, especially when some of the intervals are much wider than the others. Perhaps some of our readers are in this field and can offer help in interpreting the data.
Posted by: Kaiser | Feb 12, 2007 at 08:13 AM
I don't see much reverse correlation in the two pollutants. The ranges are too broad and not enough is known about the relative concentrations of the pollutants at different times. I've made my own chart and given a brief analysis on this page:
Posted by: Jon Peltier | Feb 12, 2007 at 08:45 AM
Actually, you can design an engine (car, plane, power plant, etc.) either towards low NOx or low soot emissions. So the negative correlation doesn't really surprise me :-)
Posted by: Armin | Oct 01, 2007 at 04:09 PM