Horrid stuff

Small multiples can work wonders when data are replicated, as in this case. The chart accompanied an Economist article on pollution levels in several European cities, as measured by the concentrations of nitrogen dioxide and particulates.

In the junkart version, I plotted the data series side by side, rather than one over the other. Further, the cities are ordered by decreasing levels of NO2, which seemed to be the worse pollutant. All gridlines are removed except the line at 30, which works pretty well to separate out the highly polluted cities.

An odd pattern has now surfaced: there is some degree of negative correlation between the concentrations of the two pollutants. Environmental scientists may be able to tell us why.
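One rough way to check whether the pattern holds up is to compute the correlation between the two pollutants across cities, once for each summary of the plotted ranges (minimum, midpoint, maximum). The following is only a sketch: the function names and the input layout are my own assumptions, and the values in `PLACEHOLDER` are made up to show the shape of the input. They are not the Economist's figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def summarize(lo, hi):
    """Reduce a (low, high) range to the three summaries we compare."""
    return {"min": lo, "mid": (lo + hi) / 2, "max": hi}


def range_correlations(cities):
    """Correlate NO2 against particulates across cities, per summary.

    `cities` maps a city name to {"no2": (lo, hi), "pm": (lo, hi)}.
    This layout is an assumption made for the sketch.
    """
    result = {}
    for stat in ("min", "mid", "max"):
        no2 = [summarize(*c["no2"])[stat] for c in cities.values()]
        pm = [summarize(*c["pm"])[stat] for c in cities.values()]
        result[stat] = pearson(no2, pm)
    return result


# Made-up placeholder values, NOT the published data.
PLACEHOLDER = {
    "City A": {"no2": (40, 60), "pm": (15, 25)},
    "City B": {"no2": (30, 45), "pm": (20, 35)},
    "City C": {"no2": (20, 35), "pm": (25, 50)},
}

correlations = range_correlations(PLACEHOLDER)
```

If the sign of the correlation flips depending on which summary is used, the pattern in the chart is probably an artifact of how the ranges were drawn rather than a property of the data.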

Reference: "The Big Smoke", Economist, Feb 3 2007.




I think this is a case for exploratory data analysis. I'd like to know that a similar pattern was visible when the graphs were plotted in order of maximum particulate pollution as well as maximum NO2 pollution. And when the minimum and midpoints of each were plotted instead. If only one of the six combinations showed any visible pattern, and the sixth was what was published, I'd cry "Cherry picking!"

Also, scatter graphs would be the medium for displaying correlations, surely? Haven't you railed in the past about plotting two time series side by side and claiming a correlation by virtue of the series looking similar when the scales are adjusted? This is just the same thing, but with a category (city) in place of time. My alternative, then, would be to plot a scatter graph of max and min pollution, and look for those lines to run south-east instead of north-east.


Right on, derek!

If scatterplots are too difficult, it would be a big improvement to see box-and-whisker diagrams, rather than just range bars. Any time I see only one statistic used, e.g. mean, median, or range, I immediately ask "What are they not telling us?" Certainly the European Environmental Agency has data for a boxplot.


plotting two time series side by side

Sorry, I should have said "superimposed", not "side by side". The articles I had in mind are probably these two:

Dissecting two axes

The Crossover Law of Petropolitics


At first I was going to do a scatter plot. But then I'd have to pick one of max, min, median, etc. I absolutely agree with the EDA comment. It's very unclear to me what their "ranges" mean, especially when some of the intervals are much wider than the others. Perhaps some of our readers are in this field and can offer help in interpreting the data.

Jon Peltier

I don't see much reverse correlation in the two pollutants. The ranges are too broad, and not enough is known about the relative concentrations of the pollutants at different times. I've made my own chart and given a brief analysis on this page:



Actually, you can design an engine (car, plane, power plant, etc.) either towards low NOx or low soot emissions. So the negative correlation doesn't really surprise me :-)
