
The hard work of entertaining

Stefan pointed us to his work for the UN GEO (United Nations Global Environment Outlook) data portal.  This set of information posters highlights a vexing issue that crops up on Junk Charts from time to time, that is, the proper balance between information and entertainment value of data displays.  While this blog concerns itself primarily with the former, it does not mean that we are blind to the flashier side of the enterprise.


Let's take Stefan's recycling spiral chart as an example.  One must admit that visually this presentation is more appealing than either a data table or a set of bar charts.  The reader can obtain the primary piece of information, which is the ranking of different countries in terms of the proportion of collected waste that is recycled.

And if the reader is curious enough, the chart also provides the data on the per-capita amount of waste collected in each of these countries.  (Like the table and bar chart, this display has the problem that it is one-dimensional: the countries can be sorted by proportion of recycling, but then the waste-collected data will be out of order.)

Readers who would like to understand the data better would want to know some of the following:

  • Is there a relationship between amount of waste collected and amount of waste recycled?
  • Are there differences in culture resulting in different recycling rates?
  • Is the level of development of a country predictive of its recycling rate?
  • Why are some countries recycling more of their waste, and others less?

To address these types of questions, one can start with the following scatter plot.


With the exception of South Korea, there is a general pattern of positive correlation: the more waste collected per capita, the larger the proportion of such waste recycled.  Any dots that are not in the bottom left or top right quadrant are exceptions to the rule.  These countries are labeled in red or blue: red indicates that the amount of collection is above average while the rate of recycling is below average, and blue indicates the reverse.
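To make the "with the exception of" caveat concrete, here is a minimal sketch of how one might compare the correlation with and without a single outlying point. The numbers are made up for illustration and are not the poster's data; the function name `pearson` is likewise just a label chosen here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up illustrative values (NOT the actual data): the last point plays
# the role of an outlier with low collection but very high recycling.
waste = [-1.0, -0.5, 0.0, 0.5, 1.0, -1.5]
recycled = [-0.8, -0.6, 0.1, 0.4, 0.9, 2.0]

r_all = pearson(waste, recycled)
r_without_outlier = pearson(waste[:-1], recycled[:-1])
print(r_all, r_without_outlier)
```

With data shaped like this, a single extreme point can mask an otherwise clear positive relationship, which is why it is worth reporting the pattern both ways.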

Because there is sampling error, dots that are close to the average dot (the center of this scatter plot) are probably just average.  Roughly speaking, dots in the gray circle are close enough to the center that I would not consider them exceptional cases.  That leaves Spain and Iceland in the red corner, and South Korea in the blue corner.  If both data series are considered together, these three countries should merit attention; if only the proportion of recycling is considered, then one would pay attention to Italy, Turkey and Slovak Republic on the lower end and South Korea on the high end.
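The labeling rule described above can be sketched in a few lines. This assumes both coordinates are already in standardized units (so the average dot sits at the origin), and the radius of the gray circle is an assumption: the post does not state it, so `radius=1.0` below is only a placeholder.

```python
from math import hypot

def classify(z_waste, z_recycled, radius=1.0):
    """Label a country from its standardized coordinates.

    radius is a hypothetical size for the gray circle; the post gives no value.
    """
    if hypot(z_waste, z_recycled) <= radius:
        return "near average"      # inside the gray circle
    if z_waste > 0 and z_recycled < 0:
        return "red"               # collects more, recycles less than average
    if z_waste < 0 and z_recycled > 0:
        return "blue"              # collects less, recycles more than average
    return "follows pattern"       # bottom-left or top-right quadrant

# Made-up coordinates for illustration only, not read off the chart.
print(classify(0.3, -0.4))   # near average
print(classify(1.5, -1.2))   # red
print(classify(-1.0, 2.5))   # blue
print(classify(1.4, 1.1))    # follows pattern
```

The distance check comes first on purpose: a dot that is technically in an "exception" quadrant but close to the center is treated as average, which matches the reasoning about sampling error.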

Scatter plots are very versatile.  The following one explores the issue of development level.  Surprisingly, the level of recycling seems to have little to do with development; the countries are quite widely scattered.


Technical note: The data on both axes are expressed in "standardized" units.  So the zeroes represent the average per-capita waste collected, and the average proportion of waste recycled (only of those countries depicted in the original chart).  +1 indicates an amount that is one standard deviation above the average.  Think of "standardized units" as measuring how extreme a particular country is with respect to the average.
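For readers who want to reproduce the standardization, a minimal sketch follows. The input numbers are invented for illustration and are not the poster's data, and the use of the sample standard deviation (rather than the population version) is an assumption, since the note does not say which was used.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert raw values to standardized units: (value - mean) / std dev."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation; an assumption, see above
    return [(v - m) / s for v in values]

# Made-up per-capita waste figures (NOT the actual data).
waste_per_capita = [300.0, 400.0, 500.0, 600.0, 700.0]
z = standardize(waste_per_capita)
print(z)  # the middle value equals the mean, so it maps to 0.0
```

After this transformation the values have mean 0 and standard deviation 1, so both axes of the scatter plot share a common, unit-free scale.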



The first poster seems to use the wrong units: rather than ktoe it should be toe. This matches the fourth graph, which has American CO2 at about 20 tonnes per capita, which seems reasonable for 8 tonnes of oil.


I'm assuming the comment on development is related to the graph showing countries in G8 and G20 in different colors. Whether a country belongs to G8 or G20 depends on the size of its economy, not its level of development. It would be fairer to classify according to per capita income.

But level of development is difficult to quantify. Depending on your point of view, you could for example measure level of development as the proportion of waste recycled. :)

Kiko Laneras

What do you think of using the original units on the axes and then marking their averages with two crossing lines?

The deviation of each data point will be more difficult to read, but you keep the information of the original variables and avoid the use of normalization which can be misleading for the casual reader.


Is there somewhere to download the raw data used here?
