
Mind your Qs

Rescheduling Notice: I have been informed by the organizers that the Meetup tonight has to be rescheduled due to an unexpected problem with the venue. When a new date is set, I will let you know.

***

Since I am not working on the slides for the Meetup, I have a little time to follow up on the post about the World Bank graphic.

One common response, also expressed on Twitter, is to "fix" it by using a scatter plot. Xan helpfully drew one up, which I added to the post.

I mentioned, cryptically, that if you try making improvements, you will find that the chart is a Type QD, not a Type D. There are clearly problems with the data, but this chart cannot be "fixed" until one clarifies what its message really is.

The original chart plots (y=) GDP per capita against (x=) the cumulative proportion of the world's population, with countries ordered from lowest to highest GDP per capita. Each country's total GDP is embedded in the area of its rectangle.
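To make that construction concrete, here is a minimal sketch of how such variable-width bars could be laid out. The three countries and their figures are made up for illustration (they are not World Bank data), and the function name `variable_width_bars` is hypothetical.

```python
# Hypothetical figures for illustration only (not World Bank data).
# Population in millions, GDP in billions of dollars.
countries = [
    {"name": "A", "population": 200.0, "gdp": 400.0},
    {"name": "B", "population": 50.0, "gdp": 2000.0},
    {"name": "C", "population": 150.0, "gdp": 1500.0},
]


def variable_width_bars(rows):
    """Return one bar per country, ordered from lowest to highest GDP per capita.

    left/width are measured on the cumulative population-share axis (x);
    height is GDP per capita (y).
    """
    world_pop = sum(r["population"] for r in rows)
    ordered = sorted(rows, key=lambda r: r["gdp"] / r["population"])
    bars = []
    left = 0.0
    for r in ordered:
        width = r["population"] / world_pop   # share of world population
        height = r["gdp"] / r["population"]   # GDP per capita
        bars.append({"name": r["name"], "left": left,
                     "width": width, "height": height})
        left += width                         # next bar starts where this one ends
    return bars


bars = variable_width_bars(countries)
for b in bars:
    area = b["width"] * b["height"]           # equals GDP / world population
    print(b["name"], b["left"], b["width"], b["height"], area)
```

Because width times height equals a country's GDP divided by world population, each rectangle's area is proportional to total GDP, which is the sense in which total GDP is "embedded" in the chart.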

[Figure: the original chart (Imfchart_gdp_vox)]

Xan's chart plots (y=) total GDP in PPP terms against (x=) population. The per-capita PPP GDP is readable through diagonal gridlines.

[Figure: Xan's scatter plot (Xan_redo_worldbank-400wi)]

Xan's chart is undoubtedly less confusing, and more direct. But it won't answer the cumulative question that the World Bank seems to be asking. That question is: how much of the world's wealth (measured in GDP) is held by the poorest X% of the population? This isn't something you can find on the scatter plot.

Now, the "cumulative" question is nice to think about but it is ill-posed for the kinds of data available. Each country ends up being represented by its average (per capita) wealth, but there is rampant wealth inequality within countries. Even though Nigeria is in the bottom 15%, it is certainly not true that the entire population of Nigeria belongs to the world's poorest 15%.
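The following sketch shows what answering the cumulative question from country-level data amounts to, and where the questionable assumption enters. The figures are invented for illustration, and `gdp_share_of_poorest` is a hypothetical helper, not anything used to make the original chart.

```python
# Hypothetical figures for illustration only (not World Bank data).
# Population in millions, GDP in billions of dollars.
countries = [
    {"name": "A", "population": 200.0, "gdp": 400.0},
    {"name": "B", "population": 50.0, "gdp": 2000.0},
    {"name": "C", "population": 150.0, "gdp": 1500.0},
]


def gdp_share_of_poorest(rows, pop_fraction):
    """Share of world GDP attributed to the poorest `pop_fraction` of people.

    ASSUMPTION (the one criticized above): every person in a country is
    treated as having that country's average GDP per capita, so
    within-country inequality is ignored entirely.
    """
    world_pop = sum(r["population"] for r in rows)
    world_gdp = sum(r["gdp"] for r in rows)
    remaining = pop_fraction * world_pop      # people still to be counted
    held = 0.0
    for r in sorted(rows, key=lambda r: r["gdp"] / r["population"]):
        if remaining <= 0:
            break
        taken = min(r["population"], remaining)
        held += taken * r["gdp"] / r["population"]  # everyone gets the average
        remaining -= taken
    return held / world_gdp


share_half = gdp_share_of_poorest(countries, 0.5)
share_three_quarters = gdp_share_of_poorest(countries, 0.75)
print(share_half, share_three_quarters)
```

In this toy example the "poorest 50%" is simply the entire population of country A, which is exactly the kind of conclusion the Nigeria example warns against.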

When a reader tweeted that a scatter plot is the solution, I asked: "Which two variables?" Here are just a few candidates:

total GDP
GDP per capita
total GDP PPP
PPP GDP per capita
cumulative total GDP, ordered by per-capita GDP
cumulative total GDP, ordered by total GDP
cumulative total GDP, ordered by total population
cumulative total GDP, ordered by population growth
cumulative total GDP PPP, ordered by per-capita GDP PPP
cumulative total GDP PPP, ordered by total GDP PPP
cumulative total GDP PPP, ordered by total population
cumulative total GDP PPP, ordered by population growth
cumulative total population
cumulative GDP per capita
cumulative GDP PPP per capita
population
working population
total GDP growth
total GDP PPP growth
total GDP per capita growth
total GDP PPP per capita growth
total population growth
total working population growth
median GDP
median GDP PPP

Different charts address different questions, some of which are more meaningful and some of which have better data. There may be a few interesting questions, in which case a set of scatter plots may work better.
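To get a rough sense of how open-ended "use a scatter plot" is, one can count the pairings that the 25 candidates listed above allow. This is only a back-of-the-envelope sketch: it treats every candidate as a plottable variable, which is generous, since many pairings would be meaningless.

```python
import math

n_candidates = 25  # number of candidate variables in the list above

# Unordered pairs: which two variables appear on the chart.
unordered_pairs = math.comb(n_candidates, 2)

# Ordered pairs: also decide which variable goes on x and which on y.
ordered_pairs = math.perm(n_candidates, 2)

print(unordered_pairs, ordered_pairs)  # 300 600
```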

 

 

Comments


Jörgen Abrahamsson

Xan's chart uses a log-log plot:
http://en.wikipedia.org/wiki/Log-log_plot
This is something very, very different from the original data.

It is a bit like talking about speed or acceleration. Not the same thing.

1nsidej0ke

What application did Xan use to create the scatter plot?

Xan Gregg

Just noticed the question about how I made the scatter plot. I used my company's "Statistical Discovery" software, JMP. Most features shown here were automatic, though the pink lines and labels were added programmatically as annotations.
