*Rescheduling Notice:* I have been informed by the organizers that the Meetup tonight has to be rescheduled due to an unexpected problem with the venue. When a new date is set, I will let you know.

***

Since I am not working on the slides for the Meetup, I have a little time to follow up on the post about the World Bank graphic.

One common response, also expressed on Twitter, is to "fix" it by using a scatter plot. Xan helpfully drew one up, which I added to the post.

I mentioned, cryptically, that if you try making improvements, you will find that the chart is a Type QD, not a Type D. There are clearly problems with the data, but the chart cannot be "fixed" until one clarifies what its message really is.

The original chart plots (y=) GDP per capita against (x=) cumulative proportion of the world's population with countries ordered from lowest to highest GDP per capita. Embedded in the rectangular areas is total GDP.
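To make that construction concrete, here is a minimal sketch of how the bar geometry could be computed. The `Country` class, the function name, and the figures are all made up for illustration; this is not the World Bank's method or data.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    population: float  # number of people
    gdp: float         # total GDP, in any consistent currency unit


def variable_width_bars(countries):
    """Return one (name, left, width, height) tuple per country.

    left and width are fractions of world population; height is GDP per capita.
    Countries are ordered from lowest to highest GDP per capita.
    """
    world_pop = sum(c.population for c in countries)
    ordered = sorted(countries, key=lambda c: c.gdp / c.population)
    bars = []
    left = 0.0
    for c in ordered:
        width = c.population / world_pop
        height = c.gdp / c.population
        bars.append((c.name, left, width, height))
        left += width
    return bars


# Hypothetical figures, for illustration only
sample = [
    Country("A", population=100.0, gdp=1000.0),
    Country("B", population=300.0, gdp=600.0),
    Country("C", population=100.0, gdp=5000.0),
]
bars = variable_width_bars(sample)
for name, left, width, height in bars:
    print(name, round(left, 3), round(width, 3), round(height, 3))
```

Each rectangle's area (width times height) equals the country's total GDP divided by world population, which is how total GDP ends up embedded in the areas.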

Xan's chart plots (y=) total GDP in PPP terms against (x=) population. The per-capita PPP GDP is readable through diagonal gridlines.
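A short derivation of why per-capita values can be read off diagonal lines, assuming both axes are on a log scale. The symbols G, P and c are just my notation for this sketch.

```latex
% G = total GDP (PPP), P = population, c = GDP per capita (notation assumed here)
\begin{align*}
c &= \frac{G}{P} \\
\log G &= \log P + \log c
\end{align*}
```

For a fixed c, this is a straight line of slope 1 whose intercept is log c, so each diagonal gridline corresponds to one level of per-capita GDP.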

Xan's chart is undoubtedly less confusing and more direct. But it won't answer the cumulative question that the World Bank seems to be asking. That question is: how much of the world's wealth (measured in GDP) is held by the poorest X% of the population? This isn't something you can find on the scatter plot.

Now, the "cumulative" question is nice to think about but it is ill-posed for the kinds of data available. Each country ends up being represented by its average (per capita) wealth, but there is rampant wealth inequality within countries. Even though Nigeria is in the bottom 15%, it is certainly not true that the entire population of Nigeria belongs to the world's poorest 15%.
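Here is a rough sketch of how the cumulative question would be answered from country-level data. It assumes, unrealistically, that everyone in a country has that country's average GDP per capita, which is exactly the simplification criticized above. The function name and the numbers are hypothetical.

```python
def share_held_by_poorest(countries, x):
    """Fraction of world GDP attributed to the poorest x (0..1) of world population.

    countries: list of (population, gdp) pairs.
    Assumes every person in a country has the country's average GDP per capita.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be between 0 and 1")
    world_pop = sum(p for p, _ in countries)
    world_gdp = sum(g for _, g in countries)
    remaining = x * world_pop  # people still to be counted
    held = 0.0
    # Walk through countries from lowest to highest GDP per capita
    for pop, gdp in sorted(countries, key=lambda c: c[1] / c[0]):
        counted = min(pop, remaining)
        held += gdp * counted / pop  # a partially counted country is taken pro rata
        remaining -= counted
        if remaining <= 0:
            break
    return held / world_gdp


# Hypothetical (population, GDP) pairs, not real data
toy = [(100.0, 1000.0), (300.0, 600.0), (100.0, 5000.0)]
print(share_held_by_poorest(toy, 0.6))
print(share_held_by_poorest(toy, 0.8))
```

The pro rata step is where the averaging assumption bites: with within-country inequality, the true poorest X% would be drawn from many countries at once, and country-level data cannot tell you that.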

When a reader tweeted that a scatter plot is the solution, I asked: "Which two variables?" Here are just a few candidates:

- total GDP
- GDP per capita
- total GDP PPP
- PPP GDP per capita
- cumulative total GDP, ordered by per-capita GDP
- cumulative total GDP, ordered by total GDP
- cumulative total GDP, ordered by total population
- cumulative total GDP, ordered by population growth
- cumulative total GDP PPP, ordered by per-capita GDP PPP
- cumulative total GDP PPP, ordered by total GDP PPP
- cumulative total GDP PPP, ordered by total population
- cumulative total GDP PPP, ordered by population growth
- cumulative total population
- cumulative GDP per capita
- cumulative GDP PPP per capita
- population
- working population
- total GDP growth
- total GDP PPP growth
- total GDP per capita growth
- total GDP PPP per capita growth
- total population growth
- total working population growth
- median GDP
- median GDP PPP

Different charts address different questions, some of which are more meaningful and some of which have better data. There may be a few interesting questions, in which case a set of scatter plots may work better.

Xan's chart uses a log-log plot.

http://en.wikipedia.org/wiki/Log-log_plot

This is something very, very different from the original data.

It is a bit like talking about speed or acceleration. Not the same thing.

Posted by: Jörgen Abrahamsson | Aug 22, 2014 at 11:31 AM

What application did Xan use to create the scatter plot?

Posted by: 1nsidej0ke | Aug 28, 2014 at 01:49 PM

Just noticed the question about how I made the scatter plot. I used my company's "Statistical Discovery" software, JMP. Most features shown here were automatic, though the pink lines and labels were added programmatically as annotations.

Posted by: Xan Gregg | Jun 30, 2015 at 08:49 AM