Two commendable student projects, showing different standards of beauty

A few weeks ago, I did a guest lecture for Ray Vella's dataviz class at NYU, and discussed a particularly hairy dataset that he assigns to students.

I'm happy to see the work of the students, and there are two pieces in particular that show promise.

The following dot plot by Christina Barretto shows the disparities between the richest and poorest nations increasing between 2000 and 2015.

[Figure: dot plot by Christina Barretto, "Rich Gets Richer" homework, 2021-04-14]

The underlying dataset has the average GDP per capita for the richest and the poorest regions in each of nine countries, for two years (2000 and 2015). Within each year, the data are indexed to the national average income (100). In the U.K., the gap increased from around 800 to 1,100 over the 15 years. It's evidence that the richer regions are getting richer, and the poorer regions are getting poorer.
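For readers who want to see the indexing step spelled out, here is a minimal sketch in Python. The function names and the dollar figures are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the dataset.

```python
def index_to_national(regional_gdp, national_gdp):
    """Express a regional GDP per capita as an index, with the national average set to 100."""
    return 100.0 * regional_gdp / national_gdp


def regional_gap(richest_index, poorest_index):
    """Gap between the richest and poorest regions, in index points."""
    return richest_index - poorest_index


# Hypothetical figures for one country in one year (not from the dataset).
national = 40_000   # national average GDP per capita
richest = 120_000   # richest region
poorest = 24_000    # poorest region

richest_idx = index_to_national(richest, national)   # 300.0
poorest_idx = index_to_national(poorest, national)   # 60.0
print(regional_gap(richest_idx, poorest_idx))        # 240.0
```

Because each year is indexed to that year's own national average, the index says how far a region sits from the country's average, not how rich it is in absolute terms.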

(For those into interpreting data, you should notice that I didn't say the rich are getting richer. During the lecture, I explained how to interpret regional averages.)

Christina's chart reflects the tidy, minimalist style advocated by Tufte. The countries are sorted by the 2000-to-2015 difference, with Britain showing up as an extreme outlier.


The next chart by Adrienne Umali is more infographic than Tufte.

[Figure: infographic by Adrienne Umali, version 2]

It's great story-telling. The top graphic explains the underlying data. It shows the four numbers and how the gap between the richest and poorest regions is computed. Then, it summarizes these four numbers into a single metric, "gap increase". She chooses to measure the change as a ratio while Christina's chart uses the difference, encoded as a vertical line.
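The two ways of measuring change can be compared in a short Python sketch. It uses the approximate U.K. gaps quoted earlier (around 800 in 2000 and 1,100 in 2015). The ratio formula is my assumption about one reasonable way to define a "gap increase"; it is not necessarily the exact calculation behind Adrienne's chart.

```python
def gap_difference(gap_start, gap_end):
    """Change in the gap, in index points (the vertical line in a dot plot)."""
    return gap_end - gap_start


def gap_ratio(gap_start, gap_end):
    """Change in the gap as a ratio of end to start (assumed definition)."""
    return gap_end / gap_start


# Approximate U.K. gaps read off the post, in index points.
uk_gap_2000 = 800
uk_gap_2015 = 1100

print(gap_difference(uk_gap_2000, uk_gap_2015))  # 300
print(gap_ratio(uk_gap_2000, uk_gap_2015))       # 1.375
```

The difference keeps the units of the index, while the ratio is unitless, so two countries can rank differently depending on which measure is used.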

Adrienne's chart is successful because she filters our attention to a single country - the U.S. It's much too hard to drink data from nine countries in one gulp.

This then sets her up for the second graphic. Now, she presents the other eight countries. Because of the work she did in the first graphic, the reader understands what those red and green arrows mean, without having to know the underlying index values.

Two small suggestions: a) order the countries from greatest to smallest change; b) leave off the decimals. These are minor flaws in a brilliant piece of work.





Richard Krablin

These two charts are indeed very well done. I agree with your suggestion of dropping the decimals, or, put another way, reducing the number of significant figures, which otherwise imply greater precision than perhaps is available in the data set.

One question not answered by the data presented and important for interpretation of the "gap" is the number of regions in each case. It would seem that the more regions the greater the gap between the extremes.


RK: That's the reason why this dataset is tough. Equality between regions is not the same as equality between people. That's the lesson of my guest lecture. Of course, we also got into how to connect those two concepts.

If we apply a Trifecta Checkup analysis, the problem starts with specifying what the question is. Is it about equality of regions or equality of people? I suspect that the question is equality of people but the "found data" are regional GDPs so we have a mismatch. The data aren't addressing the posed question.
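The point about the number of regions can be checked with a rough simulation. The sketch below assumes every region's index is drawn from the same normal distribution (mean 100, standard deviation 20), which is an illustrative assumption and not a property of the actual dataset. Even with identical underlying inequality, the average distance between the top and bottom region grows as a country is cut into more regions.

```python
import random


def mean_extreme_gap(n_regions, trials=2000, seed=0):
    """Average (max - min) across simulated countries with n_regions regions each."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumed distribution: every region drawn from Normal(100, 20).
        values = [rng.gauss(100, 20) for _ in range(n_regions)]
        total += max(values) - min(values)
    return total / trials


for n in (5, 20, 50):
    print(n, round(mean_extreme_gap(n), 1))
```

So a gap computed from extremes partly reflects how finely a country is divided, which is one more reason to be careful when comparing countries.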


I think your suggestion to "b) leave off the decimals" is understated. Extraneous decimals are worse than other extra added bits on a chart, since they're both right in the reader's face and give a sense of precision which often does not exist. It's a very common error that I emphasize in week 1 of training for any new analyst or data scientist.


Why is there no label on the y-axis? Is it GDP in some common currency? It's not clear.
Why does the y-axis have a non-zero origin? Given how close some of the values are to zero, it seems like it should be zero.


Todd: You raised an excellent issue. Both students left off the complicated scale of the data. What is plotted is regional average GDP expressed as an index to each country's average GDP in any given year. The first student started the axis at 100 probably for that reason. The second effort uses a column chart, which has to start from 0, and she did that. One can argue that for index values, a column chart is not the best choice. However, if we think abstractly, it's acceptable.
