An anonymous reader sent in a Type V critique of the following map of July unemployment rates by state. The map was published by the Bureau of Labor Statistics (BLS), and used in a recent article in Vox.
Matt @ Vox took the BLS's bait, and singled out Mississippi as the worst in the nation. Our reader-contributor is none too pleased with this conclusion.
He noted that the lone red state stands out only because the legend's top range extends to high, "out-of-sample" values. Three of the seven colors are not found on the map at all! This is the mirror image of the white-space problem in a line plot of large values with an axis starting at zero (for example, here). There, the lines are squeezed into the top of the chart. Here, the states are squeezed into the bottom of the color scale. All the states are compressed into four colors, three of which are shades of orange.
The reader investigated, and reported back:
The top end of the legend seems to be set by Puerto Rico's 13.1%. Puerto Rico is omitted from the Vox map as well as from the BLS publication (link to PDF).
Mississippi has only the bare minimum, 8.0%, needed to qualify for the red color. Georgia is at 7.8%; Michigan, Nevada, and Rhode Island are all at 7.7%.
Of the 50 states plus DC, 24 are in the 6-8% band, 21 are in the 4-6% band, and 5 are under 4%.
None of the above is obvious when looking at the map.
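The reader's tally amounts to counting how many values land in each legend class and noticing which classes stay empty. Here is a minimal Python sketch of that check. The function names are hypothetical, and so are the seven break points: the BLS legend's exact class edges are not reproduced in this post. The only values used are the rates quoted above.

```python
from bisect import bisect_right

def class_counts(values, breaks):
    """Count how many values fall into each legend class.

    `breaks` holds the lower edge of each class in ascending order.
    A value v belongs to class i when breaks[i] <= v < breaks[i + 1];
    the last class is open-ended.
    """
    counts = [0] * len(breaks)
    for v in values:
        i = bisect_right(breaks, v) - 1  # index of the last edge <= v
        if i < 0:
            raise ValueError(f"{v} is below the lowest class edge {breaks[0]}")
        counts[i] += 1
    return counts

def unused_classes(values, breaks):
    """Lower edges of the classes (colors) that no value falls into."""
    return [edge for edge, n in zip(breaks, class_counts(values, breaks)) if n == 0]

# Hypothetical seven-class legend; the BLS's actual edges may differ.
BREAKS = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]

# Only the rates quoted in the post: MS, GA, MI, NV, RI.
quoted = [8.0, 7.8, 7.7, 7.7, 7.7]
print(class_counts(quoted, BREAKS))  # [0, 0, 4, 1, 0, 0, 0]
```

With all 51 values and the legend's real edges, the same two functions would show at a glance how many colors go unused. When the top edges are stretched to cover an out-of-sample value like Puerto Rico's 13.1%, the empty classes pile up at the high end.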
In the Trifecta Checkup, this is a Type V chart. The data is accurate and the question being asked is clear, but the visual construction is problematic.
[I'm seizing back the mike.] The map is often not the best choice for showing geographic data, a point we frequently cover on this blog, but in this particular case there is a strong regional pattern. Of course, with the compressed choice of colors, this regional pattern is hard to see in the original.
Rescheduling Notice: I have been informed by the organizers that the Meetup tonight has to be rescheduled due to an unexpected problem with the venue. When a new date is set, I will let you know.
Since I am not working on the slides for the Meetup, I have a little time to follow up on the post about the World Bank graphic.
One common response, also expressed on Twitter, is to "fix" it by using a scatter plot. Xan helpfully drew one up, which I added to the post.
I mentioned, cryptically, that anyone who tries to improve it will find that the chart is a Type QD, not merely a Type D. There are clearly problems with the data, but this chart cannot be "fixed" until one clarifies what its message really is.
The original chart plots GDP per capita (y-axis) against the cumulative proportion of the world's population (x-axis), with countries ordered from lowest to highest GDP per capita. The area of each rectangle represents that country's total GDP.
Xan's chart plots total GDP in PPP terms (y-axis) against population (x-axis). Per-capita PPP GDP can be read off the diagonal gridlines.
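Why diagonal gridlines work: per-capita GDP is simply the ratio of the two plotted quantities. Writing G for total GDP (PPP), P for population and g for per-capita GDP (notation introduced here, not taken from Xan's chart), every point on a given gridline shares the same g:

```latex
g = \frac{G}{P}
\quad\Longrightarrow\quad
G = g\,P
\qquad\text{or, on logarithmic axes,}\qquad
\log G = \log P + \log g
```

On linear axes each gridline is a ray through the origin with slope g. On logarithmic axes the gridlines become parallel lines of slope 1, each shifted by log g. Which form applies depends on the axis scales chosen.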
Xan's chart is undoubtedly less confusing and more direct. But it cannot answer the cumulative question that the World Bank seems to be asking: how much of the world's wealth (measured by GDP) is held by the poorest X% of the population? That is not something you can read off the scatter plot.
Now, the "cumulative" question is nice to think about, but it is ill-posed given the kinds of data available. Each country ends up being represented by its average (per capita) wealth, yet wealth inequality within countries is rampant. Even though Nigeria falls in the bottom 15%, it is certainly not true that the entire population of Nigeria belongs to the world's poorest 15%.
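The cumulative question can be made concrete. The Python sketch below ranks countries from poorest to richest by GDP per capita, which is how the original chart orders them. It then adds up GDP until a given fraction of the world's population is covered, taking only part of a country when needed. It assumes that every resident holds the country average, just as each rectangle in the original chart has a single height. The function name and the toy data are hypothetical.

```python
def share_held_by_poorest(countries, pop_fraction):
    """Share of world GDP held by the poorest `pop_fraction` of people.

    `countries` is a list of (population, total_gdp) pairs. Countries are
    ranked by GDP per capita, and every resident is assumed to hold the
    country average.
    """
    if not 0.0 <= pop_fraction <= 1.0:
        raise ValueError("pop_fraction must be between 0 and 1")
    ranked = sorted(countries, key=lambda c: c[1] / c[0])  # poorest first
    total_pop = sum(pop for pop, _ in countries)
    total_gdp = sum(gdp for _, gdp in countries)
    remaining = pop_fraction * total_pop  # people still to be counted
    held = 0.0
    for pop, gdp in ranked:
        if remaining <= 0:
            break
        take = min(pop, remaining)   # may take only part of a country
        held += take * (gdp / pop)   # each person counts at the average
        remaining -= take
    return held / total_gdp

# Hypothetical two-country world: (population, total GDP).
world = [(100, 900), (100, 100)]
print(share_held_by_poorest(world, 0.5))  # 0.1
```

Evaluating this function over a grid of fractions traces a Lorenz-style curve at the country level. That curve is the quantity the World Bank chart seems to be gesturing at.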
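A toy example shows how far the country-average simplification can move the answer. The two-country world below is invented purely for illustration. Measured person by person, the poorest half holds 1/7 of total wealth. If every person is instead assigned their country's average, the poorest half appears to hold 3/7.

```python
def share_of_poorest(wealth, fraction):
    """Share of total wealth held by the poorest `fraction` of people
    (rounded down to a whole number of people)."""
    ordered = sorted(wealth)
    k = int(fraction * len(ordered))
    return sum(ordered[:k]) / sum(ordered)

def country_average_wealth(countries):
    """Replace every resident's wealth with their country's average."""
    flat = []
    for residents in countries:
        avg = sum(residents) / len(residents)
        flat.extend([avg] * len(residents))
    return flat

# Hypothetical world: country A averages 3, country B averages 4.
countries = [[1, 1, 1, 9], [1, 5, 5, 5]]
individuals = [w for residents in countries for w in residents]

true_share = share_of_poorest(individuals, 0.5)                          # 1/7
averaged_share = share_of_poorest(country_average_wealth(countries), 0.5)  # 3/7
```

The person-level poorest half includes someone from the richer country and leaves out the richest person in the poorer one. That is the Nigeria point in miniature.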
When a reader tweeted that a scatter plot is the solution, I asked: "Which two variables?" Here are just a few candidates:
- total GDP
- GDP per capita
- total GDP PPP
- PPP GDP per capita
- cumulative total GDP, ordered by per-capita GDP
- cumulative total GDP, ordered by total GDP
- cumulative total GDP, ordered by total population
- cumulative total GDP, ordered by population growth
- cumulative total GDP PPP, ordered by per-capita GDP PPP
- cumulative total GDP PPP, ordered by total GDP PPP
- cumulative total GDP PPP, ordered by total population
- cumulative total GDP PPP, ordered by population growth
- cumulative total population
- cumulative GDP per capita
- cumulative GDP PPP per capita
- population
- working population
- total GDP growth
- total GDP PPP growth
- total GDP per capita growth
- total GDP PPP per capita growth
- total population growth
- total working population growth
- median GDP
- median GDP PPP
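To put a number on "Which two variables?": counting each candidate above as one variable gives 25 variables, so the number of distinct scatter plots is

```latex
\binom{25}{2} = \frac{25 \times 24}{2} = 300
```

This count ignores which variable goes on which axis. Treating (x, y) and (y, x) as different charts doubles it to 600.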
Different charts address different questions. Some of those questions are more meaningful than others, and some are backed by better data. If there are several interesting questions, a set of scatter plots may work better.
The New York Times Upshot team came up with a dataviz that is worth your time. This is a set of maps that gives a perspective on migration patterns within the US. The metric being portrayed is the birthplace of current residents of each state.
Here is the chart for California:
I see a few smart ideas, starting with the little map on the bottom left. It serves multiple functions. It is a legend mapping colors to four regions of the US, a visual guide to how those regions are defined, and an interactive tool for selecting states. Readers might remember the use of a pie chart as a legend in my remake of one of the Wikipedia pie charts (link).
Aggregating up to regions is what really makes this chart work: it reduces the number of pieces from about 50 to about 10.
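The aggregation step can be sketched in a few lines of Python. This is an illustrative sketch, not the Upshot's code. It assumes the four Census regions (Northeast, Midwest, South, West), a hypothetical `region_of` lookup that covers only a handful of states, and invented birthplace shares. Each birthplace is collapsed into one of three kinds of piece: the home state, a region of the rest of the US, or outside the US.

```python
from collections import defaultdict

FOREIGN = "outside the US"

def aggregate_birthplaces(shares, region_of, home_state):
    """Collapse one state's birthplace shares into a handful of pieces.

    `shares` maps a birthplace (a state name or FOREIGN) to the share of
    the state's residents born there. `region_of` maps state -> region.
    """
    pieces = defaultdict(float)
    for birthplace, share in shares.items():
        if birthplace == home_state:
            key = "home state"
        elif birthplace == FOREIGN:
            key = FOREIGN
        else:
            key = region_of[birthplace]  # KeyError flags an unmapped state
        pieces[key] += share
    return dict(pieces)

# Hypothetical, partial lookup using the four Census regions.
REGION_OF = {"Ohio": "Midwest", "Illinois": "Midwest", "Texas": "South",
             "New York": "Northeast", "Oregon": "West", "California": "West"}

# Invented shares for California residents, for illustration only.
ca = {"California": 0.5, "Ohio": 0.1, "Illinois": 0.05, "Texas": 0.1,
      "New York": 0.05, "Oregon": 0.05, FOREIGN: 0.15}
print(aggregate_birthplaces(ca, REGION_OF, "California"))
```

Here seven birthplaces collapse into six pieces. With a complete lookup, about 50 birthplaces would collapse into the same handful.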
They also did a great job with the axes and gridlines. Most of the data labels are hidden, but the most important numbers are retained. These include the proportion of residents who were born in their home state, the proportion who were born outside the U.S., and any states that contribute a significant share of residents. In the California example, we see that the proportion of Midwest-born people living in California has declined sharply over time.
Users can hover over the gridlines to reveal the data labels.
As you scroll through the states, there are some recurring patterns.
Some states have clearly become more desirable over time. Georgia, for instance, has seen strong in-migration (the colored pieces), especially from non-Southern states:
This pattern is repeated in other southeastern states, including Virginia, North Carolina and Tennessee.
By contrast, some states are not attracting migrants. As a result, the share of residents born in the home state has increased over time. The Midwestern states have this problem. For instance, Minnesota:
I also find a few states with special features. Nevada has always been a state of migrants:
Wyoming, on the other hand, has become popular with migrants over time, but the mix has shifted away from Midwestern states.
I would have preferred to see the charts presented in clusters based on these patterns.
I haven't been able to figure out the multi-colored spaghetti. I suspect the undulations are purely aesthetic.
One way to read the chart, then, is to first see three big patches: light gray for those born in the current state, white for those born in other U.S. states, and dark gray for those born outside the U.S. Within the white patch, we look for shifts among the colors (i.e., regions).
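Reading the chart this way is the same as reading a 100% stacked area chart. Each piece occupies a band whose lower edge is the sum of the pieces stacked beneath it. Here is a minimal Python sketch of those band edges for a single year. The shares are invented, and the stacking order is an assumption: home state at the bottom, foreign-born at the top, and the regional pieces of the white patch in between.

```python
def stacked_bands(pieces, order):
    """(bottom, top) edges of each band in a 100% stacked chart.

    `pieces` maps a band name to its share; `order` lists names from the
    bottom of the stack to the top.
    """
    bands = {}
    bottom = 0.0
    for name in order:
        top = bottom + pieces[name]
        bands[name] = (bottom, top)
        bottom = top
    return bands

# Hypothetical shares; the stacking order is an assumption.
shares = {"home state": 0.5, "Midwest": 0.125, "South": 0.125,
          "outside the US": 0.25}
order = ["home state", "Midwest", "South", "outside the US"]
bands = stacked_bands(shares, order)
print(bands["outside the US"])  # (0.75, 1.0)

# The patch for other U.S. states runs from the top of the home-state band
# to the bottom of the foreign-born band.
white_patch = (bands["home state"][1], bands["outside the US"][0])  # (0.5, 0.75)
```

Repeating this computation year by year gives the band edges that, drawn as curves over time, form the three big patches and the regional shifts inside the middle one.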