Expert handling of multiple dimensions of data

I enjoyed reading this Washington Post article about immigration in America. It features a number of graphics. Here's one graphic I particularly like:

Wpost_smallmultiplesmap

This is a set of six small-multiples maps, showing the spatial distribution of immigrants from different countries. The maps reveal some interesting patterns: Los Angeles is a big favorite of Guatemalans while Houston is preferred by Hondurans. Venezuelans like Salt Lake City and Denver (where there are also some Colombians and Mexicans). The breadth of the spatial distribution surprises me.

The dataset behind this graphic is complex. It has three dimensions: country of origin, place of settlement, and time of arrival. The maps above collapse the time dimension, while drawing attention to the other two.
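
To make "collapsing a dimension" concrete, here is a minimal Python sketch. The records, the field order, and the counts are all invented for illustration; they are not taken from the Washington Post dataset, and this is not necessarily how the Post prepared its data.

```python
from collections import defaultdict

# Hypothetical records: (country_of_origin, place_of_settlement, year_of_arrival, migrants)
records = [
    ("Guatemala", "Los Angeles", 2019, 120),
    ("Guatemala", "Los Angeles", 2023, 80),
    ("Honduras", "Houston", 2019, 60),
    ("Honduras", "Houston", 2023, 90),
    ("Guatemala", "Houston", 2023, 10),
]

def collapse(records, keep):
    """Sum migrants over every dimension whose index (0, 1, 2) is not in `keep`."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[i] for i in keep)
        totals[key] += rec[3]
    return dict(totals)

# Keep origin and place, sum over time: the view used by the maps.
by_origin_and_place = collapse(records, keep=(0, 1))
# Keep origin and time, sum over place: the view that highlights time instead.
by_origin_and_year = collapse(records, keep=(0, 2))
```

Each chart picks which two dimensions to keep; the third is summed away.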

***

They have another set of charts that highlight the time dimension while collapsing the place of settlement dimension. Here's one view of it:

Wpost_inkblot_overall

There are various names for this chart form. Stream river is one. I like to call it "inkblot", where the two sides are symmetric around the middle vertical line. The chart shows that the number of "migrants in the U.S. immigration court" system has grown substantially since the end of the Covid-19 pandemic, during which they stopped coming.

I'm not a fan of the inkblot. One reason is visible in the following view, which showcases three Central American countries.

Wpost_inkblot_centralamerica

The main message is clear enough. The volume of immigrants from these three countries has been relatively stable over the last decade, with a bulge in the late 2000s. The recent spurt in migrants has come from other places.

But try figuring out what proportion of total immigration is accounted for by these three countries in, say, 2024. It's a task that is tougher than it should be, and the culprit is that the "other countries" category has been split in half, with the two halves separated.


Visualizing fertility rates around the globe

The following chart dropped on my Twitter feed.

Twitter_fertility_chart

It's an ambitious chart that tries to do a lot. The underlying data set contains fertility rate data from over 200 countries over 20 years.

The basic chart form is a column chart that is curled up into a ball. The column chart is given colors that map to continents. All countries are grouped into five continents. The column chart can only take a single data series, so the 2019 fertility rate is chosen.

Beyond this basic setup, the designer embellishes the chart with a trove of information. Here's a close-up:

Twitter_fertilityrate_excerpt

The first number is the 2019 fertility rate, which means all the data encoded in the columns are also printed on the chart itself. Then, the flag of each country forms the next ring. Then comes the name of the country. Finally, in brackets, is the percent change in fertility rate between 2000 and 2019.

That is not all. Some contextual information is injected into the arrows that connect the columns to the data labels. A green arrow indicates that the fertility rate is trending lower - which is the case in most countries around the world. Once in a while, a purple arrow pops up. In the above excerpt, Seychelles gets a purple arrow because this island nation's fertility rate increased from 2000 to 2019.

Also hiding in the background are several dashed rings. I think only the one that partially overlaps with the column chart contains any information - the other rings are inserted for an artistic reason. To decipher this dashed ring, we must look at the inset in the top left corner. We learn that the value of 2.1 children per woman is known as the replacement fertility rate. So it's also possible to assess whether each country is above or below the replacement fertility rate threshold.

Twitter_fertility_world_trend

[I'm presuming that this replacement threshold is about the births necessary to avoid a population decline. If that's the case, then comparing each country's fertility rate to a global fertility rate threshold is too simplistic because fertility is only one of several key factors driving a country's population growth. A more sophisticated model should generate country-level thresholds.]
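
The label fields described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample rates are made up, the tie-breaking rule for unchanged rates is my own assumption, and the single global 2.1 threshold simply mirrors what the chart does, simplistic or not.

```python
REPLACEMENT_RATE = 2.1  # children per woman, the threshold shown in the chart's inset

def describe(rate_2000, rate_2019):
    """Return the label fields the chart attaches to one country (hypothetical helper)."""
    pct_change = (rate_2019 - rate_2000) / rate_2000 * 100
    return {
        "pct_change": round(pct_change, 1),
        # Arrow colors as described in the post: green = falling, purple = rising.
        # Assumption: an unchanged rate is treated as green; the chart does not say.
        "arrow": "purple" if rate_2019 > rate_2000 else "green",
        "above_replacement": rate_2019 > REPLACEMENT_RATE,
    }

# Made-up rates for two hypothetical countries, not real data.
falling = describe(rate_2000=4.0, rate_2019=3.0)
rising = describe(rate_2000=2.0, rate_2019=2.5)
```

Every field here is derived from just two numbers per country, which is why the chart reads more like a data table than a multi-dimensional display.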

***

Data graphics serve many functions. This chart works well as an embellished data table. It does take some time to find a specific country because the columns have been sorted by decreasing 2019 fertility rate, but once we locate the column, all the other data fields are clearly laid out.

As a generator of data insights, this chart is less effective. The main insight I obtained from it is a rough ranking of continents, with African countries predominantly having higher fertility rates, followed by Asia and Oceania, then the Americas, and finally Europe, which has the lowest fertility rates. If this is the key message, a standard choropleth map brings it out more directly.

***

Here is a small-multiples rendering of the fertility dataset. I chose 1999 values instead of 2000 to make a complete two-decade view.

Junkcharts_redofertilitychart_1

The columns represent a grouping of countries based on their 1999 fertility rates. The left column contains countries with the lowest number of births per woman, and the fertility rate increases left to right - both within an individual plot and in the grid.

If you're wondering, the hidden vertical axis sorts the countries by their 1999 rank. The lighter colors are 1999 values while the darker colors are 2019 values. For most countries the dots are shifting left over the 20 years. There are some exceptions. I have labeled several of these exceptions (e.g. Kazakhstan and Mongolia), and rendered them in italic.


Pretty circular things

National Geographic features this graphic illustrating migration into the U.S. from the 1850s to the present.

Natgeo_migrationtreerings

 

What to Like

It's definitely eye-catching, and some readers will be enticed to spend time figuring out how to read this chart.

The inset reveals that the chart is made up of little colored strips that mix together. This produces a pleasing effect of gradual color gradation.

The white rings that separate decades are crucial. Without those rings, the chart becomes one long run-on sentence.

Once the reader invests time in learning how to read the chart, the reader will grasp the big picture. One learns, for example, that migrants from the most recent decades have come primarily from Latin America (orange) or Asia (pink). Migrants from Europe (green) and Canada (blue) came in waves but have been muted in the last few decades.

 

What's baffling

Initially, the chart is disorienting. It's not obvious whether the compass directions mean anything. We can immediately understand that the further out we go, the larger the number of migrants. But what about the direction?

The key appears in the legend - which should be moved from bottom right to top left as it's so important. Apparently, continent/country of origin is coded in the directions.

This region-to-color coding seems to be rough-edged by design. The color mixing discussed above provides a nice artistic effect. Here, the reader finds out that mixing is primarily between two neighboring colors, thus two regions placed side by side on the chart. Thus, because Europe (green) and Asia (pink) are on opposite sides of the rings, those two colors do not mix.

Another notable feature of the chart is the lack of any data other than the decade labels. We won't learn how many migrants arrived in any decade, or the extent of migration as it impacts population size.

A couple of other comments on the circular design.

The circles necessarily expand as time moves from the inside out. Thus, this design only works well for "monotonic" data, that is to say, data in which migration always increases as time passes.

The appearance of the chart is only mildly affected by the underlying data. By contrast, swapping the positions of the regions of origin changes the appearance of this design drastically.


Appreciating population mountains

Tim Harford tweeted about a nice project visualizing the world's distribution of population, and wondered why he likes it so much.

That's the question we'd love to answer on this blog! Charts make us emotional - some we love, some we hate. We like to think that designers can control those emotions, via design choices.

I happen to like the "Population Mountains" project as well. It fits nicely into a geography class.

1. Chart Form

The key feature is the adoption of a 3D column chart form, instead of the more conventional choropleth or dot density map. The use of columns is particularly effective here because it is natural - cities do tend to expand vertically upwards when ever more people cram into the same amount of surface area.

Jc_popmount

Imagine the same chart form is used to plot the number of swimming pools per square meter. It just doesn't make the same impact. 

2. Color Scale

The designer also made judicious choices on the color scale. The discrete, 5-color scheme is a clear winner over the more conventional, continuous color scale. This must have been a deliberate choice because most software by default uses a continuous color scale for continuous data (population density per square meter).

Jc_popmount_colorscales

Also, notice that the color intervals in the 5-color scale are not set uniformly because there is a power law in effect - the dense areas are orders of magnitude denser than the sparsely populated areas, and most locations are low-density.

These decisions have a strong influence on the perception of the information: they affect the heights of the peaks, the contrasts between the highs and lows, etc. They also inject a degree of subjectivity into the data visualization exercise that some find offensive.
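
Here is a minimal Python sketch of a five-class scale with non-uniform intervals. The break values and color names are placeholders I made up; the post does not state the actual breaks used by Population Mountains. The only point is that each class covers a range ten times wider than the previous one, which suits power-law data.

```python
from bisect import bisect_right

# Hypothetical class breaks (people per unit area); each is 10x the previous one.
BREAKS = [10, 100, 1_000, 10_000]
# Placeholder color names, ordered from sparsest to densest class.
COLORS = ["color_1", "color_2", "color_3", "color_4", "color_5"]

def color_for(density):
    """Map a density to one of five classes using the non-uniform breaks."""
    return COLORS[bisect_right(BREAKS, density)]
```

With uniform breaks over the same range, almost every location would land in the lowest class and the map would be nearly a single color.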

3. Background

The background map is stripped of unnecessary details so that the attention is focused on these "population mountains". No unnecessary labels, roads, relief, etc. This demonstrates an acute awareness of foreground/background issues.

4. Insights on the "shape" of the data 

The article makes the following comment:

What stands out is each city’s form, a unique mountain that might be like the steep peaks of lower Manhattan or the sprawling hills of suburban Atlanta. When I first saw a city in 3D, I had a feel for its population size that I had never experienced before.

I'd strike out population size and replace it with population density. In theory, the sum of the volumes of the columns in any given surface area gives you the "population size" but given the fluctuating heights of these columns, and the different surface areas (sprawls) of different cities, it is an Olympian task to estimate the volumes of the population mountains!
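
The arithmetic behind this point is simple to write down, even if it is hard to do by eye: each column's height is a density, so a column contributes its height times its footprint area to the total. A small Python sketch, using two invented cities (the numbers are not real data):

```python
def population(columns):
    """Total population = sum of column volumes (density x footprint area)."""
    return sum(density * area for density, area in columns)

def peak_density(columns):
    """Height of the tallest column, the feature readers notice first."""
    return max(density for density, _ in columns)

# Two made-up cities, each a list of (density, footprint_area) columns.
compact = [(10_000, 1), (8_000, 1), (2_000, 1)]
sprawling = [(2_500, 2), (2_500, 2), (2_500, 2), (2_500, 2)]
```

Both hypothetical cities hold the same 20,000 people, yet one has a peak four times taller than the other. The eye picks up the peak and the sprawl, not the total.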

The more salient features of these mountains, most easily felt by readers, are the heights of the peak columns, the sprawl of the cities, and the general form of the mass of columns. The volume of the mountain is one of the tougher things to see. Similarly, the taller 3D columns hide what's behind them, and you'd need to spin and rotate the map to really get a good feel.

Here is the contrast between Paris and London, two cities with comparable population sizes. You can see that the population in Paris (and by extension, France) is much more concentrated than in the U.K. This difference is a surprise to me.

Jc_popmount_parislondon

5. Sourcing

Some of the other mountains, especially those in India and China, look a bit odd to me, which leads me to wonder about the source of the data. This project has an excellent set of footnotes that not only point to the source of the data but also discuss its limitations, including the possibility of inaccuracies in places like India and China.

***

Check out Population Mountains!


The merry-go-round of investment bankers

Here is the start of my blog post about the chart I teased the other day:

Businessinsider_ibankers

 

Today's post deals with the following chart, which appeared recently at Business Insider (hat tip: my sister).

It's immediately obvious that this chart requires a heroic effort to decipher. The question shown in the chart title "How many senior investment bankers left their firms?" is the easiest to answer, as the designer places the number of exits in the central circle of each plot relating to a top-tier investment bank (aka "featured bank"). Note that the visual design plays no role in delivering the message, as readers just scan the data from those circles.

Anyone persistent enough to explore the rest of the chart will eventually discover these features...

***

The entire post including an alternative view of the dataset is a guest blog at the JMP Blog here. This is a situation in which plotting everything will make an unreadable chart, and the designer has to think hard about what s/he is really trying to accomplish.