« March 2021 | Main | May 2021 »

Two commendable student projects, showing different standards of beauty

A few weeks ago, I did a guest lecture for Ray Vella's dataviz class at NYU, and discussed a particularly hairy dataset that he assigns to students.

I'm happy to see the work of the students, and there are two pieces in particular that show promise.

The following dot plot by Christina Barretto shows the disparities between the richest and poorest nations increasing between 2000 and 2015.

BARRETTO  Christina - RIch Gets Richer Homework - 2021-04-14

The underlying dataset has the average GDP per capita for the richest and the poor regions in each of nine countries, for two years (2000 and 2015). With each year, the data are indiced to the national average income (100). In the U.K., the gap increased from around 800 to 1,100 in the 15 years. It's evidence that the richer regions are getting richer, and the poorer regions are getting poorer.

(For those into interpreting data, you should notice that I didn't say the rich getting richer. During the lecture, I explain how to interpret regional averages.)

Christina's chart reflects the tidy, minimalist style advocated by Tufte. The countries are sorted by the 2000-to-2015 difference, with Britain showing up as an extreme outlier.

***

The next chart by Adrienne Umali is more infographic than Tufte.

Adrienne Umali_v2

It's great story-telling. The top graphic explains the underlying data. It shows the four numbers and how the gap between the richest and poorest regions is computed. Then, it summarizes these four numbers into a single metric, "gap increase". She chooses to measure the change as a ratio while Christina's chart uses the difference, encoded as a vertical line.

Adrienne's chart is successful because she filters our attention to a single country - the U.S. It's much too hard to drink data from nine countries in one gulp.

This then sets her up for the second graphic. Now, she presents the other eight countries. Because of the work she did in the first graphic, the reader understands what those red and green arrows mean, without having to know the underlying index values.

Two small suggestions: a) order the countries from greatest to smallest change; b) leave off the decimals. These are minor flaws in a brilliant piece of work.

 

 


Metaphors, maps, and communicating data

There are some data visualization that are obviously bad. But what makes them bad?

Here is an example of such an effort:

Carbon footprint 2021-02-15_0

This visualization of carbon emissions is not successful. There is precious little that a reader can learn from this chart without expensing a lot of effort. It's relatively easy to identify the largest emitters of carbon but since the data are not expressed per-capita, the chart mainly informs us which countries have the largest populations. 

The color of the bubbles informs readers which countries belong to which parts of the world. However, it distorts the location of countries within regions, and regions relative to regions, as the primary constraint is fitting the bubbles inside the shape of a foot.

The visualization gives a very rough estimate of the relative sizes of total emissions. The circles not being perfect circles don't help. 

It's relatively easy to list the top emitters in each region but it's hard to list the top 10 emitters in the world (try!) 

The small emitters stole all of the attention as they account for most of the labels - and they engender a huge web of guiding lines - an unsightly nuisance.

The diagram clings dearly to the "carbon footprint" metaphor. Does this metaphor help readers consume the emissions data? Conversely, does it slow them down?

A more conventional design uses a cartogram, a type of map in which the positioning of countries are roughly preserved while the geographical areas are coded to the data. Here's how it looks:

Carbonatlasthumb

I can't seem to source this effort. If any reader can find the original source, please comment below.

This cartogram is a rearrangement of the footprint illustration. The map construct eliminates the need to include a color legend which just tells people which country is in which continent. The details of smaller countries are pushed to the bottom. 

In the footprint visualization, I'd even consider getting rid of the legend completely. This means trusting that readers know South Africa is part of Africa, and China is part of Asia.

Carbonfootprint_part

Imagine: what if this chart comes without a color legend? Do we really need it?

***

I'd like to try a word cloud visual for this dataset. Something that looks like this (obviously with the right data encoding):

Michaeltompsett_worldmapwords

(This map is by Michael Tompsett who sells it here.)

 


Come si dice donut in italiano

One of my Italian readers sent me the following "horror chart". (Last I checked, it's not Halloween.)

Horrorchart

I mean, people are selling these rainbow sunglasses.

Rainbowwunglasses

The dataset behind the chart is the market share of steel production by country in 1992 and in 2014. The presumed story is how steel production has shifted from country to country over those 22 years.

Before anything else, readers must decipher the colors. This takes their eyes off the data and on to the color legend placed on the right column. The order of the color legend is different from that found in the nearest object, the 2014 donut. The following shows how our eyes roll while making sense of the donut chart.

Junkcharts_steeldonuts_eye1

It's easier to read the 1992 donut because of the order but now, our eyes must leapfrog the 2014 donut.

Junkcharts_steeldonuts_eye2

This is another example of a visualization that fails the self-sufficiency test. The entire dataset is actually printed around the two circles. If we delete the data labels, it becomes clear that readers are consuming the data labels, not the visual elements of the chart.

Junkcharts_steeldonuts_sufficiency

The chart is aimed at an Italian audience so they may have a patriotic interest in the data for Italia. What they find is disappointing. Italy apparently completely dropped out of steel production. It produced 3% of the world's steel in 1992 but zero in 2014.

Now I don't know if that is true because while reproducing the chart, I noticed that in the 2014 donut, there is a dark orange color that is not found in the legend. Is that Italy or a mysterious new entrant to steel production?

One alternative is a dot plot. This design accommodates arrows between the dots indicating growth versus decline.

Junkcharts_redo_steeldonuts

 


Losses trickle down while gains trickle up

In a rich dataset, it's hard to convey all the interesting insights on a single chart. Following up on the previous post, I looked further at the wealth distribution dataset. In the previous post, I showed this chart, which indicated that the relative wealth of the super-rich (top 1%) rose dramatically around 2011.

Redo_bihouseholdwealth_legend

As a couple of commenters noticed, that's relative wealth. I indiced everything to the Bottom 50%.

In this next chart, I apply a different index. Each income segment is set to 100 at the start of the time period under study (2000), and I track how each segment evolved in the last two decades.

Junkcharts_redo_bihouseholdwealth_2

This chart offers many insights.

The Bottom 50% have been left far, far behind in the last 20 years. In fact, from 2000-2018, this segment's wealth never once reached the 2000 level. At its worst, around 2010, the Bottom 50% found themselves 80% poorer than they were 10 years ago!

In the meantime, the other half of the population has seen their wealth climb continuously through the 20 years. This is particularly odd because the major crisis of these two decades was the Too Big to Fail implosion of financial instruments, which the Bottom 50% almost surely did not play a part in. During that crisis, the top 50% were 30-60% better off than they were in 2000. Is this the "trickle-down" economy in which losses are passed down (but gains are passed up)?

The chart also shows how the recession hit the bottom 50% much deeper, and how the recovery took more than a decade. For the top half, the recovery came between 2-4 years.

It also appears that top 10% are further peeling off from the rest of the population. Since 2009, the top 11-49% have been steadily losing ground relative to the top 10%, while the gap between them and the Bottom 50% has narrowed.

***

This second chart is not nearly as dramatic as the first one but it reveals much more about the data.

 


Finding the hidden information behind nice-looking charts

This chart from Business Insider caught my attention recently. (link)

Bi_householdwealthchart

There are various things they did which I like. The use of color to draw a distinction between the top 3 lines and the line at the bottom - which tells the story that the bottom 50% has been left far behind. Lines being labelled directly is another nice touch. I usually like legends that sit atop the chart; in this case, I'd have just written the income groups into the line labels.

Take a closer look at the legend text, and you'd notice they struggled with describing the income percentiles.

Bi_householdwealth_legend

This is a common problem with this type of data. The top and bottom categories are easy, as it's most natural to say "top x%" and "bottom y%". By doing so, we establish two scales, one running from the top, and the other counting from the bottom - and it's a head scratcher which scale to use for the middle categories.

The designer decided to lose the "top" and "bottom" descriptors, and went with "50-90%" and "90-99%". Effectively, these follow the "bottom" scale. "50-90%" is the bottom 50 to 90 percent, which corresponds to the top 10 to 50 percent. "90-99%" is the bottom 90-99%, which corresponds to the top 1 to 10%. On this chart, since we're lumping the top three income groups, I'd go with "top 1-10%" and "top 10-50%".

***

The Business Insider chart is easy to mis-read. It appears that the second group from the top is the most well-off, and the wealth of the top group is almost 20 times that of the bottom group. Both of those statements are false. What's confusing us is that each line represents very different numbers of people. The yellow line is 50% of the population while the "top 1%" line is 1% of the population. To see what's really going on, I look at a chart showing per-capita wealth. (Just divide the data of the yellow line by 50, etc.)

Redo_bihouseholdwealth_legend

For this chart, I switched to a relative scale, using the per-capita wealth of the Bottom 50% as the reference level (100). Also, I applied a 4-period moving average to smooth the line. The data actually show that the top 1% holds much more wealth per capita than all other income segments. Around 2011, the gap between the top 1% and the rest was at its widest - the average person in the top 1% is about 3,000 times wealthier than someone in the bottom 50%.

This chart raises another question. What caused the sharp rise in the late 2000s and the subsequent decline? By 2020, the gap between the top and bottom groups is still double the size of the gap from 20 years ago. We'd need additional analyses and charts to answer this question.

***

If you are familiar with our Trifecta Checkup, the Business Insider chart is a Type D chart. The problem with it is in how the data was analyzed.