Anti-encoding

Howie H., sometime contributor to our blog, found this chart in a doctor's office:

WhenToExpectAReturnCall_sm

Howie writes:

Among the multitude of data visualization sins here, I think the worst is that the chart *anti*-encodes the data; the longest wait time has the shortest arc!

While I waited I thought about a redesign.  Obviously a simple bar chart would work.  A properly encoded radial bar could work, or small multiple pie charts.  But I think the design brief here probably calls for a bit of responsible data art, as this is supposed to be an eye-catching poster.

I came up with a sort of bar superimposed on a calendar for reference.  To quickly draft the design it was easier to do small multiples, but maybe all three arrows could be placed on a two-week grid and the labels could be inside the arrows, or something like that.  It’s a very rough draft but I think it points toward a win-win of encoding the actual data while retaining the eye-catching poster-ness that I’m guessing was a design goal.

Here is his sketch:

JunkCharts-redo_howardh_WhenToExpectAReturnCall redesign sm

***

I found a couple of interesting ideas in Howie's redesign.

First, he tried to embody the concept of a week's wait by visual reference to a weekly calendar.

Second, in the third section, he wanted readers to experience "hardship" by making their eyes wrap around to a second row.

He wanted the chart to be both accurate and eye-catching.

It's a nice attempt that will improve as he fiddles more with it.

***

Based on Howie's ideas, I came up with two sketches myself.

In the first sketch, instead of the arrows, I put numbers into the cells.

Junkcharts_redo_whentoexpectareturncall_1

In the second sketch, I emphasized eye-catching while sacrificing accuracy. It uses spiral imagery, and I think it does a good job showing the extra pain of a week-long wait. Each trip around the circle represents 24 hours.

Junkcharts_redo_whentoexpectacall_2

The wait time is actually encoded in the traversal of angles, rather than the length of the spiral. I call this creation less than accurate because most readers will assume the spiral length to be the wait time, and thus misread the data.
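To make the gap between angle and length concrete, here is a minimal sketch. It assumes an Archimedean spiral, r = a + b·θ, starting at the center; the actual curve in my sketch may differ, and the function name and parameters are purely illustrative. It compares how far the angle travels with how long the drawn line is for waits of one, two and seven days.

```python
import math

def spiral_arc_length(turns, a=0.0, b=1.0, steps=100_000):
    """Arc length of the (assumed) Archimedean spiral r = a + b*theta,
    from theta = 0 to theta = 2*pi*turns, using the midpoint rule."""
    theta_end = 2 * math.pi * turns
    h = theta_end / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        r = a + b * theta
        # ds = sqrt(r^2 + (dr/dtheta)^2) dtheta, and dr/dtheta = b
        total += math.sqrt(r * r + b * b) * h
    return total

one_day = spiral_arc_length(1)  # one trip around the circle = 24 hours
for days in (1, 2, 7):
    length_ratio = spiral_arc_length(days) / one_day
    print(f"{days}-day wait: angle ratio {days}, length ratio {length_ratio:.1f}")
```

Under these assumptions, the length of the spiral grows much faster than the angle traversed, so a reader who judges by length will overstate the longer waits.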

Which one(s) do you like?


Coffee in different shapes and sizes: a test of self-sufficiency

Take a look at the following graphic showing top producers of coffee in 2024:

Junkcharts_voronoicoffeeproduction

Then, try the following tasks:

  • Which country is the top producer?
  • What proportion of the world's production does the top country make?
  • Which countries form the top three?
  • How much is the "Rest of the World" compared to Brazil?
  • How many countries account for the top 50% of the world's production?
  • Does Indonesia or Colombia produce more coffee?
  • Compare India and Uganda
  • How about Honduras vs Peru?

I finished two cups of coffee and still couldn't answer most of these questions. How about you?

***

Now, let's look at the original chart, published by Voronoi, and sent to me by a long-time reader:

Visualcapitalist_coffee

Try those questions again, and the answers seem much more available.

How so?

What we've just demonstrated is that when the reader takes information from this graphic, the reader is consuming the data labels, while the visual encoding of data to shapes has offered zero help.

Given this finding, replacing the above chart with a data table would have achieved the same result, and might even have sped up understanding.

***

I'm using this graphic to illustrate my "self-sufficiency" test: by removing all data labels from the chart, we reveal how much work the visual elements are doing to enable understanding of the message and the underlying data.

***

Now, our long-time reader has a few comments, with which I agree:

  • what they did right: avoided the "let's just use a choropleth" trap
  • what went wrong? a) using shapes you can't compare at a glance
  • what went wrong? b) no color difference between the shapes
  • what went wrong? c) it looks like larger values are on top, except for Mexico which is squeezed up top for some reason


The reckless practice of eyeballing trend lines

MSN showed this chart claiming a huge increase in the number of British children who believe they are born the wrong gender.

Msn_genderdysphoria

The graph has a number of defects, starting with drawing a red line that clearly isn’t the trend in the data.

To find the trend line, we have to draw a line that is closest to the top of every column. The true trend line is closer to the blue line drawn below:

Junkcharts_redo_msngenderdysphoria_1

The red line moves up one unit roughly every three years while the blue line does so every four years.

Notice the dramatic jump in the last column of the chart. The observed trend is not a straight line, and therefore it is not appropriate to force a straight-line model. Instead, it makes more sense to divide the time line into three periods, with different rates of change.

Junkcharts_redo_msngenderdysphoria_2

Most of the growth during this 10-year period occurred in the last year. One should check the data, and also check whether any accounting criterion changed that might explain this large, unexpected jump.
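Here is a minimal sketch of the difference between forcing one straight line and splitting the timeline into periods. The series below is hypothetical: I don't have the underlying numbers, so these values only mimic the shape described above (slow growth, somewhat faster growth, then a jump to 4.5 in the last year). The breakpoints and function names are also my own assumptions.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def segment_slopes(xs, ys, breaks):
    """Average rate of change between consecutive breakpoints (given as indices)."""
    return [(ys[j] - ys[i]) / (xs[j] - xs[i]) for i, j in zip(breaks, breaks[1:])]

# HYPOTHETICAL values, one per year, shaped like the chart; not the real data.
years = list(range(10))
values = [1.0, 1.05, 1.1, 1.15, 1.3, 1.45, 1.6, 1.75, 1.9, 4.5]

single_line = ols_slope(years, values)
three_periods = segment_slopes(years, values, breaks=[0, 3, 8, 9])
last_year_share = (values[-1] - values[-2]) / (values[-1] - values[0])

print(f"one straight line: {single_line:.2f} units per year")
print("three periods:", [f"{s:.2f}" for s in three_periods])
print(f"share of total growth in the last year: {last_year_share:.0%}")
```

With a series of this shape, the single slope describes none of the three periods well: it overstates the early years and badly understates the final one.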

***

The other curiosity about this chart is the scale of the vertical axis. Nowhere on the chart does it say which metric of gender dysphoria it is depicting. The title suggests they are counting the number of diagnoses but the axis labels that range from one to five point to some other metric.

From the article, we learn that the annual number of gender dysphoria diagnoses was about 10,000 in 2021, and this is encoded as 4.5 in the column chart. The sub-header of the chart indicates that the unit is the number per 1,000 people. So 10,000 diagnoses, divided by the size of the under-18 population and multiplied by 1,000, should equal 4.5. This implies there were roughly 2.2 million people under 18 in the U.K. in 2021.

But according to these official statistics (link), there were about 13 million people aged 0-18 in just England and Wales in mid-2022, which is nowhere near that figure. From a dataviz perspective, the designer needs to explain what the values on the vertical axis represent. Right now, I have no idea what they mean.
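The arithmetic behind this mismatch can be checked in a few lines. The inputs are the rounded figures quoted above; the variable names are mine.

```python
diagnoses = 10_000                 # annual diagnoses in 2021, per the article
rate_per_1000 = 4.5                # height of the 2021 column
official_population = 13_000_000   # approx. aged 0-18, England and Wales, mid-2022

# If the axis really is "diagnoses per 1,000 people", the population it implies:
implied_population = diagnoses / rate_per_1000 * 1_000

# Conversely, the rate we would expect using the official population:
expected_rate = diagnoses / official_population * 1_000

print(f"implied under-18 population: {implied_population:,.0f}")
print(f"expected rate per 1,000: {expected_rate:.2f}")
```

The implied population is about one-sixth of the official count, and the expected rate is well under 1 per 1,000, so the axis cannot be showing annual diagnoses per 1,000 under-18s.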

***

Using the Trifecta Checkup framework, we say that the question addressed by the chart is clear but there are problems relating to data encoding as well as the trend-line visual.

_trifectacheckup_image


Making major things easy, revisited

In the prior post, I made a chart that shows the driver's license status of British drivers at different ages. The key change unplugs the obsession with a+b+c = 100%. Instead, the revised chart makes it easier to figure out what proportion of which age group holds which type of license.

This is the right-side plot from the panel of two plots:

Junkcharts_redo_significanceolddrivers_male

Looking at this chart, one might think my primary point of interest is the relative proportion with full license vs no license. But on second thought, I'm less interested in this comparison than that between male and female drivers. Does the prevalence of full licenses differ between men and women as they age?

In the original panel, the reader has to run back and forth between the two plots. Why not put that comparison on a single plot?

Like this:

Junkcharts_redo_significanceolderdrivers_fulllicense

This chart surfaces the difference between men and women (at all age groups) in owning full driver's licenses. Women are much more likely to stop driving earlier.

Here is the entire panel:

Junkcharts_redo_significanceolderdrivers_bylicense

Because of this structural choice, it is harder on this panel to learn the distribution of license status.