Speedometer charts: love or hate

Pie chart hate is tired. In this post, I explain my speedometer hate. (Also called gauges,  dials)

Next to pie charts, speedometers are perhaps the second most beloved chart species found on business dashboards. Here is a typical example:

Speedometers_example

 

For this post, I found one on Reuters about natural gas in Europe. (Thanks to long-time contributor Antonio R. for the tip.)

Eugas_speedometer

The reason for my dislike is the inefficiency of this chart form. In classic Tufte-speak, the speedometer chart has a very poor data-to-ink ratio. The entire chart above contains just one datum (73%). Most of the ink are spilled over non-data things.

This single number has a large entourage:

- the curved axis
- ticks on the axis
- labels on the scale
- the dial
- the color segments
- the reference level "EU target"

These are not mere decorations. Taking these elements away makes it harder to understand what's on the chart.

Here is the chart without the curved axis:

Redo_eugas_noaxis

Here is the chart without axis labels:

Redo_eugas_noaxislabels

Here is the chart without ticks:

Redo_eugas_notickmarks

When the tick labels are present, the chart still functions.

Here is the chart without the dial:

Redo_eugas_nodial

The datum is redundantly encoded in the color segments of the "axis".

Here is the chart without the dial or the color segments:

Redo_eugas_nodialnosegments

If you find yourself stealing a peek at the chart title below, you're not alone.

All versions except one increases our cognitive load. This means the entourage is largely necessary if one encodes the single number in a speedometer chart.

The problem with the entourage is that readers may resort to reading the text rather than the chart.

***

The following is a minimalist version of the Reuters chart:

Redo_eugas_onedial

I removed the axis labels and the color segments. The number 73% is shown using the dial angle.

The next chart adds back the secondary message about the EU target, as an axis label, and uses color segments to show the 73% number.

Redo_eugas_nodialjustsegments

Like pie charts, there are limited situations in which speedometer charts are acceptable. But most of the ones we see out there are just not right.

***

One acceptable situation is to illustrate percentages or proportions, which is what the EU gas chart does. Of course, in that situation, one can alo use a pie chart without shame.

For illustrating proportions, I prefer to use a full semicircle, instead of the circular sector of arbitrary angle as Reuters did. The semicircle lends itself to easy marks of 25%, 50%, 75%, etc, eliminating the need to print those tick labels.

***

One use case to avoid is numeric data.

Take the regional sales chart pulled randomly from a Web search above:

Speedometers_example

These charts are completely useless without the axis labels.

Besides, because the span of the axis isn't 0% to 100%, every tick mark must be labelled with the numeric value. That's a lot of extra ink used to display a single value!


Who trades with Sweden

It's great that the UN is publishing dataviz but it can do better than this effort:

Untradestats_sweden

Certain problems are obvious. The country names turned sideways. The meaningless use of color. The inexplicable sequencing of the country/region.

Some problems are subtler. "Area, nes" - upon research - is a custom term used by UN Trade Statistics, meaning "not elsewhere specified".

The gridlines are debatable. Their function is to help readers figure out the data values if they care. The design omitted the top and bottom gridlines, which makes it hard to judge the values for USA (dark blue), Netherlands (orange), and Germany (gray).

See here, where I added the top gridline.

Redo_untradestats_sweden_gridline

Now, we can see this value is around 3.6, just over the halfway point between gridlines.

***

A central feature of trading statistics is "balance". The following chart makes it clear that the positive numbers outweigh the negative numbers in the above chart.

Redo_untradestats_sweden

At the time I made the chart, I wasn't sure how to interpret the gap of 1.3%. Looking at the chart again, I think it's saying Sweden has a trade surplus equal to that amount.


Superb tile map offering multiple avenues for exploration

Here's a beauty by WSJ Graphics:

Wsj_powerproduction

The article is here.

This data graphic illustrates the power of the visual medium. The underlying dataset is complex: power production by type of source by state by month by year. That's more than 90,000 numbers. They all reside on this graphic.

Readers amazingly make sense of all these numbers without much effort.

It starts with the summary chart on top.

Wsj_powerproduction_us_summary

The designer made decisions. The data are presented in relative terms, as proportion of total power production. Only the first and last years are labeled, thus drawing our attention to the long-term trend. The order of the color blocks is carefully selected so that the cleaner sources are listed at the top and the dirtier sources at the bottom. The order of the legend labels mirrors the color blocks in the area chart.

It takes only a few seconds to learn that U.S. power production has largely shifted away from coal with most of it substituted by natural gas. Other than wind, the green sources of power have not gained much ground during these years - in a relative sense.

This summary chart serves as a reading guide for the rest of the chart, which is a tile map of all fifty states. Embedded in the tile map is a small-multiples arrangement.

***

The map offers multiple avenues for exploration.

Some readers may look at specific states. For example, California.

Wsj_powerproduction_california

Currently, about half of the power production in California come from natural gas. Notably, there is no coal at all in any of these years. In addition to wind, solar energy has also gained. All of these insights come without the need for any labels or gridlines!

Wsj_powerproduction_westernstatesBrowsing around California, readers find different patterns in other Western states like Oregon and Washington.

Hydroelectric energy is the dominant source in those two states, with wind gradually taking share.

At this point, readers realize that the summary chart up top hides remarkable state-level variations.

***

There are other paths through the map.

Some readers may scan the whole map, seeking patterns that pop out.

One such pattern is the cluster of states that use coal. In most of these states, the proportion of coal has declined.

Yet another path exists for those interested in specific sources of power.

For example, the trend in nuclear power usage is easily followed by tracking the purple. South Carolina, Illinois and New Hampshire are three states that rely on nuclear for more than half of its power.

Wsj_powerproduction_vermontI wonder what happened in Vermont about 8 years ago.

The chart says they renounced nuclear energy. Here is some history. This one-time event caused a disruption in the time series, unique on the entire map.

***

This work is wonderful. Enjoy it!


Start at zero, or start at wherever

Andrew's post about start-at-zero helps me refine my own thinking on this evergreen topic.

The specific example he gave is this one:

Andrewgelman_invitezeroin

The dataset is a numeric variable (y) with values over time (x). The minimum numeric value is around 3 and the range of values is from around 3 to just above 20. His advice is "If zero is in the neighborhood, invite it in". (Link)

The rule, as usual, sounds simpler than it really is. In the discussion, Andrew highlights several considerations.

Is zero a meaningful reference value? In his example, we assume it is and so we invite zero in. But, as Andrew also says, if zero is meaningless, then recall the invitation. So context must be accounted for.

In Chapter 1 of Numbersense (link), I looked at some SAT score data of applicants to competitive colleges. Is zero a meaningful reference value for SAT scores? Someone might argue yes, since it is the theoretical minimum score that anyone could get from the test. Any statistician will likely say no, since a competitive college will have never seen an applicant submitting a score of zero, or anywhere close to zero. Thus, starting such a chart at zero inserts a lot of whitespace and draws attention to a useless insight - how far above the theoretical worst performer is someone's score.

***

What about the left panel of Andrew's chart makes us uncomfortable? I ask myself this question. My answer is that the horizontal axis highlights an arbitrary value that distracts from the key patterns of the data.

As shown below, the arbitrary value is ~2.5. This is utterly meaningless.

Redo_andrewgelman_invitezeroin

What if 0 is also a meaningless value for this dataset? I'd recommend "bench the axis". Like this:

Redo_andrewgelman_benchtheaxis

An axis is a tool to help readers understand a chart. If it isn't serving a function, an axis doesn't need to be there. When I choose a line chart for time-series data, I'm drawing attention to temporal change in the numeric values, or the range of values. I'm not saying something about the values relative to some reference number.

From this example, we also see that the horizontal axis should not be regarded as a hanger for time labels. Time labels can exist by themselves.

 

 


Speaking to the choir

A friend found the following chart about the "carbon cycle", and sent me an exasperated note, having given up on figuring it out. The chart came from a report, and was reprinted in Ars Technica (link).

Gcp_s09_2021_global_perturbation-800x371

The problem with the chart is that the designer is speaking to the choir. One must know a lot about the carbon cycle already to make sense of everything that's going on.

We see big and small arrows pointing up or down. Each arrow has a number attached to it, plus a range inside brackets. These numbers have no units, and it's not obvious what they are measuring.

The arrows come in a variety of colors. The colors are explained by labels but the labels dexcribe apparently unrelated concepts (e.g. fossil CO2 and land-use change).

Interspersed with the arrows is a singular dot. The dot also has a number attached to it. The number wears a plus sign, which signals it's being treated differently than the quantities with up arrows.

The singular dot is an outcast, ostracized from the community of dots in the bottom part of the chart. These dots have labels but no numbers. They come in different sizes but no scale is provided.

The background is divided into three parts, showing the atmosphere, the land mass, and the ocean. The placement of the arrows and dots suggests each measured quantity concerns one of these three parts. Well... except the dot labeled "surface sediments" that sit on the boundary of the land mass and the ocean.

The three-way classification is only one layer of the chart. A different classification is embedded in the color scheme. The gray, light green, and aquamarine arrows in the sky find their counterparts in the dots of the land mass, and the ocean.

What's more, the boundaries between land and sky, and between land and ocean are also painted with those colors. These boundary segments have been given different colors so that the lengths of these segments seem to contain data but we aren't sure what.

At this point, I noticed thin arrows which appear to depict back and forth flows. There may be two types of such exchanges, one indicated by a cycle, the other by two straight arrows in opposite directions. The cycles have no numbers while each pair of straight thin arrows gets two numbers, always identical.

At the bottom of the chart is a annotation in red: "Budget imbalance = -1.0". Presumably some formula ties the numbers shown above to this -1.0 result. We still don't know the units, and it's unclear if -1.0 is a bad number. A negative number shown in red typically indicates a bad number but how bad is it?

Finally, on the top right corner, I found a legend. It's not obvious at first because the legend symbols (arrows and dots) are shown in gray, a color not used elsewhere on the chart. It appears as if it represents another color category. The legend labels do little for me. What is an "anthropogenic flux"? What does the unit of "GtCO2" stand for? Other jargon includes "carbon cycling" and "stocks". The entire diagram is titled "carbon cycle" while the "carbon cycling" thin arrows are only a small part of the diagram.

The bottom line is I have no idea what this chart is saying to me, other than that the earth is a complex system, and that the designer has tried valiantly to impregnate the diagram with lots of information. If I am well read in environmental science, my experience is likely different.

 

 

 

 

 


Surging gas prices

A reader finds this chart hard to parse:

Twitter_mta_gasprices

The chart shows the trend in gas prices in New York in the past two years.

This is a case in which the simple line chart works very well.

Junkcharts_redo_mtagasprices

I added annotations as the reasons behind the decline and rise in prices are reasonably clear. 

One should be careful when formatting dates. The legend of the original chart looks like this:

Mta_gasprices_date_legend

In the U.S., dates typically use a M/D/Y format. The above dates are ambiguous. "Aug 19" can be August 19th or August, xx19.


Working hard at clarity

As I am preparing another blog post about the pandemic, I came across the following data graphic, recently produced by the CDC for a vaccine advisory board meeting:

CDC_positivevaccineintent

This is not an example of effective visual communications.

***

For one thing, readers are directed to scour the footnotes to figure out what's going on. If we ignore those for the moment, we see clusters of bubbles that have remained pretty stable from December 2020 to August 2021. The data concern some measure of Americans' intent to take the COVID-19 vaccine. That much we know.

There may have been a bit of an upward trend between January and May, although if you were shown the clusters for December, February and April, you'd think the trend's been pretty flat. 

***

But those colors? What could they represent? You'd surely have to fish this one out of the footnotes. Specifically, this obtuse sentence: "Surveys with multiple time points are shown with the same color bubble for each time point." I had to read it several times. I think it simply means "Color represents the pollster." 

Then it adds: "Surveys with only one time point are shown in gray." which simply means "All pollsters who have only one entry in the dataset are grouped together and shown in gray."

Another problem with this chart is over-plotting. Look at the July cluster. It's impossible to tell how many polls were conducted in July because the circles pile on top of one another. 

***

The appearance of the flat trend is a result of two unfortunate decisions made by the designer. If I retained the chart form, I'd have produced something that looks like this:

Junkcharts_redo_cdcvaccineintent_sameform

The first design choice is to expand the vertical axis to range from 0% to 100%. This effectively squeezes all the bubbles into a small range.

Junkcharts_redo_cdcvaccineintent_startatzero

The second design choice is to enlarge the bubbles causing copious amount of overlapping. 

Junkcharts_redo_cdcvaccineintent_startatzero_bigdots

In particular, this decision blows up the Pew poll (big pink bubble) that contained 10 times the sample size of most of the other polls. The Pew outcome actually came in at 70% but the top of the pink bubble extends to over 80%. Because of this, the outlier poll of December 2020 - which surprisingly printed the highest number of all polls in the entire time window - no longer looks special. 

***

Now, let's see what else we can do to enhance this chart. 

I don't like how bubble size is used to encode the sample size. It creates a weird sensation for anyone who's familiar with sampling errors, and confidence regions. The Pew poll with 10 times the sample size is the most reliable poll of them all. Reliability means the error bars around the Pew poll outcome is the smallest of them all. I tend to think of the area around a point estimate as showing the sampling error so the Pew poll would be a dot, showing the high precision of that estimate. 

But that won't work because larger bubbles catch more of the reader's attention. So, in the following version, all dots have the same size. I encode reliability in the opacity of the color. The darker dots are polls that are more reliable, that have larger sample sizes.

Junkcharts_redo_cdcvaccineintent_opacity

Two of the pollsters have more frequent polling than others. In this next version, I highlighted those two, which reveals the trend better.

Junkcharts_redo_cdcvaccineintent_opacitywithlines

 

 

 


Metaphors, maps, and communicating data

There are some data visualization that are obviously bad. But what makes them bad?

Here is an example of such an effort:

Carbon footprint 2021-02-15_0

This visualization of carbon emissions is not successful. There is precious little that a reader can learn from this chart without expensing a lot of effort. It's relatively easy to identify the largest emitters of carbon but since the data are not expressed per-capita, the chart mainly informs us which countries have the largest populations. 

The color of the bubbles informs readers which countries belong to which parts of the world. However, it distorts the location of countries within regions, and regions relative to regions, as the primary constraint is fitting the bubbles inside the shape of a foot.

The visualization gives a very rough estimate of the relative sizes of total emissions. The circles not being perfect circles don't help. 

It's relatively easy to list the top emitters in each region but it's hard to list the top 10 emitters in the world (try!) 

The small emitters stole all of the attention as they account for most of the labels - and they engender a huge web of guiding lines - an unsightly nuisance.

The diagram clings dearly to the "carbon footprint" metaphor. Does this metaphor help readers consume the emissions data? Conversely, does it slow them down?

A more conventional design uses a cartogram, a type of map in which the positioning of countries are roughly preserved while the geographical areas are coded to the data. Here's how it looks:

Carbonatlasthumb

I can't seem to source this effort. If any reader can find the original source, please comment below.

This cartogram is a rearrangement of the footprint illustration. The map construct eliminates the need to include a color legend which just tells people which country is in which continent. The details of smaller countries are pushed to the bottom. 

In the footprint visualization, I'd even consider getting rid of the legend completely. This means trusting that readers know South Africa is part of Africa, and China is part of Asia.

Carbonfootprint_part

Imagine: what if this chart comes without a color legend? Do we really need it?

***

I'd like to try a word cloud visual for this dataset. Something that looks like this (obviously with the right data encoding):

Michaeltompsett_worldmapwords

(This map is by Michael Tompsett who sells it here.)

 


The time has arrived for cumulative charts

Long-time reader Scott S. asked me about this Washington Post chart that shows the disappearance of pediatric flu deaths in the U.S. this season:

Washingtonpost_pediatricfludeaths

The dataset behind this chart is highly favorable to the designer, because the signal in the data is so strong. This is a good chart. The key point is shown clearly right at the top, with an informative title. Gridlines are very restrained. I'd draw attention to the horizontal axis. The master stroke here is omitting the week labels, which are likely confusing to all but the people familiar with this dataset.

Scott suggested using a line chart. I agree. And especially if we plot cumulative counts, rather than weekly deaths. Here's a quick sketch of such a chart:

Junkcharts_redo_wppedflu_panel

(On second thought, I'd remove the week numbers from the horizontal axis, and just go with the month labels. The Washington Post designer is right in realizing that those week numbers are meaningless to most readers.)

The vaccine trials have brought this cumulative count chart form to the mainstream. For anyone who have seen the vaccine efficacy charts, the interpretation of the panel of line charts should come naturally.

Instead of four plots, I prefer one plot with four superimposed lines. Like this:

Junkcharts_redo_wppeddeaths_superpose2

 

 

 


Dreamy Hawaii

I really enjoyed this visual story by ProPublica and Honolulu Star-Advertiser about the plight of beaches in Hawaii (link).

The story begins with a beautiful invitation:

Propublica_hawaiibeachesfrontimage

This design reminds me of Vimeo's old home page. (It no longer looks like this today but this screenshot came from when I was the data guy there.) In both cases, the images are not static but moving.

Vimeo-homepage

The tour de force of this visual story is an annotated walk along the Lanikai Beach. Here is a snapshot at one of the stops:

Propublica_hawaiibeaches_1368MokuluaDr_small

This shows a particular homeowner who, according to documents, was permitted to rebuild a destroyed seawall even though officials were supposed to disallow reconstruction in order to protect beaches from eroding. The property is marked on the map above. The image inside the box is a gif showing waves smashing the seawall.

As the reader scrolls down, the image window runs through a carousel of gifs of houses along the beach. The images are synchronized to the reader's progress along the shore. The narrative makes stops at specific houses at which point a text box pops up to provide color commentary.

***

The erosion crisis is shown in this pair of maps.

Propublica_hawaiibeaches_oldnewshoreline-sm

There's some fancy work behind the scenes to patch together images, and estimate the boundaries of th beaches.

***

The following map is notable for its simplicity. There are no unnecessary details and labels. We don't need to know the name of every street or a specific restaurant. Removing excess details makes readers focus on the informative parts. 

Propublica_hawaiibeaches_simplemap-sm

Clicking on the dots brings up more details.

***

Enjoy the entire story here.