Pies, bars and self-sufficiency

Andy Cotgreave asked Twitter followers to pick between pie charts and bar charts:

Ac_pie_or_bar

The underlying data are proportions of people who say they won't get the coronavirus vaccine.

I noticed two somewhat unusual features: the use of pies to show single proportions, and the aspect ratio of the bars (taller than typical). Which version is easier to understand?

To answer this question, I like to apply a self-sufficiency test. The test determines whether readers are using the visual elements of the chart to understand the data, or bypassing the visual elements and just reading the data labels. So, let's remove the printed data from the chart and take another look:

Junkcharts_selfsufficiency_pieorbar

For me, these charts are comparable. Each is moderately hard to read. That's because the percentages fall into a narrow range at one end of the scale. For both charts, many readers are likely to be looking for the data labels.

Here's a sketch of a design that is self-sufficient.

Junkcharts_selfsufficientdesign

The data do not appear on this chart.

***

My first reaction to Andy's tweet turned out to be a misreading of the charts. I thought he was disaggregating the pie chart, like we can unstack a stacked bar chart.

Junkcharts_probabilities_proportions

Looking at the data more carefully, I realize that the "proportions" are not parts of a whole. Or rather, the whole isn't "all races" or "all education levels". The whole is all respondents of a particular type.


Re-engineering #onelesspie

Marco tweeted the following pie chart to me (tip from Danilo), which is perfect since today is Pi Day, and I have to do my #onelesspie duty. This started a few years ago with Xan Gregg.

Onelesspie2021

This chart supposedly was published in an engineering journal. I don't have a clue what the question might be that this chart is purportedly answering. Maybe the reason for picking a cellphone?

The particular bits that make this chart hard to comprehend are these:

Junkcharts_onelesspie2021_problems

The chart also fails the ordering rule, as it spreads the largest pieces around.

It doesn't have to be so complicated.

Here is a primitive chart that doesn't even require graphics software.

Junkcharts_redo_onelesspie2021_1color

Younger readers have not experienced the days (pre 2000) when color printing was at a premium, and most graphics were grayscale. Nevertheless, restrained use of color is recommended.

Junkcharts_redo_onelesspie2021_2colors

Happy Pi Day!


These are the top posts of 2020

It's always very interesting as a writer to look back at a year's worth of posts and find out which ones were most popular with my readers.

Here are the top posts on Junk Charts from 2020:

How to read this chart about coronavirus risk

This post about a New York Times scatter plot dates from February, a time when many Americans were debating whether Covid-19 was just the flu.

Proportions and rates: we are no dupes

This post about an Ars Technica chart on the effects of Covid-19 by age is an example of designing the visual to reflect the structure of the data.

When the pie chart is more complex than the data

This post shows a 3D pie chart which is worse than a 2D pie chart.

Twitter people upset with that Covid symptoms diagram

This post discusses some complicated graphics designed to illustrate complicated datasets on Covid-19 symptoms.

Cornell must remove the logs before it reopens in the fall

This post is another warning to think twice before you use log scales.

What is the price of objectivity?

This post turns an "objective" data visualization into a piece of visual story-telling.

The snake pit chart is the best election graphic ever

This post introduces my favorite U.S. presidential election graphic, designed by the FiveThirtyEight team.

***

Here is a list of posts that deserve more attention:

Locating the political center

An example of bringing readers as close to the insights as possible

Visualizing change over time

An example of designing data visualization to reflect the structure of multivariate data

Bloomberg made me digest these graphics slowly

An example of simple and thoughtful graphics

The hidden bad assumption behind most dual-axis time-series charts

Read this before you make a dual-axis chart

Pie chart conventions

Read this before you make a pie chart

***
Looking forward to bringing you more content in 2021!

Happy new year.


Using comparison to enrich a visual story

Just found this beauty deep in my submission pile (from Howie H.):

Iwillvote_texas

What's great about this pie chart is the story it's trying to tell. Almost half of the electorate did not vote in Texas in the 2016 Presidential election. The designer successfully draws my attention to the white sector that makes the point.

There are a few problems.

Showing two decimals is too much precision.

The purple sector is not labeled.

The white area seems exaggerated. The four sectors do not appear to meet at the center of the circle. The distortion is not too much but it's schizophrenic: the pie slices are drawn with low precision while the data labels have high precision.

***

The following fixes those problems, and also adds a second chart to contrast the two ways of thinking:

Redo_junkcharts_iwillvotecomtexas


Avoid concentric circles

A Twitter follower sent me this chart by way of Munich:

Msc_staggereddonut

The logo of the Munich Security Conference (MSC) is quite cute. It looks like an ear. Perhaps that inspired this, em, staggered donut chart.

I like to straighten curves out so the donut chart becomes a bar chart:

Redo_junkcharts_msc_germanallies_distortion

The blue and gray bars mimic the lengths of the arcs in the donut chart. The yellow bars show the relative size of the underlying data. You can see that three of the four arcs under-represent the size of the data.

Why is that so? It's due to the staggering. Inner circles have smaller circumferences than outer circles. The designer keeps the angles the same so the arc lengths have been artificially reduced.
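The effect of staggering on arc lengths can be sketched in a few lines of Python. This is only an illustration: the function name `arc_length`, the 50% share, and the radii below are assumptions made up for the example, not values measured from the MSC chart.

```python
import math

def arc_length(share, radius):
    """Arc length of a donut segment that encodes `share` (0 to 1) as an angle."""
    angle = share * 2 * math.pi  # the data are encoded in the central angle
    return radius * angle        # arc length = radius * angle (in radians)

# Hypothetical example: four rings that all encode the same 50% share,
# drawn at shrinking radii (made-up radii, not taken from the chart).
radii = [4.0, 3.0, 2.0, 1.0]
lengths = [arc_length(0.5, r) for r in radii]

# Arc lengths relative to the outermost ring. Identical data, yet the
# inner arcs come out shorter in proportion to their radii.
relative = [length / lengths[0] for length in lengths]
print(relative)
```

With equal angles, each arc shrinks in proportion to its radius, so a reader who compares arc lengths across rings will under-read the inner ones.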

Junkcharts_redo_munichgermanallies_donuts

***

The donut chart is just a pie chart with a hole punched in the middle. For both pie charts and donut charts, the data are encoded in the angles at the center of the circle. Under normal circumstances, pie charts can also be read by comparing sector areas, and donut charts using arc lengths, as those are proportional to the angles.

The area and arc interpretation fails when the designer alters the radii of the sections. Look at the following pair of pie charts, produced by filling the hole in the above donuts:

Junkcharts_redo_munichgermanallies_pies

The staggered pie chart distorts the data if the reader compares areas but not so if the reader compares angles at the center. The pie chart can be read both ways so long as the designer does not alter the radii.
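A small Python sketch makes the same point for areas. The shares (30% and 20%) and the radii are hypothetical numbers chosen for illustration, and `sector_area` is a helper name invented here.

```python
import math

def sector_area(share, radius):
    """Area of a pie sector whose central angle encodes `share` (0 to 1)."""
    angle = share * 2 * math.pi
    return 0.5 * radius ** 2 * angle  # sector area = (1/2) * r^2 * angle

# Hypothetical shares of 30% and 20%: the data ratio is 1.5.
data_ratio = 0.30 / 0.20

# Same radius for both sectors: the area ratio matches the data ratio.
same_radius_ratio = sector_area(0.30, 1.0) / sector_area(0.20, 1.0)

# Staggered radii (the smaller sector is drawn at radius 0.8, a made-up value):
# the area ratio is inflated by a factor of 1 / 0.8**2.
staggered_ratio = sector_area(0.30, 1.0) / sector_area(0.20, 0.8)

print(data_ratio, same_radius_ratio, staggered_ratio)
```

Because area grows with the square of the radius, even a modest change in radius distorts the area comparison, while the angles at the center stay faithful to the data.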

 


Bloomberg made me digest these graphics slowly

Ask the experts to name the success metric of good data visualization, and you will receive a dozen answers. The field doesn't have an all-encompassing metric. A useful reference is Andrew Gelman and Antony Unwin (2012) in which they discussed the tradeoff between beautiful and informative, which derives from the familiar tension between art and science.

For a while now, I've been intrigued by metrics that measure "effort". Some years ago, I described the concept of a "return on effort" in this post. Such a metric can be constructed like the dominating financial metric of return on investment. The investment here is an investment of time, of attention. I strongly believe that if the consumer judges a data visualization to be compelling, engaging or well constructed, s/he will expend energy to devour it.

Imagine grub you discard after the first bite, compared to the delicious food experienced slowly, savoring every last bit.

Bloomberg_ambridge_sm

I'm writing this post while enjoying the September issue of Bloomberg Businessweek, which focuses on the upcoming U.S. Presidential election. There are various graphics infused into the pages of the magazine. Many of these graphics operate at a level of complexity above what typically shows up in magazines, and yet I spent energy learning to understand them. This response, I believe, is what visual designers should aim for.

***

Today, I discuss one example of these graphics, shown on the right. You might be shocked by the throwback style of these graphics. They look like they arrived from decades ago!

Grayscale, simple forms, typewriter font, all caps. Have I gone crazy?

The article argues that a town like Ambridge in Beaver County, Pennsylvania may be pivotal in the November election. The set of graphics provides relevant data to understand this argument.

It's evidence that data visualization does not need whiz-bang modern wizardry to excel.

Let me focus on the boxy charts from the top of the column. These:

Bloomberg_ambridge_topboxes

These charts solve a headache with voting margin data in the U.S. We have two dominant political parties so in any given election, the vote share data split into three buckets: Democratic, Republican, and a catch-all category that includes third parties, write-ins, and none of the above. The third category rarely exceeds 5 percent. A generic pie chart representation looks like this:

Redo_junkcharts_bloombergambridgebox_pies

Stacked bars have this look:

Redo_junkcharts_bloombergambridgebox_bars

In using my Trifecta framework (link), the top point is articulating the question. The primary issue here is the voting margin between the winner and the runner-up, which is the loser in what is typically a two-horse race. There exist two sub-questions: the vote-share difference between the top two finishers, and the share of vote effectively removed from the pot by the remaining candidates.

Now, take another look at the unusual chart form used by Bloomberg:

Bloomberg_ambridge_topboxes1

The catch-all vote share sits at the bottom while the two major parties split up the top section. This design demonstrates a keen understanding of the context. Consider the typical outcome, in which the top two finishers are from the two major parties. When answering the first sub-question, we can choose the raw vote shares, or the normalized vote shares. Normalizing shifts the base from all candidates to the top two candidates.

The Bloomberg chart addresses both scales. The normalized vote shares can be read directly by focusing only on the top section. In an even two-horse race, the top section is split in half - this holds true regardless of the size of the bottom section.
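The difference between the raw and normalized scales can be sketched in Python. The function name `vote_margins` and all vote shares below are hypothetical, chosen only to illustrate the arithmetic; they are not the Ambridge figures.

```python
def vote_margins(dem, rep, other):
    """Return (raw_margin, normalized_margin) for one election.

    raw_margin uses all votes as the base; normalized_margin uses only
    the two-party vote, i.e. the top section of the boxy chart.
    """
    two_party = dem + rep
    total = two_party + other
    if two_party <= 0:
        raise ValueError("two-party vote must be positive")
    raw_margin = (dem - rep) / total
    normalized_margin = (dem - rep) / two_party
    return raw_margin, normalized_margin

# Hypothetical shares in percent: 48 / 47 / 5.
print(vote_margins(48.0, 47.0, 5.0))

# An even two-horse race stays even whatever the catch-all share is.
print(vote_margins(49.0, 49.0, 2.0))
print(vote_margins(45.0, 45.0, 10.0))
```

The normalized margin never depends on the size of the catch-all bucket in an even race, which is the property the top section of the chart makes visible.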

This is a simple chart that packs a punch.


Making better pie charts if you must

I saw this chart on an NYU marketing twitter account:

LATAMstartupCEO_covidimpact

The graphical design is not easy on our eyes. It's just hard to read for various reasons.

The headline sounds like a subject line from an email.

The subheaders are long, and differ only by a single word.

Even if one prefers pie charts, they can be improved by following a few guidelines.

First, start the first sector at the 12 o'clock direction. Like this:

Redo_junkcharts_latamceo_orientation

The survey uses a 5-point scale from "Very Good" to "Very Bad". Instead of using five different colors, it's better to use two extreme colors and shading. Like this:

Redo_junkcharts_latamceo_color

I also try hard to keep all text horizontal.

Redo_junkcharts_latamceo_labels

For those who prefer not to use pie charts, a side-by-side bar chart works well.

Redo_junkcharts_latamceo_bars

In my article for DataJournalism.com, I outlined "unspoken rules" for making various charts, including pie charts.


When the pie chart is more complex than the data

The trading house, Charles Schwab, included the following graphic in a recent article:

Charleschwab_portfolio_1000

This graphic is more complicated than the story that it illustrates. The author describes a simple scenario in which an investor divides his investments into stocks, bonds and cash. After a stock crash, the value of the portfolio declines.

The graphic is a 3-D pie chart, in which the data are encoded twice, first in the areas of the sectors and then in the heights of the part-cylinders.

As readers, we perceive the relative volumes of the part-cylinders. Volume is the cross-sectional area (i.e. of the base) multiplied by the height. Since both the base area and the height encode the data, the volumes are proportional to the squares of the data.
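Here is a minimal Python sketch of that squaring effect. The helper name `perceived_volume`, its scale parameters, and the allocation amounts are all assumptions for illustration; they are not the numbers in the Schwab graphic.

```python
def perceived_volume(value, area_per_unit=1.0, height_per_unit=1.0):
    """Volume of a part-cylinder when both base area and height encode `value`."""
    base_area = area_per_unit * value  # first encoding: sector area
    height = height_per_unit * value   # second encoding: cylinder height
    return base_area * height          # volume grows with value squared

# Hypothetical allocations: stocks are twice the size of bonds.
stocks, bonds = 60.0, 30.0

data_ratio = stocks / bonds
volume_ratio = perceived_volume(stocks) / perceived_volume(bonds)

print(data_ratio, volume_ratio)
```

A 2-to-1 difference in the data turns into a 4-to-1 difference in volume, so a reader judging by volume sees an exaggerated gap.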

Here is a different view of the same data:

Redo_junkcharts_schwab_portfolio

This "bumps chart" (also called a slopegraph) shows clearly the only thing that drives the change is the drop in stock prices. Because the author assumes no change in bonds or cash, the drop in the entire portfolio is completely accounted for by the decline in stocks. Of course, this scenario seems patently unrealistic - different investment asset classes tend to be correlated.

***

A cardinal rule of data visualization is that the visual should be less complex than the data.


Make your color legend better with one simple rule

The pie chart about COVID-19 worries illustrates why we should follow a basic rule of constructing color legends: order the categories in the way you expect readers to encounter them.

Here is the chart that I discussed the other day, with the data removed since they are not of concern in this post. (link)

Junkcharts_abccovidbiggestworries_sufficiency

First look at the pie chart. Like me, you probably looked at the orange or the yellow slice first, then moved clockwise around the pie.

Notice that the legend leads with the red square ("Getting It"), which is likely the last item you'll see on the chart.

This is the same chart with the legend re-ordered:

Redo_junkcharts_abcbiggestcovidworries_legend

***

Simple charts can be made better if we follow basic rules of construction. When used frequently, these rules can be made silent. I cover rules for legends as well as many other rules in this Long Read article titled "The Unspoken Conventions of Data Visualization" (link).


When the visual runs away from the data

The pressure of the coronavirus news cycle has gotten the better of some graphics designers. Via Twitter, Mark B sent me the following chart:

Junkcharts_abccovidbiggestworries_sufficiency

I applied the self-sufficiency test to this pie chart. That's why you can't see the data which were also printed on the chart.

The idea of self-sufficiency is to test how much work the visual elements of the graphic are doing to convey its message. Look at the above chart, and guess what the three values are.

Roughly speaking, all three answers are equally popular, with perhaps a little less than a third of respondents indicating "Getting It" as their biggest COVID-19 worry.

If measured, the slices represent 38%, 35% and 27%.

Now, here is the same chart with the data:

Abc_covidbiggestworries

Each number is way off! In addition, the three numbers sum to 178%.

Trifectacheckup_junkcharts_image

This is an example of the Visual being at odds with the Data, using a Trifecta Checkup analysis. (Read about the Trifecta here.)

What the Visual is saying is not the same as what the data are saying. So the green arrow between D and V is broken.

***

This is a rather common mistake. This survey question apparently allows each respondent to select more than one answer. Whenever more than one response is accepted, one cannot use a pie chart.
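A quick sanity check before reaching for a pie chart is to confirm that the percentages add up to a whole. The Python sketch below is one way to do that; the function name `fits_a_pie`, the tolerance, and the second set of numbers are hypothetical, while the first set uses the measured slice sizes quoted above.

```python
def fits_a_pie(percentages, tolerance=1.0):
    """True if the percentages sum to 100 within `tolerance` (for rounding)."""
    return abs(sum(percentages) - 100.0) <= tolerance

# The slice sizes as drawn on the chart sum to a whole.
print(fits_a_pie([38, 35, 27]))

# Hypothetical multi-select survey results: respondents may pick several
# answers, so the percentages overshoot 100 and a pie chart is ruled out.
print(fits_a_pie([60, 55, 40]))
```

When the check fails because the total runs well above 100, that is a strong hint that the question allowed multiple selections, and a bar chart is the safer form.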

Here is a stacked bar chart that does right by the data.

Redo_junkcharts_abcbiggestcovidworries