Using comparison to enrich a visual story

Just found this beauty deep in my submission pile (from Howie H.):

Iwillvote_texas

What's great about this pie chart is the story it's trying to tell. Almost half of the electorate did not vote in Texas in the 2016 Presidential election. The designer successfully draws my attention to the white sector that makes the point.

There are a few problems.

Showing two decimals is too much precision.

The purple sector is not labeled.

The white area seems exaggerated. The four sectors do not appear to meet at the center of the circle. The distortion is not too much but it's schizophrenic: the pie slices are drawn with low precision while the data labels have high precision.

***

The following fixes those problems, and also adds a second chart to contrast the two ways of thinking:

Redo_junkcharts_iwillvotecomtexas


Avoid concentric circles

A twitter follower sent me this chart by way of Munich:

Msc_staggereddonut

The logo of the Munich Security Conference (MSC) is quite cute. It looks like an ear. Perhaps that inspired this, em, staggered donut chart.

I like to straighten curves out so the donut chart becomes a bar chart:

Redo_junkcharts_msc_germanallies_distortion

The blue and gray bars mimic the lengths of the arcs in the donut chart. The yellow bars show the relative size of the underlying data. You can see that three of the four arcs under-represent the size of the data.

Why is that so? It's due to the staggering. Inner circles have smaller circumferences than outer circles. The designer keeps the angles the same so the arc lengths have been artificially reduced.

Junkcharts_redo_munichgermanallies_donuts

***

The donut chart is just a pie chart with a hole punched in the middle. For both pie charts and donut charts, the data are encoded in the angles at the center of the circle. Under normal circumstances, pie charts can also be read by comparing sector areas, and donut charts using arc lengths, as those are proportional to the angles.

The area and arc interpretation fails when the designer alters the radii of the sections. Look at the following pair of pie charts, produced by filling the hole in the above donuts:

Junkcharts_redo_munichgermanallies_pies

The staggered pie chart distorts the data if the reader compares areas but not so if the reader compares angles at the center. The pie chart can be read both ways so long as the designer does not alter the radii.

 


Bloomberg made me digest these graphics slowly

Ask the experts to name the success metric of good data visualization, and you will receive a dozen answers. The field doesn't have an all-encompassing metric. A useful reference is Andrew Gelman and Antony Urwin (2012) in which they discussed the tradeoff between beautiful and informative, which derives from the familiar tension between art and science.

For a while now, I've been intrigued by metrics that measure "effort". Some years ago, I described the concept of a "return on effort" in this post. Such a metric can be constructed like the dominating financial metric of return on investment. The investment here is an investment of time, of attention. I strongly believe that if the consumer judges a data visualization to be compelling, engaging or  ell constructed, s/he will expend energy to devour it.

Imagine grub you discard after the first bite, compared to the delicious food experienced slowly, savoring every last bit.

Bloomberg_ambridge_smI'm writing this post while enjoying the September issue of Bloomberg Businessweek, which focuses on the upcoming U.S. Presidential election. There are various graphics infused into the pages of the magazine. Many of these graphics operate at a level of complexity above what typically show up in magazines, and yet I spent energy learning to understand them. This response, I believe, is what visual designers should aim for.

***

Today, I discuss one example of these graphics, shown on the right. You might be shocked by the throwback style of these graphics. They look like they arrived from decades ago!

Grayscale, simple forms, typewriter font, all caps. Have I gone crazy?

The article argues that a town like Ambridge in Beaver County, Pennslyvania may be pivotal in the November election. The set of graphics provides relevant data to understand this argument.

It's evidence that data visualization does not need whiz-bang modern wizardry to excel.

Let me focus on the boxy charts from the top of the column. These:

Bloomberg_ambridge_topboxes

These charts solve a headache with voting margin data in the U.S.  We have two dominant political parties so in any given election, the vote share data split into three buckets: Democratic, Republican, and a catch-all category that includes third parties, write-ins, and none of the above. The third category rarely exceeds 5 percent.  A generic pie chart representation looks like this:

Redo_junkcharts_bloombergambridgebox_pies

Stacked bars have this look:

Redo_junkcharts_bloombergambridgebox_bars

In using my Trifecta framework (link), the top point is articulating the question. The primary issue here is the voting margin between the winner and the second-runner-up, which is the loser in what is typically a two-horse race. There exist two sub-questions: the vote-share difference between the top two finishers, and the share of vote effectively removed from the pot by the remaining candidates.

Now, take another look at the unusual chart form used by Bloomberg:

Bloomberg_ambridge_topboxes1

The catch-all vote share sits at the bottom while the two major parties split up the top section. This design demonstrates a keen understanding of the context. Consider the typical outcome, in which the top two finishers are from the two major parties. When answering the first sub-question, we can choose the raw vote shares, or the normalized vote shares. Normalizing shifts the base from all candidates to the top two candidates.

The Bloomberg chart addresses both scales. The normalized vote shares can be read directly by focusing only on the top section. In an even two-horse race, the top section is split by half - this holds true regardless of the size of the bottom section.

This is a simple chart that packs a punch.

 


Making better pie charts if you must

I saw this chart on an NYU marketing twitter account:

LATAMstartupCEO_covidimpact

The graphical design is not easy on our eyes. It's just hard to read for various reasons.

The headline sounds like a subject line from an email.

The subheaders are long, and differ only by a single word.

Even if one prefers pie charts, they can be improved by following a few guidelines.

First, start the first sector at the 12-oclock direction. Like this:

Redo_junkcharts_latamceo_orientation

The survey uses a 5-point scale from "Very Good" to "Very Bad". Instead of using five different colors, it's better to use two extreme colors and shading. Like this:

Redo_junkcharts_latamceo_color

I also try hard to keep all text horizontal.

Redo_junkcharts_latamceo_labels

For those who prefers not to use pie charts, a side-by-side bar chart works well.

Redo_junkcharts_latamceo_bars

In my article for DataJournalism.com, I outlined "unspoken rules" for making various charts, including pie charts.

 

 

 


When the pie chart is more complex than the data

The trading house, Charles Schwab, included the following graphic in a recent article:

Charleschwab_portfolio_1000

This graphic is more complicated than the story that it illustrates. The author describes a simple scenario in which an investor divides his investments into stocks, bonds and cash. After a stock crash, the value of the portfolio declines.

The graphic is a 3-D pie chart, in which the data are encoded twice, first in the areas of the sectors and then in the heights of the part-cylinders.

As readers, we perceive the relative volumes of the part-cylinders. Volume is the cross-sectional area (i.e. of the base) multipled by the height. Since each component holds the data, the volumes are proportional to the squares of the data.

Here is a different view of the same data:

Redo_junkcharts_schwab_portfolio

This "bumps chart" (also called a slopegraph) shows clearly the only thing that drives the change is the drop in stock prices. Because the author assumes no change in bonds or cash, the drop in the entire portfolio is completely accounted for by the decline in stocks. Of course, this scenario seems patently unrealistic - different investment asset classes tend to be correlated.

***

A cardinal rule of data visualization is that the visual should be less complex than the data.


Make your color legend better with one simple rule

The pie chart about COVID-19 worries illustrates why we should follow a basic rule of constructing color legends: order the categories in the way you expect readers to encounter them.

Here is the chart that I discussed the other day, with the data removed since they are not of concern in this post. (link)

Junkcharts_abccovidbiggestworries_sufficiency

First look at the pie chart. Like me, you probably looked at the orange or the yellow slice first, then we move clockwise around the pie.

Notice that the legend leads with the red square ("Getting It"), which is likely the last item you'll see on the chart.

This is the same chart with the legend re-ordered:

Redo_junkcharts_abcbiggestcovidworries_legend

***

Simple charts can be made better if we follow basic rules of construction. When used frequently, these rules can be made silent. I cover rules for legends as well as many other rules in this Long Read article titled "The Unspoken Conventions of Data Visualization" (link).


When the visual runs away from the data

The pressure of the coronavirus news cycle has gotten the better of some graphics designers. Via Twitter, Mark B sent me the following chart:

Junkcharts_abccovidbiggestworries_sufficiency

I applied the self-sufficiency test to this pie chart. That's why you can't see the data which were also printed on the chart.

The idea of self-sufficiency is to test how much work the visual elements of the graphic are doing to convey its message. Look at the above chart, and guess the three values are.

Roughly speaking, all three answers are equally popular, with perhaps a little less than a third of respondents indicating "Getting It" as their biggest COVID-19 worry.

If measured, the slices represent 38%, 35% and 27%.

Now, here is the same chart with the data:

Abc_covidbiggestworries

Each number is way off! In addition, the three numbers sum to 178%.

Trifectacheckup_junkcharts_imageThis is an example of the Visual being at odds with the Data, using a Trifecta Checkup analysis. (Read about the Trifecta here.)

What the Visual is saying is not the same as what the data are saying. So the green arrow between D and V is broken.

***

This is a rather common mistake. This survey question apparently allows each respondent to select more than one answers. Whenever more than one responses are accepted, one cannot use a pie chart.

Here is a stacked bar chart that does right by the data.

Redo_junkcharts_abcbiggestcovidworries

 


Pie chart conventions

I came across this pie chart from a presentation at an industry meeting some weeks ago:

Mediaconversations_orig

This example breaks a number of the unspoken conventions on making pie charts and so it is harder to read than usual.

Notice that the biggest slice starts around 8 o'clock, and the slices are ordered alphabetically by the label, rather than numerically by size of the slice.

The following is the same chart ordered in a more conventional way. The largest slice is placed along the top vertical, and the other slices are arranged in a clock-wise manner from larger to smaller.

Redo_junkcharts_mediaconversations1

This version is easier to read because the reader does not need to think about the order of the slices. The expectation of decreasing size is met.

The above pie chart, though, reveals breaking of another convention. The colors on this chart signify nothing! The general rule is color differences should encode data differences. Here, the colors should go from deepest to lightest. (One can even argue that different tinges is redundant.)

Redo_junkcharts_mediaconversations2

You see how this version is even better. In the previous version, the colors are distracting. You're wondering what they mean, and then you realize they signify nothing.

***

As designers of graphics, we follow a bunch of conventions silently. When a design deviates from it, it's harder to understand.

Recently, I wrote a long article for DataJournalism.com, setting out many of these unspoken conventions. Read it here.

 


The unspoken rules of visualization

My latest is at DataJournalism.com.

Ejc_unspokenrulesbanner

It's an essay on the following observation:

The efficiency and multidimensionality of the visual medium arise from a set of conventions and rules, which regularises the communications between producers of data visualisation and its consumers. These conventions and rules are often unspoken: it's the visual equivalent of saying ’it goes without saying’ .

There are lots of little things visualization designers do in their sleep that don't get mentioned. When a visual design deviates from these rules, the readers may get confused.

Here is one example I discussed in the article (hat tip to Xan Gregg).

Fig04_piechart_diverging

This pie chart is not easy to read beyond the obvious point that English is the most popular. The following pie chart is much easier on the readers:

Fig03_piechart_conforming

Why?

The designer follows some common conventions, such as placing the first slice at the top vertical, sorting the slices from largest to smallest (excepting the "other"), and introducing multiple colors only to encode data differences.

These rules are silently applied, and are not announced to the reader. There is a network effect: the more practitioners use these rules, the stronger they stick.

My essay attempts to outline some of the most important unspoken rules of visualizaiton. For more, see here.


Tightening the bond between the message and the visual: hello stats-cats

The editors of ASA's Amstat News certainly got my attention, in a recent article on school counselling. A research team asked two questions. The first was HOW ARE YOU FELINE?

Stats and cats. The pun got my attention and presumably also made others stop and wonder. The second question was HOW DO YOU REMEMBER FEELING while you were taking a college statistics course? Well, it's hard to imagine the average response to that question would be positive.

What also drew me to the article was this pair of charts:

Counselors_Figure1small

Surely, ASA can do better. (I'm happy to volunteer my time!)

Rotate the chart, clean up the colors, remove the decimals, put the chart titles up top, etc.

***

The above remedies fall into the V corner of my Trifecta checkup.

Trifectacheckup_junkcharts_imageThe key to fixing this chart is to tighten the bond between the message and the visual. This means working that green link between the Q and V corners.

This much became clear after reading the article. The following paragraphs are central to the research (bolding is mine):

Responses indicated the majority of school counselors recalled experiences of studying statistics in college that they described with words associated with more unpleasant affect (i.e., alarm, anger, distress, fear, misery, gloom, depression, sadness, and tiredness; n = 93; 66%). By contrast, a majority of counselors reported same-day (i.e., current) emotions that appeared to be associated with more pleasant affect (i.e., pleasure, happiness, excitement, astonishment, sleepiness, satisfaction, and calm; n = 123; 88%).

Both recalled emotive experiences and current emotional states appeared approximately balanced on dimensions of arousal: recalled experiences associated with lower arousal (i.e., pleasure, misery, gloom, depression, sadness, tiredness, sleepiness, satisfaction, and calm, n = 65, 46%); recalled experiences associated with higher arousal (i.e., happiness, excitement, astonishment, alarm, anger, distress, fear, n = 70, 50%); current emotions associated with lower arousal (n = 60, 43%); current experiences associated with higher arousal (i.e., n = 79, 56%).

These paragraphs convey two crucial pieces of information: the structure of the analysis, and its insights.

The two survey questions measure two states of experiences, described as current versus recalled. Then the individual affects (of which there were 16 plus an option of "other") are scored on two dimensions, pleasure and arousal. Each affect maps to high or low pleasure, and separately to high or low arousal.

The research insight is that current experience was noticably higher than recalled experience on the pleasure dimension but both experiences were similar on the arousal dimension.

Any visualization of this research must bring out this insight.

***

Here is an attempt to illustrate those paragraphs:

Redo_junkcharts_amstat_feline

The primary conclusion can be read from the four simple pie charts in the middle of the page. The color scheme shines light on which affects are coded as high or low for each dimension. For example, "distressed" is scored as showing low pleasure and high arousal.

A successful data visualization for this situation has to bring out the conclusion drawn at the aggregated level, while explaining the connection between individual affects and their aggregates.