What is a bad chart?

In the recent issue of Madolyn Smith’s Conversations with Data newsletter hosted by DataJournalism.com, she discusses “bad charts,” featuring submissions from several dataviz bloggers, including myself.

What is a “bad chart”? Judging from this curated collection, it is not easy to nail down “bad-ness.” The common theme is a mismatch between the message intended by the designer and the message received by the reader, a classic error of communication. How such a mismatch arises depends on the specific example. I divide the “bad charts” into two groups: charts that are misinterpreted, and charts that are misleading.

 

Charts that are misinterpreted

The Causes of Death entry, submitted by Alberto Cairo, is a “well-designed” chart that requires “reading the story where it is inserted and the numerous caveats.” So readers may misinterpret the chart if they do not also read the story at Our World in Data, which runs over 1,500 words, not including the appendix.

Ourworldindata_causesofdeath

The map of Canada, submitted by Highsoft, highlights in green the provinces where the majority of residents are members of the First Nations. The “bad” is that readers may incorrectly “infer that a sizable part of the Canadian population is First Nations.”

Highsoft_CanadaFirstNations

In these two examples, the graphic is considered adequate and yet the reader fails to glean the message intended by the designer.

 

Charts that are misleading

Two fellow bloggers, Cole Knaflic and Jon Schwabish, offer the advice to start bars at zero (here's my take on this rule). The “bad” is the distortion introduced when encoding the data into the visual elements.

The Color-blindness pictogram, submitted by Severino Ribecca, commits a similar faux pas. To compare the rates among men and women, the pictograms should use the same baseline.

Colourblindness_pictogram

In these examples, readers who correctly read the charts nonetheless leave with the wrong message. (We assume the designer does not intend to distort the data.) The readers misinterpret the data without misinterpreting the graphics.

 

Using the Trifecta Checkup

In the Trifecta Checkup framework, these problems are second-level problems, represented by the green arrows linking up the three corners. (Click here to learn more about using the Trifecta Checkup.)

Trifectacheckup_img

The visual design of the Causes of Death chart is not under question, and the intended message of the author is clearly articulated in the text. Our concern is that the reader must go outside the graphic to learn the full message. This suggests a problem related to the syncing between the visual design and the message (the QV edge).

By contrast, in the Color Blindness graphic, the data are not under question, nor is the use of pictograms. Our concern is how the data got turned into figurines. This suggests a problem related to the syncing between the data and the visual (the DV edge).

***

When you complain about a misleading chart, or a chart being misinterpreted, what do you really mean? Is it a visual design problem? A data problem? Or is it a syncing problem between two components?


Inspiration from a waterfall of pie charts: illustrating hierarchies

Reader Antonio R. forwarded a tweet about the following "waterfall of pie charts" to me:

Water-stats-pie21

Maarten Lamberts loved these charts (source: here).

I am immediately attracted to the visual thinking behind this chart. The data are presented in a hierarchy with three levels. The levels are nested in the sense that the pieces in each pie chart add up to 100%. From the first level to the second, the category of freshwater is sub-divided into three parts. From the second level to the third, the "others" subgroup under freshwater is sub-divided into five further categories.

The designer faces a twofold challenge: presenting the proportions at each level, and integrating the three levels into one graphic. The second challenge is harder to master.

The solution here is quite ingenious. A waterfall/waterdrop metaphor is used to link each layer to the one below. It visually conveys the hierarchical structure.

***

There remains a little problem: a confusion between the part and the whole. The link between levels should be that one part of the upper level becomes the whole of the lower level. Because of the color scheme, it appears that the part above does not account for the entirety of the pie below. For example, water in lakes is plotted on both the second and third layers, while water in soil suddenly enters the diagram at the third level even though it should be part of the "drop" from the second layer.

***

I started playing around with various related forms. I like the concept of linking the layers and want to retain it. Here is one graphic inspired by the waterfall pies from above:

Redo_waterfall_pies

 


A second take on the rural-urban election chart

Yesterday, I looked at the following pictograms used by Business Insider in an article about the rural-urban divide in American politics:

Businessinsider_ruraldistricts

The layout of this diagram suggests that the comparison of 2010 to 2018 is a key purpose.

The following alternative directly plots the change between 2010 and 2018, reducing the number of plots from 4 to 2.

Redo_jc_businessinsider_ruraldistricts_2

The 2018 results are emphasized. Then, for each party, there can be a net add or loss of seats.

The key trends are:

  • a net loss in seats in "Pure rural" districts, split by party;
  • a net gain of 3 seats in "rural-suburban" districts;
  • a loss of 10 Democratic seats balanced by a gain of 13 Republican seats.
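The arithmetic behind the last two bullets can be checked in a few lines. This is only a sketch: it assumes both bullets describe the same "rural-suburban" districts, and the dictionary `changes` is a hypothetical structure holding the numbers quoted above.

```python
# Seat changes from 2010 to 2018, as quoted in the bullets above.
# Assumption: both figures refer to the "rural-suburban" districts.
changes = {"Democratic": -10, "Republican": +13}

# The net change is the sum of the per-party changes.
net_change = sum(changes.values())
print(f"Net change in seats: {net_change:+d}")
```

The loss on one side and the gain on the other combine into the net gain of 3 seats.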

 


Another experiment with enhanced pictogram

In a previous post, I experimented with an idea for enhancing pictograms. Pictograms are extremely popular charts used to show countable objects. I found another example in Business Insider's analysis of the mid-term election results. Here is an excerpt of a pair of pictograms that show the relative performance of Republicans and Democrats in districts that are classified as "Pure Rural" or "Rural-Suburban":

Businessinsider_ruraldistricts

(Note that there is an error in the bottom left chart. There should be 24 blue squares not 34! In the remainder of the post, I will retain this error so that the revisions are comparable to the original.)

There are quite a few dimensions going on in this deceptively simple chart. There is the red domination of these rural districts, to the tune of a 75 to 80% share. There is the further weakening of Democrats from 2010 to 2018. There is a shift of seats out of pure rural areas (-13) and into rural-suburban (+14) from 2010 to 2018.

Anyone who learns of the above trends probably did so by reading off the data tables on the sides. It's a given that those tables, or simple bar charts, can be more effective with this dataset.

What I'd like to explore is the pictogram, assuming that we are required to use one. Can the pictogram be enhanced to overcome some of its weaknesses?

The defining characteristic of the pictogram is the presence of individual units, which means the reader can count the units. This feature is also its downfall. In most pictograms, it is a bear to count the units. Try counting out the blue and red squares in the above image - and don't cheat by staring at the data tables!

My goal is to enhance the pictogram by making it easier for readers to count the units. The strategy is to place cues so that the units can be counted in larger groups like 5 or 10. Also, when possible, exploit symmetry.
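To make the grouping idea concrete, here is a minimal text-only sketch. The helper `render_pictogram`, its parameters, and the count of 24 are illustrative assumptions, not the method used to draw the charts in this post. Units are printed in rows of 10, with a gap after every 5 so that the eye can count in fives and tens.

```python
def render_pictogram(count, symbol="■", per_row=10, group=5):
    """Return a text pictogram with a gap after every `group` units.

    Hypothetical helper for illustration only.
    """
    rows = []
    for start in range(0, count, per_row):
        n = min(per_row, count - start)  # number of units in this row
        # Split the row into chunks of at most `group` units.
        chunks = [symbol * min(group, n - i) for i in range(0, n, group)]
        rows.append(" ".join(chunks))
    return "\n".join(rows)


# 24 units: two full rows of 10 (each split 5 + 5) and a final row of 4.
print(render_pictogram(24))
```

With the cues in place, the reader counts two rows of ten plus four rather than 24 individual squares.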

Here is an example:

Redo_businessinsider_rural_districts

The squares are arranged to facilitate comparing the 2010 and 2018 numbers. So for rural-suburban, there were 10 fewer blue squares and 10 + 3 = 13 more red squares.

This discussion will be continued in the next post.