Beauty is in the eyes of the fishes

Reader Patrick S. sent in this old gem from Germany.

Swimmingpoolsvisitors_ger

He said:

It displays the change in numbers of visitors to public pools in the German city of Hanover. The invisible y-axis seems to be, um, nonlinear, but at least it's monotonic, in contrast to the invisible x-axis.

There's a nice touch, though: The eyes of the fish are pie charts. Black: outdoor pools, white: indoor pools (as explained in the bottom left corner).

It's taken from a 1960 publication of the city of Hanover called *Hannover: Die Stadt in der wir leben*.

This is the kind of chart that Ed Tufte made (in)famous. The visual elements do not serve the data at all, except for the eyeballs. The design becomes a mere vessel for the data table. The reader who wants to know the growth rate of swimmers has to do a tank of work.

The eyeballs though.

I like the fact that these pie charts do not come with data labels. This part of the chart passes the self-sufficiency test. In fact, the eyeballs contain the most interesting story in this chart. In those four years, the visitors to public pools switched from mostly indoor pools to mostly outdoor pools. These eyeballs show that pie charts can be effective in specific situations.

Now, Hanover fishes are quite lucky to have free admission to the public pools!


Doing my duty on Pi Day #onelesspie

Xan Gregg and I started a #onelesspie campaign a few years ago. On Pi Day each year, we find a pie chart and remake it. On Wikipedia, you can find all manner of pie charts. Try this search, and see for yourself.

Here's one found on the Wiki page about the city of Ogema, in Canada:

Ogema_Stats_canada_pie_chart

This chart has 20 age groups, each given a different color. That's way too much!

I was able to find data on 10-year age groups, not five-year ones. But the "shape" of the distribution is much more easily seen on a column chart (a histogram).

Redo_ogema_age_distribution

Only a single color is needed.

The reason I gravitated to this chart was the highly unusual age distribution: this town has an almost uniform distribution of age groups, with each of the 10-year ranges accounting for about 11% of the population. Given that there are 9 groups, a perfectly even distribution would be about 11% for each column. (Well, the last group of 80+ is cheating a bit as it spans more than 10 years.)
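For readers who want to check the 11% baseline, here is a minimal sketch of the arithmetic. The function name is my own invention for illustration; the only input taken from the post is the count of nine age groups.

```python
def uniform_share(n_groups: int) -> float:
    """Percentage each group would hold if the population were spread evenly."""
    if n_groups <= 0:
        raise ValueError("n_groups must be positive")
    return 100.0 / n_groups

# Nine 10-year age groups, as counted in the post
baseline = uniform_share(9)
print(f"{baseline:.1f}%")  # 11.1%
```

Any column that sits noticeably above or below this baseline is a departure from the even spread.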

I don't know about Ogema. Maybe a reader can explain this unusual age distribution!


Steel tariffs, and my new dataviz seminar

I am developing a new seminar aimed at business professionals who want to improve their ability to communicate using charts. I want any guidance to be tool-agnostic, so that attendees can implement it using Excel if that’s their main charting software. Over the 12+ years that I’ve been blogging, certain ideas have kept popping up, and I have collected these motifs and organized them for the seminar. This post is about a recent chart that brings up a few of these motifs.

This chart has been making the rounds in articles about the steel tariffs.

2018.03.08steel_1

The chart shows the Top 10 nations that sell steel to the U.S., which together account for 78% of all imports. 

The chart shows a few signs of design. These things caught my eye:

  1. the pie chart on the left delivers the top-line message that 10 countries account for almost 80% of all U.S. steel imports
  2. the callout gives further information about which 10 countries and how much each nation sells to the U.S. This is a nice use of layering
  3. on the right side, progressive tints of blue indicate the respective volumes of imports

On the negative side of the ledger, the chart is marred by three small problems. Each of these problems concerns inconsistency, which creates confusion for readers.

  1. Inconsistent use of color: on the left side, the darker blue indicates lower volume while on the right side, the darker blue indicates higher volume
  2. Inconsistent coding of pie slices: on the right side, the percentages add up to 78% while the total area of the pie is 100%
  3. Inconsistent scales: the left chart carrying the top-line message is notably smaller than the right chart depicting the secondary message. Readers’ first impression is drawn to the right chart.

Easy fixes lead to the following chart:

Redo_steelimports_1

***

The central idea of the new dataviz seminar is that there are many easy fixes that are often missed by the vast majority of people making Excel charts. I will present a stack of these motifs. If you're in the St. Louis area, you get to experience the seminar first. Register for a spot here.

Send this message to your friends and coworkers in the area. Also, contact me if you'd like to bring this seminar to your area.

***

I also tried the following design, which brings out some other interesting tidbits: for example, Canada and Brazil together sell the U.S. about 30% of its imported steel, and the top 4 exporters account for about 50% of all U.S. steel imports. Color is introduced on the chart via a stylized flag coloring.

Redo_steelimports_2


Three pies and a bar: serving visual goodness

If you are not sick of the Washington Post article about friends (not) letting friends join the other party, allow me to write yet another post on, gasp, that pie chart. And sorry to have kept reader Daniel L. waiting. When he submitted this chart to me, he pointed out that he had tremendous difficulty understanding it:

Wpost_friendsparties4

This is not one pie but six pies on a platter. There are two sources of confusion: first, the repeated labels of Republicans and Democrats to refer to different groups of people; and second, the indecision between using two or four categories of "how many".

Let me begin by re-ordering and re-labeling the chart:

Redo_junkcharts_friendsparties4

From this version, one can pull out the key messages of the analysis: (A) most voters, regardless of party, have mostly friends from the same party; and (B) Republicans are more likely than Democrats to have more friends from the other party. A third, but really not that interesting, point is that regardless of party, people have about the same likelihood of befriending Independents.

In visualization, "less is more" is frequently appropriate. So, here is a view of the same chart, using two categories instead of four.

Redo_junkcharts_friendsparties4b

An added advantage is that only two colors are required, so even grayscale can work.

The new arrangement of the pie platter makes it clear that there really isn't that much difference between Republican and Democratic voters along this dimension. Thus, visualizing the aggregate gets us to the same place.

Redo_junkcharts_friendsparties4c

After three servings of pies, the reader might be craving some energy bars:

Redo_junkcharts_friendsparties4d

One can say that for very simple data like this, pie charts are acceptable. However, the stacked bar is better.

Thanks again Daniel, and it's a pleasure to serve you!


Details, details, details: giving Zillow a pie treatment

Delinquent_homes_chart
This chart (shown right), published by Zillow in a report on housing in 2012, looks quite standard, apparently avoiding the worst of Excel defaults.

In real estate, it’s all about location. In dataviz, it’s all about details.

What are some details that caught my eye on this chart?

Readers have to get over the hurdle that “negative equity” is the same as “underwater homes.” This is not readily understood unless one reads the surrounding text. For example, the first row for the U.S. average proclaims that 31% of U.S. homes are “underwater” and among these underwater homes, 10% of the mortgages are delinquent. The former is concerned with the valuation of the property while the latter deals with payments or lack thereof.

According to the legend, the blue segments stand for the proportions of underwater homes in different metro areas, but that's not quite true: the blue part represents underwater but not delinquent mortgages, while the red and blue parts combined represent all underwater mortgages. This is a common problem in stacked bar charts.

The metro areas are in alphabetical order by city, which means an opportunity is missed to help readers discern patterns. Patterns related to the alphabetical order of city names are not of interest to most (except certain econometrics journal editors). Try arranging by region, or by decreasing level of negative equity, or some other meaningful variable.

The designer tried to do something clever with the horizontal axis labels and I don't think it succeeds. To see what is going on, read the note below the chart. The trick is to let readers look at the number of underwater and delinquent mortgages in two ways, as a proportion of underwater mortgages (through the white data labels) and as a proportion of all mortgages (through the axis labels). That's a mess, sorry to say.

Finally, I like the horizontal axis to extend to 100% because underlying the proportions shown in blue and on the horizontal axis is the population of all mortgages.

***

This may come as a shock to many readers: the task of showing underwater delinquent mortgages simultaneously as a proportion of underwater mortgages and as a proportion of all mortgages is solved using... pie charts.

I just created a couple of examples here:

Redo_zillowunderwater

The deep orange sector can be compared to the entire circle, or to the larger orange sector. Readers usually don't have a problem with pies with only three slices.
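The two readings of the deep orange sector come down to one multiplication. Here is a minimal sketch using the U.S. average figures quoted earlier in this post (31% underwater, 10% of those delinquent). The function and slice names are my own, for illustration only.

```python
def mortgage_slices(underwater_of_all: float, delinquent_of_underwater: float) -> dict:
    """Split all mortgages into the three slices of the pie."""
    # Delinquent underwater mortgages as a share of ALL mortgages
    delinquent_of_all = underwater_of_all * delinquent_of_underwater
    return {
        "not underwater": 1.0 - underwater_of_all,
        "underwater, not delinquent": underwater_of_all - delinquent_of_all,
        "underwater and delinquent": delinquent_of_all,
    }

# U.S. average row: 31% underwater, 10% of those delinquent
us = mortgage_slices(0.31, 0.10)
for label, share in us.items():
    print(f"{label}: {share:.1%}")
```

The last slice is about 3.1% of the whole circle, yet it is still 10% of the combined orange area, which is why one pie can carry both comparisons.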


Wheel of fortune without prizes: the negative report about negativity

My friend, Louis V., handed me a report from Harvard's Shorenstein Center, with the promise that I can make a blog post or two from it. And I wasn't disappointed.

This report (link) caught some attention a few months ago because of the click-bait headline that the media is "biased" against Trump in his first 100 days. They used the most naive definition of "bias". The metric is the amount of coverage that is "negative," with the unspoken standard that the media should be 50% negative.

In a court of law, it is already established that, for example, a loan company cannot be sued for racial bias simply because the rejection rate of loans for black Americans is higher than that of white Americans. Similarly, a university is not necessarily biased against black applicants even if the proportion of black students is found to be below the national proportion of black Americans (in a statistically significant way).

The appropriate amount of negative coverage is a function of the content of the President's actions, which is a hard standard to nail down, much harder than in the two examples above. That just says that such an analysis is futile from start to finish. Not to mention the irony of generating a negative media headline criticizing the negative tone of media coverage.

***

Now let's turn to their use of visuals. The following pair of pie charts is used to show the differences in coverage between U.S. and European media.

Harv_mediatone

These pie charts are inspired by Wheel of Fortune, without the prizes. 

WOFSite2010

Notice how our attention is caught by certain colors (red, orange, etc.) and the size of the slices. The largest red slice is labeled "Other Foreign/Defense" in the U.S. pie chart, although it did not merit a mention in the accompanying writeup so it's not clear what that category means.

Instead of ordering the slices by their sizes, the design puts the larger slices as far apart as possible. Further, each color is used twice in a mirrored way, causing us to infer an association between categories that doesn't exist.

***

There are lots of conventional ways to display this data better. I decided to experiment with word clouds (using the Wordle tool).

Here is one in which the color indicates whether the coverage is American or European. Each word appears twice, and in proximity to one another for comparison.

Redo_harv_mediatone_1

One can directly compute a discrepancy metric between the two regions. This next chart shows the difference in importance accorded each topic by American versus European media:

Redo_harv_mediatone_2


Unintentional deception of area expansion #bigdata #piechart

Someone sent me this chart via Twitter, as an example of yet another terrible pie chart. (I couldn't find that tweet anymore but thank you to the reader for submitting this.)

Uk_itsurvey_left

At first glance, this looks like a pie chart with the radius as a second dimension. But that is the wrong interpretation.

In a pie chart, we typically encode the data in the angles of the pie sectors, or equivalently, the areas of the sectors. In this special case, the angle is invariant across the slices, and the data are encoded in the radius.

Since the data are found in the radii, let's deconstruct this chart by reducing each sector to its left-side edge.

This leads to a different interpretation of the chart: it’s actually a simple bar chart, manipulated.

Redo_ukitsurvey_1

The process of the manipulation runs against what data visualization should be. It takes the bar chart (bottom right) that is easy to read, introduces slants so it becomes harder to digest (top right), and finally absorbs a distortion to go from inefficient to incompetent (left).

What is this distortion I just mentioned? When readers look at the original chart, they are not focusing on the left-side edge of each sector but they are seeing the area of each sector. The ratio of areas is not the same as the ratio of lengths. Adding purple areas to the chart seems harmless but in fact, despite applying the same angles, the designer added disproportionately more area to the larger data points compared to the smaller ones.

  Redo_ukitsurvey_2

To remedy this situation, the designer has to make each edge length proportional to the square root of the data value, so that the sector areas end up proportional to the data. But of course, the simple bar chart is more effective.
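The distortion is easy to verify with the formula for the area of a circular sector, (angle / 2) × radius². Here is a minimal sketch; the angle and the two data values are made up for illustration and are not taken from the original chart.

```python
import math

def sector_area(radius: float, angle_rad: float) -> float:
    """Area of a circular sector with the given radius and central angle."""
    return 0.5 * angle_rad * radius ** 2

angle = math.pi / 6           # hypothetical: every sector gets the same angle
small, large = 1.0, 2.0       # hypothetical data values; large is twice small

# Radius proportional to the data: the area ratio is the square of the data ratio
naive_ratio = sector_area(large, angle) / sector_area(small, angle)

# Radius proportional to the square root of the data: the area ratio matches the data
fixed_ratio = (sector_area(math.sqrt(large), angle)
               / sector_area(math.sqrt(small), angle))

print(naive_ratio, fixed_ratio)
```

With the radius tied directly to the data, a value that is twice as large gets about four times the area. After the square-root correction, it gets about twice the area, as it should.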


Making the world a richer place #onelesspie #PiDay

Xan Gregg and I have been at it for a number of years. To celebrate Pi Day today, I am ridding the world of one pie chart.

Here is a pie chart that is found on Wikipedia:

Wiki_20_Largest_economies_pie_chart.pdf

Here is the revised chart:

Redo_worldeconomypie

It's been designed to highlight certain points of interest.

I find the data quite educational. These are some other insights that are not clear from the revised chart:

  • Japan's economy is larger than Germany's
  • Russia's economy is smaller than that of Germany, Italy, India, Brazil, or South Korea
  • China and Japan combined have GDP (probably) larger than Western Europe
  • Turkey, Netherlands, Switzerland, South Africa are in the Top 20

PS. Xan re-worked a radar chart this year. (link)


First ask the right question: the data scientist edition

A reader didn't like this graphic in the Wall Street Journal:

Wsj_datascientist_timeofday

One could turn every panel into a bar chart but unfortunately, the situation does not improve much. Some charts just can't be fixed by altering the visual design.

The chart is frustrating to read: typically, colors are used to signify objects that should be compared. Focus on the brown wedges for a moment: Basic EDA 46%, Data cleaning 31%, Machine learning 27%, etc. Those are proportions of respondents who said they spent 1 to 3 hours a day on the respective tasks. That is one weird way of describing time use. The people who spent 1 to 3 hours a day on EDA do not necessarily overlap with those who spent 1 to 3 hours a day on data cleaning. In addition, there is no summation formula that lets us know how any individual, or the average data scientist, spends his or her time during a typical day.

***

But none of this is the graphics designer's fault.

The trouble with the chart is in the D corner of the Trifecta checkup. The survey question was poorly posed. The data came from a study by O'Reilly Media. They asked questions of this form:

How much time did you spend on basic exploratory data analysis on average?

A. Less than 1 hour a week
B. 1 to 4 hours a week
C. 1 to 3 hours a day
D. 4 or more hours a day

It is not obvious that those four levels are collectively exhaustive. In fact, they aren't. One hour a day for five working days is a total of 5 hours a week. Those who spent between 4 and 5 hours a week have nowhere to go.

Further, if one had access to individual responses, it's likely that for many respondents, the answers summed across tasks would imply either too many or too few working hours.
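The coverage gap can be checked mechanically by putting all four options on the same weekly scale. This is a minimal sketch, assuming a five-day working week (the same assumption used in the arithmetic above). The interval encoding and the function name are my own.

```python
import math

DAYS_PER_WEEK = 5  # assumption: five working days per week

# Each answer option as a (low, high) range of hours per week
options = {
    "A": (0.0, 1.0),                                   # less than 1 hour a week
    "B": (1.0, 4.0),                                   # 1 to 4 hours a week
    "C": (1.0 * DAYS_PER_WEEK, 3.0 * DAYS_PER_WEEK),   # 1 to 3 hours a day
    "D": (4.0 * DAYS_PER_WEEK, math.inf),              # 4 or more hours a day
}

def find_gaps(intervals):
    """Return the ranges of weekly hours not covered by any interval."""
    gaps = []
    ordered = sorted(intervals)
    covered_up_to = ordered[0][1]
    for low, high in ordered[1:]:
        if low > covered_up_to:
            gaps.append((covered_up_to, low))
        covered_up_to = max(covered_up_to, high)
    return gaps

gaps = find_gaps(options.values())
print(gaps)  # [(4.0, 5.0), (15.0, 20.0)]
```

Under this assumption there are two holes: the one between 4 and 5 hours a week, and a second one between 3 and 4 hours a day (15 to 20 hours a week).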

The panels are separate questions which bear no relationship to each other, even though the tasks are clearly related by the fact that there are only so many working hours in a day.

To fix this chart, one must first fix the data. To fix the data, one must ask the right questions.