
Saying no thanks to a box of donuts

As I reported last week, the Department of Education for Delaware is running a survey on dashboard design. The survey link is here.

One of the charts being evaluated is a box of donuts, as shown below:


I have written before about the problem with donut charts (see here). A box of donuts is worse than one donut. Here, each donut represents a school year and depicts the composition of the student body by race/ethnicity. In aggregate, the composition has not changed drastically, although there are small changes from year to year.

In the following alternative, I use side-by-side line charts, sometimes called slopegraphs, to illustrate the change for each race/ethnicity group.


The key decisions are:

  • using slopes to encode the year-to-year changes, as opposed to having readers compute those changes by measuring and dividing
  • using color to show insights (whether each race/ethnicity group has expanded, contracted or remained stable across the three years), as opposed to merely labeling the categories
  • not showing that the percentages within each year sum to 100%, as opposed to explicitly presenting this fact in a circular arrangement
  • placing the annual data side by side in the same plot region, as opposed to separating them into three charts
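The second decision needs a rule for what counts as expanded, contracted or stable. The post leaves that rule open, so here is a minimal Python sketch assuming a hypothetical tolerance of 1 percentage point between the first and last school year; the `slopes` helper returns the year-to-year changes that the line segments encode.

```python
def slopes(shares):
    """Year-to-year changes in percentage points (what each segment encodes).

    `shares` lists one group's percentage of the student body, oldest year first.
    """
    return [later - earlier for earlier, later in zip(shares, shares[1:])]


def classify_trend(shares, tolerance=1.0):
    """Label a group for color-coding.

    `tolerance` is a hypothetical cutoff in percentage points: a net change
    between the first and last year smaller than this counts as stable.
    """
    change = shares[-1] - shares[0]
    if change > tolerance:
        return "expanded"
    if change < -tolerance:
        return "contracted"
    return "stable"
```

With made-up shares (not the Delaware data), `classify_trend([15.0, 16.2, 17.1])` returns `"expanded"`. Choosing the tolerance is exactly the open question of how big a change is material.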


There is still a further question of how big a change from year to year is considered material.

This is a good example of why there is never "complete data." In theory, the numbers on this chart are "complete" because they come from administrative records. Even ignoring the possibility that some records are missing or incorrect, the set of students in the system varies from year to year, so a 1 percent increase in the proportion of Hispanic students may or may not indicate a real demographic trend.
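One rough way to ask whether a 1-point shift is material is a two-proportion z-statistic. This is a sketch under a strong assumption the post does not make: it treats each year's student body as an independent random sample, which administrative records are not (they are a census, and consecutive years share most of the same students). The enrollment sizes in the usage note are hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z-statistic for the change from proportion p1 to p2 (both on a 0-1 scale).

    Assumes the two years are independent samples of sizes n1 and n2,
    which is only an approximation for enrollment records.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / standard_error
```

With shares of 15 and 16 percent, `two_proportion_z(0.15, 1000, 0.16, 1000)` is about 0.6, while the same shares with 100,000 students per year give about 6.2. The statistic only speaks to sampling noise; it cannot tell a demographic trend apart from students entering and leaving the system.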



Why line charts are better than area charts

I saw this chart on Business Insider recently:


This links to Market Insider, which has a tool for making different types of charts. Despite the huge drop depicted above, by last week the Dow Jones index had recovered to its level at the start of 2018:


The same chart can be made as an area chart (called a "mountain chart" by Market Insider).


The painting of the area serves no purpose here because the area doesn't mean anything.

Imagine adding an inch of space to the bottom of each chart. The area chart is sensitive to the choice of the minimum value of the vertical axis while the line chart isn't. Since the data did not change, it's not a good idea for the display to shift perception. That's why I prefer the line chart.
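The thought experiment can be made concrete. In this sketch (hypothetical index levels, not the actual Dow data), the fill at each point is the value minus the axis floor, so the ratio between the tallest and shortest fill depends on where the floor sits; the rises and falls of the line do not.

```python
def fill_heights(values, axis_floor):
    """Height of the shaded area above the axis floor at each point (data units).

    Assumes axis_floor is below every value.
    """
    return [v - axis_floor for v in values]


def fill_ratio(values, axis_floor):
    """Tallest fill divided by shortest fill: how dramatic the area chart looks."""
    heights = fill_heights(values, axis_floor)
    return max(heights) / min(heights)


def line_steps(values):
    """Rise or fall of each line segment in data units; no axis floor involved."""
    return [later - earlier for earlier, later in zip(values, values[1:])]
```

For the hypothetical levels `[26000, 24000, 25000]`, a floor at 23,000 makes the tallest fill three times the shortest, while a floor at zero makes it only about 8 percent taller; `line_steps` returns `[-2000, 1000]` either way.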

Upcoming talks and workshops, NYC, Seattle, Philadelphia, St. Louis

The following talks/events are all free and open to the public. If I'm in your neighborhood, please come by and say hi.



You can register or learn more about the above talks at the following links:

Feb 20, 2018 (tonight, NYC) - Principal Analytics Prep Open House, with me and Tina Lowry, talking about the data analytics field and how to get a job in this space. More information here.

Feb 22, 2018 (Seattle) - A talk about best practices in data visualization for business presentations. Sign up here.

Feb 28, 2018 (NYC) - Analytics Resume Workshop, jointly hosted by Principal Analytics Prep and New York Public Library Job Search Central: we provide free advice on improving your resume to appeal to analytics and data science hiring managers. Register at our Meetup group here.

March 7, 2018 (Philadelphia) - A talk about best practices in data visualization for business presentations. Sign up here.

March 26, 2018 (St. Louis) - Workshop on data visualization: simple things you can do to make even Excel charts better! Sign up here (scroll to the bottom of the page).

This last event, part of the Midwest Digital Marketing Conference, has a small fee.

Please help the Delaware Dept of Education


Shane C. asked me to fill out a survey hosted by the Delaware Department of Education about the design of their dashboard, and I'm very happy to see that they are doing this. The survey asks you to comment on different ways of presenting certain data, and they want to know which version is "easier to understand". It takes about 5 to 10 minutes to complete.

The link to the survey is here, and some background information is here (although you don't really need it if you are just interested in the dataviz side).

I strongly encourage you to leave text comments at the end if, for example, you think there are even better ways to show the data.

Looking above the waist, dataviz style

I came across this chart on NYU's twitter feed. 


Growth has indeed been impressive; the dataviz less so. Here's the problem with not starting the vertical scale of a column chart at zero:


In a column chart, the heights of the columns should be proportional to the data. Here they are not, because the same amount (everything below 30,000) has been chopped off every column. The light purple that I layered on top of the chart shows the correct heights of the columns, assuming that the first column, for 2007, properly encoded the data.

The dark purple top of each column represents the "lie factor": the amount of exaggeration created by chopping off those legs. The term "lie factor" was coined by Ed Tufte.
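Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the change relative to the first value. A sketch for truncated columns, using hypothetical values since the NYU counts are not reproduced here; the light-purple correction rescales every column from the first drawn one.

```python
def tufte_lie_factor(first, last, baseline=0.0):
    """Tufte's Lie Factor for two columns drawn up from `baseline` instead of zero.

    Requires last != first and first > baseline.
    """
    # Effect shown: relative change in drawn column heights.
    shown = ((last - baseline) - (first - baseline)) / (first - baseline)
    # Effect in data: relative change in the underlying values.
    actual = (last - first) / first
    return shown / actual


def corrected_heights(values, baseline):
    """Heights proportional to the data, anchored on the first drawn column.

    The first column keeps its drawn height (first value minus baseline);
    every other column is scaled in proportion to its value.
    """
    first_drawn = values[0] - baseline
    return [first_drawn * v / values[0] for v in values]
```

For a hypothetical doubling from 40,000 to 80,000 drawn from a 30,000 baseline, `tufte_lie_factor(40000, 80000, 30000)` returns 4.0, and `corrected_heights([40000, 80000], 30000)` gives `[10000.0, 20000.0]` against drawn heights of 10,000 and 50,000. For two columns the formula simplifies to first / (first - baseline), so the exaggeration depends only on how much of the first column was cut off.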


The designer probably wanted to show the year-to-year trend more starkly. Doubling the number of applications in 10 years is pretty impressive. The solution is not to chop off the legs but to look above the waist. You can't fix the column chart but you can switch to a line chart, as follows:


In a line chart, we are mostly concerned with the changing slopes of the line segments from year to year. The slopes encode the year-on-year changes; on a logarithmic vertical axis, they would encode the year-on-year growth rates.
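To see what the slopes encode, this sketch compares the two kinds of axis on a hypothetical series: on a linear axis a segment's slope is the absolute change, while on a log axis it is log(1 + growth rate), so equal growth rates draw equal slopes only on the log axis.

```python
import math


def yoy_growth(values):
    """Year-on-year growth rates as fractions, e.g. 0.1 for 10 percent."""
    return [later / earlier - 1 for earlier, later in zip(values, values[1:])]


def linear_slopes(values):
    """Segment slopes on a linear axis: absolute year-on-year changes."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def log_slopes(values):
    """Segment slopes on a natural-log axis: each equals log(1 + growth rate)."""
    return [math.log(later) - math.log(earlier) for earlier, later in zip(values, values[1:])]
```

For `[100, 110, 121]`, which grows 10 percent each year, `linear_slopes` returns `[10, 11]` but the two `log_slopes` are equal up to rounding. Doubling over 10 years corresponds to about 7.2 percent a year, compounded (2^(1/10) - 1).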


Governor of Maine wants a raise

In a Trifecta Checkup, this map scores low on the Q corner: what is its purpose? What have readers learned about the salaries of state governors after looking at the map? (Link to original)


The most obvious "insights" include:

  • There are more Republican governors than Democratic governors
  • Most Democratic governors are from the coastal states
  • There is exactly one Independent governor
  • Small states on the Eastern seaboard are messing up the design

Notice I haven't said anything about salaries. That's because the reader has to read the data labels to learn each state's governor salary. Without spending quality time with the labels, it's hard to learn the average or median salary, or even the maximum and minimum.

This is also an example of a chart that is invariant to the data. The chart would look exactly the same if I substituted the real salaries with 50 fake numbers.


The following design attempts to say something about the data. The dataset is actually not that interesting because the salaries are relatively closely clustered.

You get to see the full range of salaries, with the median, 25th and 75th percentiles marked off. The states are divided into top and bottom halves, with the median as the splitting level. A simple clustering algorithm is applied to group the salaries into similar categories, and the groups are color-coded.

The Maine governor is the least compensated.
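The summary statistics behind this design are easy to compute. The post does not name its clustering algorithm, so the sketch below swaps in a simple gap rule (start a new group whenever consecutive sorted salaries differ by more than a chosen `max_gap`); the function names and the threshold are illustrative assumptions. Quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from other percentile conventions.

```python
import statistics


def salary_summary(salaries):
    """Range, median and quartiles of the salaries."""
    q1, median, q3 = statistics.quantiles(salaries, n=4)
    return {"min": min(salaries), "q1": q1, "median": median,
            "q3": q3, "max": max(salaries)}


def split_at_median(salaries):
    """Top half (at or above the median) and bottom half (below it)."""
    med = statistics.median(salaries)
    top = [s for s in salaries if s >= med]
    bottom = [s for s in salaries if s < med]
    return top, bottom


def gap_clusters(salaries, max_gap):
    """Group sorted salaries, starting a new group when a gap exceeds max_gap.

    A stand-in for the unspecified clustering step, not the original method.
    """
    ordered = sorted(salaries)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters
```

In use, `salaries` would hold the 50 values from the map; because the real salaries are closely clustered, the result is sensitive to the choice of `max_gap`.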

If you have other ideas for this dataset, feel free to submit them to me.

Missing comments

It turns out that Typepad's spam filter is pretty aggressive, and several legitimate comments have been sitting in the spam folder for quite a while. Interestingly, some of my own comments were also flagged as "spam"!

So you may find that your comment has finally appeared on the blog. Apologies for the delay!

When design goes awry

One can't accuse the following chart of lacking design. There is strong evidence of departing from convention, but the design decisions appear wayward. (The original is on Money, here.)



The donut chart (right) has nine sections. Eight of the sections (all except A) have clearly been bent out of shape. It turns out that section A does not have the right size either: the middle gray circle is not really in the middle, as seen below.


The bar charts (left) suffer from two ills. Firstly, the full width of the chart corresponds to only 50 percent, so readers are forced to read the data labels to understand the data. Secondly, only the top two categories are shown, so the size of the whole is lost. A stacked bar chart would serve better here.
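Both charts come down to parts of a whole. A minimal sketch, assuming the inputs are the category values in display order: a donut section should sweep 360 degrees times its share around one shared center, and a 100 percent stacked bar gives each segment a length equal to its share, so the whole is always visible.

```python
def shares(values):
    """Each category's fraction of the total (also the stacked-bar segment lengths)."""
    total = sum(values)
    return [v / total for v in values]


def sweep_angles(values):
    """Degrees of arc each donut section should span."""
    return [360.0 * s for s in shares(values)]


def start_angles(values, start=0.0):
    """Angle at which each section begins, going around the circle in order."""
    angles = []
    current = start
    for sweep in sweep_angles(values):
        angles.append(current)
        current += sweep
    return angles
```

The angles alone are not enough: if the inner circle is off-center, as in the Money donut, the ring is thicker on one side, so a section's area no longer matches its angle.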

Here is a bardot chart; the "dot" part of it makes it easier to see a Top 2 box analysis.


I explain the bardot chart here.
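A Top 2 box figure is the share of responses falling in the two highest categories of an ordered scale. A sketch, assuming `counts` runs from the lowest category to the highest; the numbers in the usage line are made up.

```python
def top_two_box(counts):
    """Share of responses in the two highest categories.

    `counts` lists response counts from the lowest category to the highest
    and must contain at least two categories.
    """
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    return (counts[-1] + counts[-2]) / sum(counts)
```

For example, `top_two_box([10, 20, 30, 25, 15])` returns 0.4. Because the denominator is the full count, the figure always keeps the size of the whole in view.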


 PS. Here is Jamie's version (from the comment below):