Getting simple charts right
Feb 06, 2023
Ian K. submitted this chart on Twitter:
The chart comes from a video embedded in this report (link) about Chicago cops leaving their jobs.
Let's start with the basics. This is an example of a simple line chart illustrating a time series of five observations. The vertical axis starts at 10,000 instead of 0. With this choice, the designer wants to focus on the point-to-point changes in the values, rather than their relation to the initial value.
Every graph has add-ons that assist cognition. On this chart, we have axis labels, gridlines and data labels. Every add-on increases reading time, so we should use them sparingly.
First consider the gridlines. In the following chart, I conduct a self-sufficiency test by removing the data labels from the chart:
You can see that the last three values present no problems. The first two, especially the first value, are hard to read - because the top gridline is missing! The next chart restores the bounding gridline, so you can see the difference that one small detail can make:
Next, let's compare the following versions of the chart. The left one contains data labels but no gridlines or axis labels. The right one has the gridlines and axis labels but no data labels.
The left chart prints the entire dataset onto the chart. The reader in essence is reading the raw data. That appears to be the intention of the chart designer, as the data labels are large and placed inside shiny white boxes. The level of the boxes determines the reader's perception, as the boxes catch more of our attention than the dots that actually represent the data.
The right chart highlights the dots and the lines between them. The gridlines are way too thick and heavy, so they distract rather than help. This chart presumes that the reader isn't as interested in the precise numbers as she is in the trend.
As Ian pointed out, one of the biggest problems with this chart is the appearance of even time intervals when all except one of the date values are January. This seemingly innocent detail destroys the chart. The line segments of the chart encode the pre-post change in the staffing numbers. For most of the line segments, the metric is year-on-year change, but the last two line segments on the right show something else: a 19-month change, followed by a 5-month change.
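To make the uneven spacing concrete, here is a minimal Python sketch that counts the months between consecutive observations. It uses only the dates mentioned in this post (January 2020, January 2021, August 2022, January 2023). The day of the month is an assumption, set to the 1st purely for illustration, and `months_between` is a hypothetical helper name.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Observation dates named in the post; the day (1st) is an assumption.
observations = [
    date(2020, 1, 1),
    date(2021, 1, 1),
    date(2022, 8, 1),
    date(2023, 1, 1),
]

# Gap in months for each line segment between consecutive observations.
gaps = [months_between(a, b) for a, b in zip(observations, observations[1:])]
print(gaps)  # [12, 19, 5]
```

Plotting these dates on a true time axis, rather than as evenly spaced categories, would make the 19-month segment almost four times as wide as the 5-month one.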
I did the following analysis to understand how big of a staffing problem CPD faces.
First I restored the January 2022 time value, while shifting the Aug 2022 value to its rightful place on the time axis. Next, I added the dashed brown line, which represents a linear extension of the trend seen between January 2020 and January 2021, before the sudden dip. We don't know what the true January 2022 value is, but the projected value based on the past trend is around 12,200. By August, the projected value is around 11,923, about 300 above the actual value of 11,611. By January 2023, the projected value is almost exactly the same as the actual value.
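The arithmetic behind a linear extension like this can be sketched in a few lines of Python. The January 2020 and January 2021 values are not quoted in this post, so the sketch instead backs out the slope from the two projected values that are quoted (around 12,200 for January 2022 and around 11,923 for August 2022). Both inputs are rounded, so treat the result as a rough illustration rather than a reproduction of the chart. The names `project` and `slope_per_month` are hypothetical.

```python
# Projected values quoted in the post (both approximate).
proj_jan_2022 = 12_200
proj_aug_2022 = 11_923
# Actual August 2022 value quoted in the post.
actual_aug_2022 = 11_611

# January to August is 7 months, which gives the implied monthly slope.
months_jan_to_aug = 7
slope_per_month = (proj_aug_2022 - proj_jan_2022) / months_jan_to_aug


def project(months_after_jan_2022: int) -> float:
    """Value on the straight trend line, measured from January 2022."""
    return proj_jan_2022 + slope_per_month * months_after_jan_2022


# Gap between the trend line and the actual value in August 2022.
gap_aug = proj_aug_2022 - actual_aug_2022
print(gap_aug)  # 312

# Extending the same line 12 months gives the January 2023 projection.
proj_jan_2023 = project(12)
print(round(proj_jan_2023))  # 11725
```

Under these rounded inputs the trend line loses about 40 officers a month, or roughly 475 a year.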
This linear trending analysis is likely too simplistic but it offers a baseline to start thinking about what the story is. The long-term trend is still down but the apparent dip in 2022 may not be meaningful.