Thanks to reader Charles Chris P., I was able to get the police staffing data to play around with. Recall from the previous post that the *Washington Post* made the following scatter plot, comparing the proportion of whites among police officers with the proportion of whites among all residents, by city.

In the last post, I suggested making a histogram. As you see below, the histogram was not helpful.

The histogram does point out one feature of the data. Despite the appearance of dots scattered about, the slopes (equivalently, angles at the origin) do not vary widely.
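To make the slope/angle equivalence concrete, here is a minimal Python sketch. Each city is treated as a point (percent white residents, percent white police). The slope of the line from the origin is police divided by residents, and the angle is its arctangent, so parity (slope 1) sits at 45 degrees. The 40%/85% pair is the North Charleston figure cited by the *Post*; the other values are hypothetical, not taken from the dataset.

```python
import math

def slope_and_angle(resident_pct, police_pct):
    """Slope of the line from the origin to (resident_pct, police_pct),
    and the same line expressed as an angle in degrees."""
    slope = police_pct / resident_pct
    angle = math.degrees(math.atan2(police_pct, resident_pct))
    return slope, angle

# (percent white residents, percent white police); all but the first are hypothetical
cities = [(40.0, 85.0), (60.0, 70.0), (80.0, 90.0), (50.0, 50.0)]

angles = [slope_and_angle(r, p)[1] for r, p in cities]
for (r, p), a in zip(cities, angles):
    print(f"residents {r:4.1f}%  police {p:4.1f}%  angle {a:5.1f} deg")

# A narrow spread of angles means the dots fan out along a narrow wedge.
print(f"angle spread: {max(angles) - min(angles):.1f} degrees")
```

Working in angles rather than slopes keeps the scale bounded (0 to 90 degrees), which makes the spread easier to compare across datasets.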

This feature makes the scatter plot hard to interpret. Because the dots crowd into a narrow wedge, the reader has to estimate dot density everywhere. Sad to say, this difficulty is introduced by the designer, who used overly granular data. Here, the proportions are recorded to one decimal place, so a city with 10% is plotted separately from one with 10.1%. The effect is to jitter the dots, which muddies the densities.
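The jitter effect can be seen in a small sketch: snap each percentage to a coarser grid and count how many recorded points land on the same spot. The 5-point grid width and the sample points are hypothetical choices for illustration; the post does not say what grid would be appropriate.

```python
from collections import Counter

def snap(pct, width=5.0):
    """Round a percentage to the nearest multiple of `width` (a hypothetical grid).
    Note: Python's round() sends exact halves to the even neighbor."""
    return round(pct / width) * width

# Hypothetical (percent white residents, percent white police), one decimal place.
points = [(10.0, 12.0), (10.1, 12.3), (9.8, 11.9), (55.2, 80.4), (54.9, 79.6)]

raw = Counter(points)
snapped = Counter((snap(r), snap(p)) for r, p in points)

print(len(raw), "distinct dots before snapping")    # 5
print(len(snapped), "distinct dots after snapping")  # 2
for cell, n in snapped.items():
    print(cell, n)
```

Five near-identical dots become two cells with counts, which is exactly the density information the jittered scatter plot hides.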

One way to solve this problem is to use a density chart (heatmap).
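Under the hood, a heatmap like this is a grid of counts. Below is a minimal, standard-library-only sketch that bins each city by (percent white residents, percent white police) into 10-point cells. The bin width and the sample points are hypothetical, and a value of exactly 100% is folded into the top bin so the grid stays square.

```python
def bin_index(pct, width):
    """Map a percentage in [0, 100] to a bin; 100% is folded into the top bin."""
    n_bins = int(100 // width)
    return min(int(pct // width), n_bins - 1)

def density_grid(points, width=10):
    """Count cities per cell. grid[row][col]: row = police bin, col = resident bin."""
    n_bins = int(100 // width)
    grid = [[0] * n_bins for _ in range(n_bins)]
    for resident, police in points:
        grid[bin_index(police, width)][bin_index(resident, width)] += 1
    return grid

# Hypothetical (percent white residents, percent white police) pairs.
points = [(92.0, 100.0), (85.5, 100.0), (40.0, 85.0), (61.3, 74.8), (65.0, 78.2)]

grid = density_grid(points, width=10)
for row in reversed(grid):  # highest police bin first, like the top row of a chart
    print(" ".join(f"{count:2d}" for count in row))
```

Coloring each cell by its count turns this grid into the heatmap.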

You no longer see every city plotted, but you get a better view of the landscape. You learn that most of the action occurs in the top row, especially at the top right. It turns out that lots of cities (22% of the dataset!) have 100% white police forces.

This group of mostly small cities is obscuring the rest of the data. Notice that the yellow cells contain very little data, fewer than 10 cities each.

For the question the reporter is addressing, the subgroup of cities with 100% white police forces adds little. Most of these places have at least 60% white residents, frequently much higher. But if every police officer is white, the racial balance will almost surely be "off", so the conclusion for these cities is trivial. I now remove this subgroup from the heatmap:
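A minimal sketch of this filtering step, assuming the dataset stores an all-white police force as exactly 100. In the real data about 22% of cities fall in this group, per the post; the sample values below are hypothetical.

```python
def drop_all_white_forces(points):
    """Remove cities whose police force is reported as 100% white.
    Returns the remaining points and the share of cities removed."""
    kept = [(r, p) for r, p in points if p < 100.0]
    share_removed = (len(points) - len(kept)) / len(points)
    return kept, share_removed

# Hypothetical (percent white residents, percent white police) pairs.
points = [(92.0, 100.0), (85.5, 100.0), (40.0, 85.0), (61.3, 74.8), (65.0, 78.2)]

kept, share_removed = drop_all_white_forces(points)
print(f"kept {len(kept)} cities, removed {share_removed:.0%}")
```

Reporting the removed share alongside the filtered chart keeps the excluded group visible to the reader.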

Immediately, you are able to see much more. In particular, you see a ridge in the expected direction. The higher the proportion of white residents, the higher the proportion of white officers.

But this view is also too granular: the yellow cells now hold only one or two cities each. So I collapse adjacent cells into larger ones.
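Collapsing cells amounts to summing counts over blocks of neighboring cells. A minimal sketch, assuming a square count grid whose side is divisible by the merge factor; the 4 x 4 grid below is made up for illustration.

```python
def collapse(grid, factor=2):
    """Merge each factor x factor block of cells into one by summing counts.
    Assumes a square grid whose side is divisible by `factor`."""
    n = len(grid)
    if n % factor:
        raise ValueError("grid size must be divisible by factor")
    m = n // factor
    out = [[0] * m for _ in range(m)]
    for i, row in enumerate(grid):
        for j, count in enumerate(row):
            out[i // factor][j // factor] += count
    return out

# Hypothetical fine-grained counts.
fine = [
    [1, 0, 2, 1],
    [0, 1, 1, 0],
    [0, 0, 2, 3],
    [1, 0, 1, 4],
]
print(collapse(fine, factor=2))  # [[2, 4], [1, 10]]
```

The total number of cities is unchanged; only the resolution drops, so sparse cells pool into readable ones.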

More of the data lie above the diagonal running from bottom left to top right, indicating that U.S. police forces are, on average, skewed white. When comparing cities, we can take this national bias out. The following view does this.

The circled point is the average city, located at relative proportions of (0, 0). Notice that the densest regions now cluster around the dotted 45-degree diagonal.
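The post does not spell out how the relative proportions were computed. One plausible reading, sketched below, subtracts the average city's proportions from every city so that the average city lands at (0, 0). The unweighted mean is an assumption (a population-weighted average would give a different center), and the sample points are hypothetical.

```python
from statistics import fmean

def relative_to_average(points):
    """Subtract the average city's proportions from every city, so the
    average city lands at (0, 0). Uses an unweighted mean (an assumption)."""
    mean_resident = fmean(r for r, _ in points)
    mean_police = fmean(p for _, p in points)
    shifted = [(r - mean_resident, p - mean_police) for r, p in points]
    return shifted, (mean_resident, mean_police)

# Hypothetical (percent white residents, percent white police) pairs.
points = [(40.0, 60.0), (60.0, 70.0), (80.0, 95.0)]

shifted, average_city = relative_to_average(points)
print("average city:", average_city)
print("relative proportions:", shifted)
```

In these shifted coordinates, the 45-degree line through the average city is simply y = x.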

To conclude, the *Washington Post* data appear to show these insights:

- There is a national bias: whites are overrepresented on police forces relative to their share of residents
- In about one-fifth of the cities, the entire police force is reported to be white. (The following points exclude these cities.)
- Most cities conform to the national bias, within an acceptable margin of error
- There are a small number of cities worth investigating further: those that are far away from the 45-degree line through the average city in the final chart shown above.
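Finding the cities worth a closer look can be sketched as measuring each city's vertical gap from the 45-degree line through the average city. The unweighted average, the 20-point cutoff standing in for "an acceptable margin of error", and the sample points are all hypothetical choices, not taken from the post.

```python
def flag_far_from_line(points, threshold=20.0):
    """Flag cities whose vertical gap from the 45-degree line through the
    average city exceeds `threshold` percentage points (a hypothetical cutoff)."""
    mean_resident = sum(r for r, _ in points) / len(points)
    mean_police = sum(p for _, p in points) / len(points)
    flagged = []
    for r, p in points:
        # Positive gap: police whiter than the national bias alone would suggest.
        gap = (p - mean_police) - (r - mean_resident)
        if abs(gap) > threshold:
            flagged.append((r, p, gap))
    return flagged

# Hypothetical (percent white residents, percent white police) pairs.
points = [(40.0, 85.0), (60.0, 70.0), (80.0, 80.0), (60.0, 65.0)]
for r, p, gap in flag_far_from_line(points):
    print(f"residents {r}%, police {p}%: {gap:+.1f} points off the line")
```

Tightening or loosening the cutoff changes how many cities get flagged, so the threshold deserves as much scrutiny as the chart.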

Showing all the data is not necessarily a good solution. Indeed, it is frequently a suboptimal design choice.

The *Washington Post* had a good idea. Using Census data, they computed, for different cities, the proportion of police officers who are white and the corresponding proportion of residents who are white.

In the following scatter plot, they singled out North Charleston, SC, where the police force is 85% white but the residents are only 40% white: (Link to the interactive chart.)

The plot itself is well done, with helpful coloring and labels.

One must be careful about "story time": it's easy to infer from the graph that blue dots mean worse racial tension, but that interpretation rests on an assumption the data do not prove. (What is missing is the correlation between these data and some other data measuring tension.)

The secret to reading this chart is to look at the slope of the line from the origin to each point. The cities above the 45-degree diagonal, which separates the blue dots from the gray ones, are those where the police force is whiter than the population. The steeper the line from the origin, the more unrepresentative the police force. Below the 45-degree line, the logic reverses: the flatter the line, the more unrepresentative.
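The slope reading can be sketched in a few lines. The North Charleston figures (85% white police, 40% white residents) come from the *Post*'s chart as quoted above; the second city is hypothetical.

```python
def classify(resident_pct, police_pct):
    """Read a point by the slope of the line from the origin:
    slope > 1 lies above the 45-degree line, slope < 1 below it."""
    slope = police_pct / resident_pct
    if slope > 1:
        side = "police whiter than residents"
    elif slope < 1:
        side = "police less white than residents"
    else:
        side = "on the 45-degree line"
    return slope, side

print(classify(40.0, 85.0))  # North Charleston, SC → (2.125, 'police whiter than residents')
print(classify(70.0, 60.0))  # hypothetical city below the line
```

A slope of 2.125 means the white share of North Charleston's police force is a little over twice the white share of its residents.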

The slope is really a single metric: the white share of the police force divided by the white share of residents. So the two dimensions can be collapsed into one, and with one dimension I'd try a histogram view. If you find the data, let me know, or just post it in the comments.
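Collapsing to one dimension and binning takes only a few lines. In this sketch the bin width of 0.25 is a hypothetical choice; the (40, 85) point is North Charleston as cited by the *Post*, and the rest are made up.

```python
from collections import Counter

def slope_histogram(points, width=0.25):
    """Collapse each (resident %, police %) point to its slope, then count
    slopes in bins of `width` (a hypothetical bin width). Keys are bin lower edges."""
    counts = Counter()
    for resident, police in points:
        slope = police / resident
        counts[(slope // width) * width] += 1
    return dict(sorted(counts.items()))

WIDTH = 0.25
# (40, 85) is North Charleston as cited by the Post; the rest are hypothetical.
points = [(40.0, 85.0), (60.0, 70.0), (80.0, 90.0), (50.0, 50.0), (70.0, 60.0)]

hist = slope_histogram(points, WIDTH)
for lower, n in hist.items():
    print(f"{lower:.2f} to {lower + WIDTH:.2f}: {'#' * n}")
```

A slope of exactly 1 (parity) falls at the lower edge of the 1.00 to 1.25 bin, so bins starting at 1.00 and above hold the cities where the police force is at least as white as the residents.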

I'll be hosting a Data Visualization workshop at the Digital Media Marketing Conference in St. Louis, Missouri, on Thursday. Here is the link to their website.

The workshop is organized around three themes: Appreciating, Conceptualizing, and Improving. There will be several hands-on exercises.

If you are a reader in St. Louis and would like to meet up, email me.

***

Posting this week will be light because of various commitments. I may put something up later this week.

One of my students pointed me to this Medium article about a NYT chart. Well worth reading.

The following Wall Street Journal chart caught my eye the other day: (Link to article)

Looking closely, I realized that the four charts are identical except for the call-outs. This is a kind of small multiples in which each panel holds the same data but the labeling changes. It's planned redundancy, but I'm afraid I don't see the point.

The chart compares four different ways to save money by cutting cable. Here is an alternative that places the focus on the number of dollars saved: