Shane C. asked me to fill out a survey hosted by the Delaware Department of Education. This is a survey about designing their dashboard, and I'm very happy to see that they are doing this. In the survey, you are asked to comment on different ways of presenting certain data, and they want to know which version is "easier to understand". It takes about 5-10 minutes to complete.
The link to the survey is here, and some background information is here (although you don't really need it if you are just interested in the dataviz side).
I'd highly encourage you to leave text comments at the end if you think - for example - that there are even better ways to show the data.
In a Trifecta checkup, this map scores low on the Q corner: what is its purpose? What have readers learned about the salaries of state governors after looking at the map? (Link to original)
The most obvious "insights" include:
There are more Republican governors than Democratic governors
Most Democratic governors are from the coastal states
There is exactly one Independent governor
Small states on the Eastern seaboard are messing up the design
Notice I haven't said anything about salaries. That's because the reader has to read the data labels to learn the governor's salary in each state. It takes work to figure out the average or median salary, or even the maximum and minimum, without spending quality time with the labels.
This is also an example of a chart that is invariant to the data. The chart would look exactly the same if I substituted the real salaries with 50 fake numbers.
The following design attempts to say something about the data. The dataset is actually not that interesting because the salaries are relatively closely clustered.
You get to see the full range of salaries, with the median, 25th and 75th percentiles marked off. The states are divided into top and bottom halves, with the median as the splitting level. A simple clustering algorithm is applied to group the salaries into similar categories, then color-coded.
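For readers who want to try something similar, here is a minimal sketch of the summary statistics and a simple grouping step. The salary figures are placeholders, not the real governors' salaries, and the gap-based grouping (with its `max_gap` parameter) is only one plausible stand-in for a "simple clustering algorithm"; it is not necessarily the method used for the chart.

```python
from statistics import quantiles

# Placeholder salaries in $1,000s for hypothetical states (NOT the real data).
salaries = {"A": 70, "B": 95, "C": 98, "D": 110,
            "E": 112, "F": 115, "G": 150, "H": 155}

def summarize(values):
    """Return (min, 25th percentile, median, 75th percentile, max)."""
    values = list(values)
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def gap_clusters(values, max_gap):
    """Group sorted values, starting a new group whenever the gap
    to the previous value exceeds max_gap (an assumed, simple rule)."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return clusters

low, q1, med, q3, high = summarize(salaries.values())

# Split the states into top and bottom halves at the median.
top_half = [s for s, v in salaries.items() if v >= med]
bottom_half = [s for s, v in salaries.items() if v < med]

groups = gap_clusters(salaries.values(), max_gap=10)
print(low, q1, med, q3, high)
print(groups)
```

Each group would then get its own color, and the percentile values mark the reference lines.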
The Maine governor is the least compensated.
If you have other ideas for this dataset, feel free to submit them to me.
If you are not sick of the Washington Post article about friends (not) letting friends join the other party, allow me to write yet another post on, gasp, that pie chart. And sorry to have kept reader Daniel L. waiting. When he submitted this chart to me, he pointed out that he had tremendous difficulty understanding it:
This is not one pie but six pies on a platter. There are two sources of confusion: first, the repeated labels of Republicans and Democrats to refer to different groups of people; and second, the indecision between using two or four categories of "how many".
Let me begin by re-ordering and re-labeling the chart:
From this version, one can pull out the key messages of the analysis: (A) most voters, regardless of party, have mostly friends from the same party, and (B) Republicans are more likely than Democrats to have more friends from the other party. A third, but really not that interesting, point is that regardless of party, people have about the same likelihood to befriend Independents.
In visualization, "less is more" is frequently appropriate. So, here is a view of the same chart, using two categories instead of four.
The added advantage is only two required colors, and thus even grayscale can work.
The new arrangement of the pie platter makes it clear that there really isn't that much difference between Republican and Democratic voters along this dimension. Thus, visualizing the aggregate gets us to the same place.
After three servings of pies, the reader might be craving some energy bars.
One can say that for very simple data like this, pie charts are acceptable. However, the stacked bar is better.
Thanks again Daniel, and it's a pleasure to serve you!
In the last post, I discussed one of the charts in the very nice Washington Post feature, delving into polarizing American voters. See the post here. (Thanks again Daniel L.)
Today's post is inspired by the following chart (I am showing only the top of it - click here to see the entire chart):
The chart plots each state as a separate row, so like most such charts, it is tall. The data analysis behind the chart is fascinating and unusual, although I find the chart harder to grasp than expected. The analyst starts with precinct-level data, and determines which precincts were "lop-sided," defined as having a winning margin of over 50 percent for the winner (either Trump or Clinton). The analyst then sums the voters in those lop-sided precincts, and expresses this as a percent of all voters in the state.
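The calculation described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the Post's actual code: the `Precinct` record, its field names, and the sample numbers are hypothetical, the margin is computed in percentage points of all votes cast in the precinct, and "voters" is taken to mean votes cast (precincts are assumed to be non-empty).

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Precinct:  # hypothetical record layout
    state: str
    trump: int
    clinton: int
    other: int = 0

    @property
    def total(self):
        return self.trump + self.clinton + self.other

    @property
    def margin_points(self):
        # Positive means a Trump lead, negative means a Clinton lead.
        return 100.0 * (self.trump - self.clinton) / self.total

def lopsided_shares(precincts, threshold=50.0):
    """Return {state: (pct of voters in lop-sided Trump precincts,
                       pct of voters in lop-sided Clinton precincts)}."""
    totals = defaultdict(int)
    lop = defaultdict(lambda: {"trump": 0, "clinton": 0})
    for p in precincts:
        totals[p.state] += p.total
        if p.margin_points > threshold:
            lop[p.state]["trump"] += p.total
        elif p.margin_points < -threshold:
            lop[p.state]["clinton"] += p.total
    return {s: (100.0 * lop[s]["trump"] / totals[s],
                100.0 * lop[s]["clinton"] / totals[s]) for s in totals}

# Made-up precincts for a fictional state "XX".
sample = [
    Precinct("XX", trump=800, clinton=200),   # 60-point Trump margin
    Precinct("XX", trump=100, clinton=900),   # 80-point Clinton margin
    Precinct("XX", trump=1100, clinton=900),  # 10-point margin, not lop-sided
]
shares = lopsided_shares(sample)
print(shares)
```

Note that everyone in a lop-sided precinct is counted, whichever way they voted, which is why the result describes where voters live rather than how they voted.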
For example, in Alabama, the long red bar indicates that about 48% of the state's voters live in lop-sided precincts that went for Trump. It's important to realize that not all such people voted for Trump - they happened to live in precincts that went heavily for Trump. Interestingly, about 12% of the state's voters reside in precincts that went heavily for Clinton. Thus, overall, 60% of Alabama's voters live in lop-sided precincts.
This is more sophisticated than the usual analysis that shows up in journalism.
The bar chart may confuse readers for several reasons:
The horizontal axis is labeled "50-point plus margin for Trump/Clinton" and has values ranging from 0% to somewhere in the 40-60% range. This description seemingly implies that the values being plotted are winning margins. However, the sub-header tells readers that the data values are percentages of total voters in the state.
The shades of colors are not explained. I believe the dark shade indicates the winning party in each state, so Trump won Alabama and Clinton, California. The addition of this information allows the analysis to become multi-dimensional. It also reveals that the designer wants to address how lop-sided precincts affect the outcome of the election. However, adding shade in this manner effectively turns a two-color composition into a four-color composition, adding to the processing load.
The chart adopts what Howard Wainer calls the "Alabama first" ordering. This always messes up the designer's message because the alphabetical order typically does not yield a meaningful correlation.
The bars are facing out from the middle, which is the 0% line. This arrangement is most often used in a population pyramid, and used when the designer feels it important to let readers compare the magnitudes of two segments of a population. I do not feel that the Democrat versus Republican comparison within each state is crucial to this chart, given that most states were not competitive.
What is more interesting to me is the total proportion of voters who live in these lop-sided precincts. The designer agrees on this point, and employs bar stacking to make this point. This yields some amazing insights here: several Democratic strongholds such as Massachusetts surprisingly have few lop-sided precincts.
***
Here then is a remake of the chart according to my priorities. Click here for the full chart.
The emphasis is on the total proportion of voters in lop-sided precincts. The states are ordered by that metric from most lop-sided to least. This draws out an unexpected insight: most red states have a relatively high proportion of voters in lop-sided precincts (~ 30 to 40%) while most blue states - except for the quartet of Maryland, New York, California and Illinois - do not exhibit such demographic concentration.
The gray/grey area offers a counterpoint, that most voters do not live in lop-sided districts.
P.S. I should add that this is one of those chart designs that frustrate standard - I mean, point-and-click - charting software because I am placing the longest bar segments on the left, regardless of color.
Long-time follower Daniel L. sent in a gem, by the Washington Post. This is a multi-part story about the polarization of American voters, nicely laid out, with superior analyses and some interesting graphics. Click here to see the entire article.
Today's post focuses on the first graphic. This one:
The key messages are written out on the 2017 charts: namely, 95% of Republicans are more conservative than the median Democrat, and 97% of Democrats are more liberal than the median Republican.
This is a nice statistical way of laying out the polarization. There are a number of additional insights one can draw from the population distributions: for example, in the bottom row, the Democrats have been moving left consistently, and decisively in 2017. By contrast, Republicans moved decisively to the right from 2004 to 2017. I recall reading about polarization in past elections but it is really shocking to see the extreme in 2017.
A really astounding but hidden feature is that the median Democrat and the median Republican were not too far apart in 1994 and 2004 but the gap exploded in 2017.
I'd like to solve a few minor problems on this graphic. It's a bit confusing to have each chart display information on both Republican and Democratic distributions. The reader has to understand that in the top row, the red area represents Republican voters but the blue line shows the median Democrat.
Also, I want to surface two key insights: the huge divide that developed in 2017, and the exploding gap between the two medians.
Here is the revised graphic:
On the left side, each chart focuses on one party, and the trend over the three elections. The reader can cross charts to discover that the median voter in one party is more extreme than essentially all of the voters of the other party. This same conclusion can be drawn from the exploding gap between the median voters in either party, which is explicitly plotted in the lower right chart. The top right chart is a pretty visualization of how polarized the country was in the 2017 election.
Twitter follower @ashwink_s didn't see eye-to-eye with the following charts that appeared in an Indian publication.
There is the infamous racetrack chart:
In the racetrack chart, the designer has embedded data in the angles at the center of the concentric circles but the visual cues point to the arc lengths. If the same proportion of people voted Yes as voted No, the two arcs should look like this:
The length of the red arc is much larger than the length of the gray arc, even though they encode the same value. There is no reason to double over, just pull them back straight pronto!
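The geometry behind this distortion is simple: an arc's length is its radius times its angle, so two arcs spanning the same angle differ in length by the ratio of their radii. Here is a small sketch, where the radii and the 50% value are made-up numbers rather than measurements from the chart:

```python
import math

def arc_length(radius, fraction):
    """Length of an arc covering `fraction` of a full circle at `radius`."""
    return 2 * math.pi * radius * fraction

# Hypothetical racetrack: both lanes encode 50%, but sit at different radii.
inner = arc_length(radius=1.0, fraction=0.5)
outer = arc_length(radius=2.0, fraction=0.5)
print(outer / inner)
```

With these assumed radii, the outer lane is drawn twice as long as the inner one even though both encode the same value.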
***
Next, we have a busy chart:
We are starstruck.
All those stars are redundant as they just illustrate the rating numbers printed to their left. The story here is that the government received a 7.5 rating, with no one rating it below 4, and the majority giving a 7 or 8. (It's curious that no one at all rated the government below 4. In most rating polls that I've come across, primarily in the U.S., there are extreme views.)
After the makeover:
P.S. Thanks to Matt F. who noticed the switched bars in the original post, and messaged me. The chart has now been fixed.
Reader Berry B. sent in a tip quite some months ago that I just pulled out of my inbox. He really liked the Washington Post's visualization of the electoral college in the Presidential election. (link)
One of the strengths of this project is the analysis that went on behind the visualization. The authors point out that there are three variables at play: the population of each state, the votes cast by state, and the number of electoral votes by state. A side-by-side comparison of the two tile maps gives a perspective of the story:
The under/over representation of electoral votes is much less pronounced if we take into account the propensity to vote. With three metrics at play, there is quite a bit going on. On these maps, orange and blue are used to indicate the direction of difference. Then the shade of the color codes the degree of difference, which was classified into severe versus slight (but only for one direction). Finally, solid squares are used for the comparison with population, and square outlines are for comparison with votes cast.
Pick Florida (FL) for example. On the left side, we have a solid, dark orange square while on the right, we have a square outline in dark orange. From that, we are asked to match the dark orange with the dark orange and to contrast the solid versus the outline. It works to some extent but the required effort seems more than desirable.
I'd like to make it easier for readers to see the interplay between all three metrics.
In the following effort, I ditch the map aesthetic, and focus on three transformed measures: share of population, share of popular vote, and share of electoral vote. The share of popular vote is a re-interpretation of what Washington Post calls "votes cast".
The information is best presented by grouping states that behaved similarly. The two most interesting subgroups are the large states like Texas and California where the residents loudly complained that their voice was suppressed by the electoral vote allocation but in fact, the allocated electoral votes were not far from their share of the popular vote! By contrast, Floridians had a more legitimate reason to gripe since their share of the popular vote much exceeded their share of the electoral vote. This pattern also persisted throughout the battleground states.
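For anyone wanting to reproduce the three transformed measures, the step is just a conversion of each raw count into a share of its national total. A minimal sketch follows, using placeholder counts for two fictional states (not the actual population, vote, or electoral figures); the function names are my own.

```python
def to_shares(counts):
    """Convert raw counts per state into shares of the overall total."""
    total = sum(counts.values())
    return {state: value / total for state, value in counts.items()}

def three_shares(population, votes_cast, electoral_votes):
    """Line up the three shares for each state."""
    pop = to_shares(population)
    vote = to_shares(votes_cast)
    ev = to_shares(electoral_votes)
    return {s: {"population": pop[s],
                "popular_vote": vote[s],
                "electoral_vote": ev[s]} for s in population}

# Placeholder inputs for two fictional states (NOT real figures).
population = {"A": 300, "B": 100}
votes_cast = {"A": 120, "B": 80}
electoral_votes = {"A": 6, "B": 4}

shares = three_shares(population, votes_cast, electoral_votes)
for state, row in shares.items():
    print(state, row)
```

In this toy example, state A's electoral share matches its share of the popular vote even though both fall short of its share of population, which is the kind of pattern the grouping is meant to surface.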
The hardest part of this design is making the legend:
My friend, Louis V., handed me a report from Harvard's Shorenstein Center, with the promise that I can make a blog post or two from it. And I wasn't disappointed.
This report (link) caught some attention a few months ago because of the click-bait headline that the media is "biased" against Trump in his first 100 days. They used the most naive definition of "bias". The metric is the amount of coverage that is "negative," with the unspoken standard that the media should be 50% negative.
In a court of law, it is already established that, for example, a loan company cannot be sued for racial bias simply because the rejection rate of loans for black Americans is higher than that of white Americans. Similarly, a university is not necessarily biased against black applicants even if the proportion of black students is found to be below the national proportion of black Americans (in a statistically significant way).
The appropriate amount of negative coverage is a function of the content of the President's actions, which is a hard standard to nail down, much harder than in the two examples cited above. This just says that such an analysis is futile from start to finish. Not to mention the irony of generating a negative media headline criticizing the negative tone of media coverage.
Now let's turn to their use of visuals. The following pair of pie charts is used to show the differences in coverage between U.S. and European media.
These pie charts are inspired by Wheel of Fortune, without the prizes.
Notice how our attention is caught by certain colors (red, orange, etc.) and the size of the slices. The largest red slice is labeled "Other Foreign/Defense" in the U.S. pie chart, although it did not merit a mention in the accompanying writeup so it's not clear what that category means.
Instead of ordering the slices by their sizes, the design puts the larger slices as far apart as possible. Further, each color is used twice in a mirrored way, causing us to infer an association between categories that doesn't exist.
There are lots of conventional ways to display this data better. I decided to experiment with word clouds (using the Wordle tool).
Here is one in which the color indicates whether the coverage is American or European. Each word appears twice, and in proximity to one another for comparison.
One can directly compute a discrepancy metric between the two regions. This next chart shows the difference in importance accorded each topic by American versus European media:
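One simple way to define such a discrepancy metric is the difference in each topic's share of coverage between the two regions. The sketch below assumes that definition, which may not be exactly the one behind the chart, and uses placeholder topic names and percentages rather than the report's numbers.

```python
def coverage_gap(us, europe):
    """Return (topic, US share minus European share) pairs,
    sorted from most US-heavy to most Europe-heavy."""
    topics = sorted(set(us) | set(europe))
    # A topic missing from one region is treated as 0% coverage there.
    gaps = {t: us.get(t, 0) - europe.get(t, 0) for t in topics}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder shares of coverage in percent (NOT the report's data).
us = {"Topic A": 20, "Topic B": 10, "Topic C": 5}
europe = {"Topic A": 10, "Topic B": 15, "Topic D": 5}

gaps = coverage_gap(us, europe)
for topic, gap in gaps:
    print(f"{topic}: {gap:+d} points")
```

Positive values mark topics that got relatively more attention from American media, negative values those favored by European media.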