Let's not mix these polarized voters as the medians run away from one another

Long-time follower Daniel L. sent in a gem, by the Washington Post. This is a multi-part story about the polarization of American voters, nicely laid out, with superior analyses and some interesting graphics. Click here to see the entire article.

Today's post focuses on the first graphic. This one:

Wpost_friendsparties1

The key messages are written out on the 2017 charts: namely, 95% of Republicans are more conservative than the median Democrat, and 97% of Democrats are more liberal than the median Republican.

This is a nice statistical way of laying out the polarization. There are a number of additional insights one can draw from the population distributions: for example, in the bottom row, the Democrats have been moving left consistently, and decisively so in 2017. By contrast, Republicans moved decisively to the right from 2004 to 2017. I recall reading about polarization in earlier years, but it is really shocking to see how extreme it became in 2017.

A really astounding but hidden feature is that the median Democrat and the median Republican were not too far apart in 1994 and 2004 but the gap exploded in 2017.
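Both statistics behind this story, the share of one party lying beyond the other party's median and the distance between the two medians, take only a few lines to compute. Here is a minimal Python sketch. The ideology scores are invented for illustration on a hypothetical -10 (liberal) to +10 (conservative) scale; they are not the data behind the Post's chart.

```python
from statistics import median

# Hypothetical ideology scores, -10 = most liberal, +10 = most conservative.
# Invented numbers for illustration only, NOT the survey data.
democrats = [-9, -8, -7, -6, -6, -5, -4, -3, -1, 6]
republicans = [-7, 1, 3, 4, 5, 6, 6, 7, 8, 9]

def share_beyond(group, threshold, more_conservative=True):
    """Fraction of `group` lying strictly to one side of `threshold`."""
    if more_conservative:
        hits = [x for x in group if x > threshold]
    else:
        hits = [x for x in group if x < threshold]
    return len(hits) / len(group)

dem_median = median(democrats)    # -5.5
rep_median = median(republicans)  # 5.5

# Share of Republicans more conservative than the median Democrat
print(share_beyond(republicans, dem_median, more_conservative=True))   # 0.9
# Share of Democrats more liberal than the median Republican
print(share_beyond(democrats, rep_median, more_conservative=False))    # 0.9
# The gap between the two medians
print(rep_median - dem_median)                                         # 11.0
```

Tracking the last number across survey years is one way to surface the "exploding gap" that the original charts leave hidden.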

***

I'd like to solve a few minor problems with this graphic. It's a bit confusing to have each chart display information on both the Republican and the Democratic distributions. The reader has to understand that in the top row, the red area represents Republican voters but the blue line shows the median Democrat.

Also, I want to surface two key insights: the huge divide that developed in 2017, and the exploding gap between the two medians.

Here is the revised graphic:

  Redo_wpost_friendsparties1

On the left side, each chart focuses on one party and its trend over the three time points. The reader can compare across charts to discover that the median voter in one party is more extreme than essentially all of the voters of the other party. The same conclusion can be drawn from the exploding gap between the median voters of the two parties, which is explicitly plotted in the lower right chart. The top right chart is a pretty visualization of how polarized the country was in 2017.

 


Some like it packed, some like it piled, and some like it wrapped

In addition to Xan's "packed bars" (which I discussed here), there are some related efforts to improve upon the treemap. To recap, the treemap is a design to show parts against the whole, and it works by packing rectangles into a bounding box. Frequently, this leads to odd-shaped rectangles, e.g. really thin and really tall ones, and it asks readers to estimate the relative areas of differently-shaped boxes. We often make mistakes in this task.

The packed bar chart approaches this challenge by allowing only the width of the box to vary with the data. The height of every box is identical, so readers only have to compare lengths.
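To make the idea concrete, here is a minimal greedy sketch of a packed-bar layout in Python. It assumes the layout works roughly like this: the largest values each start their own row, and every remaining value is appended to whichever row currently has the smallest total. I am not claiming this is exactly how Xan's implementation works; the function name `packed_bars` and the input values are hypothetical.

```python
def packed_bars(values, n_rows):
    """Greedy packed-bar layout (a sketch, not a reference implementation).

    Every bar has the same height; only its width (the value) varies.
    The n_rows largest values start one row each; each remaining value
    goes to the row with the smallest running total.
    """
    ordered = sorted(values, reverse=True)
    rows = [[v] for v in ordered[:n_rows]]
    for v in ordered[n_rows:]:
        shortest = min(rows, key=sum)  # first row with the smallest total
        shortest.append(v)
    return rows

# Hypothetical values for eight categories, packed into three rows
layout = packed_bars([40, 30, 20, 10, 8, 6, 5, 3], n_rows=3)
for row in layout:
    print(row, "row length:", sum(row))
```

Because the rows end up with similar total lengths, the bounding box stays compact, while the reader only ever compares widths of equally tall bars.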

Via Twitter, Adil pointed me to this article by him and his collaborators that describes a few alternatives.

One of the options is the "wrapped bar chart" introduced by Stephen Few. Like Xan, he restricts the variation to the lengths of bars while keeping the heights fixed. But he goes further, and abandons packing completely. Instead of packing, Few wraps the bars. He starts with a large bar chart with many categories filling up a tall plotting area, then divides the bars into blocks and places them side by side. Here is an example showing 50 states, ranked by total electoral votes:

Umd_few_wrapped_bars

You can see the white space because there is no packing. This version makes it easier to see the relative importance of the different blocks of states, but it is tough to tell how much the first block of 13 states accounts for. The wrapped bar chart is organized similarly to small multiples, except that the scale in each panel is allowed to vary.
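The wrapping step itself is simple: sort the categories, then cut the sorted list into consecutive blocks. The share of the total held by each block, which is hard to read off the chart, is one extra line. A minimal Python sketch; the helper names, the block size, and the values are hypothetical, not Few's electoral-vote data.

```python
def wrap_bars(items, block_size):
    """Sort (label, value) pairs largest-first and cut them into blocks,
    one block per side-by-side panel of the wrapped bar chart."""
    ordered = sorted(items, key=lambda kv: kv[1], reverse=True)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]

def block_shares(blocks):
    """Fraction of the grand total accounted for by each block."""
    total = sum(value for block in blocks for _, value in block)
    return [sum(value for _, value in block) / total for block in blocks]

# Hypothetical categories and values
items = [("A", 50), ("B", 30), ("C", 10), ("D", 6), ("E", 4)]
blocks = wrap_bars(items, block_size=2)
shares = block_shares(blocks)
print(blocks)
print([round(s, 2) for s in shares])
```

Printing these shares as a label on each block would be one way to answer the "how much does the first block account for" question without abandoning the wrapped layout.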

Another option is the "piled bars." This option, presented by Yalçın, Elmqvist, and Bederson, brings packing back. But unlike the packed bars or the treemap, the outside envelope no longer represents the total amount. In the "piled bars" design, the top X categories act as the canvas, and the smaller categories are packed inside these bars rather than around them. Take a look at this example, which plots GDP growth of different countries:

Umd_piledbars

The inset in the left column is instructive. The green (smallest) and red (medium) bars are packed inside the blue (largest) bars. In this example, it doesn't make sense to add up GDP growth rates, so it doesn't matter that the outer envelope does not equal the total. It would not work as well with the electoral vote data in the previous example.

I wonder whether a piled dot plot works better than a piled bar chart. This piled bar chart shares a problem with the stacked area chart: other than the innermost piece, each visible piece represents the difference between its own data value and that of the next smaller category, rather than the value of the data point. Readers are led to compare the green, red and blue pieces, but the corresponding values are not truly comparable, or of primary interest.
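A quick way to see the problem is to compute what the reader actually sees when bars share a baseline and the smaller bars are drawn in front of the larger ones. In this minimal Python sketch (hypothetical values, and a hypothetical helper name), only the front piece shows its own value; every other visible piece is a difference.

```python
def visible_segments(values):
    """For bars drawn from a shared baseline with the smallest in front,
    return (value, visible_length) pairs, smallest value first."""
    ordered = sorted(values)
    pieces = []
    previous = 0
    for v in ordered:
        pieces.append((v, v - previous))  # the part not hidden by the bar in front
        previous = v
    return pieces

# Hypothetical values for three categories piled into one bar
print(visible_segments([7, 2, 5]))
```

Here the blue-style outer bar has value 7 but shows only 2 units, the same visible length as the innermost bar of value 2. A dot placed at each value avoids this, because position alone encodes the data.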

This problem goes away if the bars are represented by dots.

***

What strikes me as the key paragraph in Yalçın et al.'s article is the following:

To understand graphical perception performance, we studied three basic tasks:

1) How accurately can we estimate the difference between two data points?
2) How accurately can we estimate the rank of a data point among all the rest?
3) How accurately can we guess the distribution characteristic of the whole dataset?

As chart designers, we have to prioritize these tasks. No single chart form is likely to prevail on all three tasks. So the designer should start with the question he or she wants to address; that leads to the key task the visualization should enable, which in turn leads to the chart form that best facilitates that task.

 

 

 


Unintentional deception of area expansion #bigdata #piechart

Someone sent me this chart via Twitter, as an example of yet another terrible pie chart. (I couldn't find that tweet anymore but thank you to the reader for submitting this.)

Uk_itsurvey_left

At first glance, this looks like a pie chart with the radius as a second dimension. But that is the wrong interpretation.

In a pie chart, we typically encode the data in the angles of the pie sectors, or equivalently, the areas of the sectors. In this special case, the angle is invariant across the slices, and the data are encoded in the radius.

Since the data are found in the radii, let's deconstruct this chart by reducing each sector to its left-side edge.

This leads to a different interpretation of the chart: it’s actually a simple bar chart, manipulated.

Redo_ukitsurvey_1

The process of manipulation runs against what data visualization should be. It takes the bar chart (bottom right), which is easy to read, introduces slants so it becomes harder to digest (top right), and finally adds a distortion that takes it from inefficient to incompetent (left).

What is this distortion I just mentioned? When readers look at the original chart, they are not focusing on the left-side edge of each sector; they are seeing the area of each sector. The ratio of areas is not the same as the ratio of lengths: with the angle fixed, a sector's area grows with the square of its radius, so doubling the length quadruples the area. Adding purple areas to the chart seems harmless, but in fact, despite applying the same angles, the designer added disproportionately more area to the larger data points than to the smaller ones.
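The distortion follows from the area formula for a circular sector, area = (angle / 2) × radius². A minimal Python check, using a hypothetical slice angle and two hypothetical data values used directly as radii, as the chart does:

```python
import math

def sector_area(radius, angle_radians):
    """Area of a circular sector: (angle / 2) * radius ** 2."""
    return 0.5 * angle_radians * radius ** 2

angle = 2 * math.pi / 8  # hypothetical: eight slices sharing the same angle
short, long_ = 20, 40    # hypothetical data values, drawn as radii

length_ratio = long_ / short
area_ratio = sector_area(long_, angle) / sector_area(short, angle)
print(length_ratio)  # 2.0
print(area_ratio)    # 4.0
```

The data say "twice as much," but the purple area says "four times as much," and the angle cancels out, so the exaggeration is the same no matter how wide the slices are.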

  Redo_ukitsurvey_2

To remedy this situation, the designer would have to make each radius proportional to the square root of the data value. But of course, the simple bar chart is more effective.
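Here is a minimal sketch of that square-root correction in Python, with hypothetical values and a hypothetical scale factor. Setting the radius to scale × √value makes each sector's area proportional to the value itself.

```python
import math

def sector_area(radius, angle_radians):
    """Area of a circular sector: (angle / 2) * radius ** 2."""
    return 0.5 * angle_radians * radius ** 2

def radius_for(value, scale=1.0):
    """Radius proportional to the square root of the value, so that
    the sector area ends up proportional to the value."""
    return scale * math.sqrt(value)

angle = 2 * math.pi / 8     # hypothetical equal slice angle
values = [20, 40, 80]       # hypothetical data values
areas = [sector_area(radius_for(v), angle) for v in values]
ratios = [a / areas[0] for a in areas]
print([round(r, 6) for r in ratios])
```

The area ratios now track the data ratios (1, 2, 4), up to floating-point rounding. Even so, readers are poor judges of area, which is why the bar chart remains the better choice.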

 



 


An example of focusing the chart on a message

Via Jimmy Atkinson on Twitter, I am alerted to this chart from the Wall Street Journal.

Wsj_fiscalconstraints

The title of the article is "Fiscal Constraints Await the Next President." The key message is that "the next president looks to inherit a particularly dismal set of fiscal circumstances." Josh Zumbrun, who tipped Jimmy off about this chart on Twitter, said that it is worth spending time on.

I like the concept of the chart, which juxtaposes the economic conditions that faced each president at inauguration with how his performance measured up against expectations, as represented by CBO projections.

The top portion of the graphic did require significant time to digest:

Wsj_fiscalconstraints_top

A glance at the sidebar informs me that there are two scenarios being depicted, the CBO projections and the actual deficit-to-GDP ratios. Then I got confused on several fronts.

One can of course blame the reader (me) for mis-reading the chart but I think dataviz faces a "the reader is always right" situation -- although there can be multiple types of readers for a given graphic so maybe it should say "the readers are always right."

I kept lapsing into thinking that the bold lines (in red and blue) are actual values while the gray line/area represents the predictions. That's because in most financial charts, the actual numbers are in the foreground and the predictions act as background reference materials. But in this rendering, it's the opposite.

For a while, a battle was raging in my head. There are a few clues that the bold red/blue lines cannot represent actual values. For one thing, I don't recall Reagan as a surplus miracle worker. Also, some of the time periods overlap, and one assumes that the CBO issued only one projection at a given time. The Obama line also confused me, as the headline led me to expect an ugly deficit but the blue line is rather shallow.

Then, I got even more confused by the units on the vertical axis. According to the sidebar, the metric is the deficit-to-GDP ratio. Most of the lines live in negative territory. Does the negative of the negative imply positive? Could the sharp upward turn of the Reagan line indicate massive deficit spending? Or should the axis be relabeled surplus-to-GDP ratio?

***

As I proceeded to re-create this graphic, I noticed that some of the tick marks are misaligned. There are various inconsistencies related to the start of each projection, the duration of the projection, the matching between the boxes and the lines, etc. So the data in my version are only roughly accurate.

To me, these data provide a primary reference for how presidents performed on the surplus/deficit compared to the expectations established by the CBO projections.

Redo_wsj_deficitratios

I decided to only plot the actual surplus/deficit ratios for the duration of each president's tenure. The start of each projection line is the year in which the projection is made (as per the original). We can see the huge gap in every case. Either the CBO analysts are very bad at projections, or the presidents didn't do what they promised during the elections.

 

 

 


If Clinton and Trump go to dinner, do they sit face to face, or side by side?

One of my students tipped me off to an August article in the Economist, published the last time the media proclaimed Donald Trump's campaign to be in deep water. The headline said "Donald Trump's Media Advantage Falters."

Who would have known, judging from the chart that accompanies the article?

Economist_20160820_woc352_1

There is something very confusing about the red line, showing "Trump August 2015 = 1." The data are disaggregated by media channel, and yet the index is hitched to the total of all channels. It is also impossible to figure out how Clinton is doing relative to Trump in each channel.

Here is a small-multiples rendering that highlights the key comparisons:

Redo_economist_earnedmedia1b

Alternatively, one can plot the Clinton advantage versus Trump in each channel, like this:
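Computing such an advantage is straightforward once each channel is compared with itself rather than with Trump's all-channel total. A minimal Python sketch, assuming the advantage is measured as Clinton's coverage divided by Trump's in the same channel and month (a difference would work too). The coverage counts are invented, and "channel B" is a hypothetical placeholder; only "online news" is a channel named in the post.

```python
# Hypothetical monthly coverage counts per channel (invented numbers).
coverage = {
    "online news": {"Clinton": [100, 120, 180], "Trump": [200, 190, 170]},
    "channel B":   {"Clinton": [80, 90, 95],    "Trump": [160, 150, 140]},
}

def clinton_advantage(channel):
    """Clinton coverage divided by Trump coverage, month by month.
    Values above 1 mean Clinton received more coverage in that channel."""
    return [c / t for c, t in zip(channel["Clinton"], channel["Trump"])]

advantage = {name: clinton_advantage(data) for name, data in coverage.items()}
for name, ratios in advantage.items():
    print(name, [round(r, 2) for r in ratios])
```

With one line per channel and a reference line at 1, catching up shows up as a line crossing the reference, which is the comparison the single index line could not deliver.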

Redo_economist_earnedmedia2b

One sees that Clinton has caught up in the last month (July 2016), primarily through more coverage by "online news."

Imagine Mr. Trump and Mrs. Clinton dining at a restaurant. Are they seated side by side (Economist) or face to face (junkcharts)?


The many-faced area chart is not usually your best choice

I found this chart about the exploding U.S. debt levels on ZeroHedge (link), sourced from Citibank.

Citi debt total

The top-line story is pretty easy to see: total debt levels have almost reached the peak of the 1930s. (Ignore that dreadful labeling of the years on the horizontal axis.)

Now, the three colors supposedly carry further insights related to the components of the debt. The problem is it is very hard to figure out which component(s) are responsible for the debt explosion. The choice of the area chart adds to our trouble.

Here are two other area charts that display the same three data series.

Redo_debt_compo_1

Just look at the yellow patch. The left chart gives the wrong impression of steep growth, refuted by the right chart. For the three data series, there are six unique area charts that one can produce!
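The count of six is just the number of stacking orders of three series, 3! = 6. The trouble is that a band's edges inherit the shape of everything stacked beneath it. A minimal Python sketch with invented numbers for three hypothetical components: the corporate band's thickness is identical in every order, yet its top edge rises by 3 units at the bottom of the stack and by 37 units at the top.

```python
from itertools import permutations

# Three hypothetical debt components over four time points (invented numbers).
series = {
    "government": [10, 12, 20, 40],
    "household": [30, 31, 33, 34],
    "corporate": [20, 21, 22, 23],
}
N_PERIODS = 4

def band_edges(order, name):
    """Bottom and top edges of `name`'s band when the layers are stacked
    bottom-to-top in `order`."""
    below = order[: order.index(name)]
    bottom = [sum(series[n][t] for n in below) for t in range(N_PERIODS)]
    top = [b + v for b, v in zip(bottom, series[name])]
    return bottom, top

orders = list(permutations(series))  # 3! = 6 possible stacking orders
for order in orders:
    _, top = band_edges(order, "corporate")
    print(" / ".join(order), "-> top edge of corporate band rises by", top[-1] - top[0])
```

Readers tend to judge a band by the slope of its edges, which is why the same yellow patch can look flat in one chart and steep in another.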

The following smoothed line chart gives an accurate picture of the relative changes in levels of the debt components:

  Redo_debt_indices_2
Government debt was the primary driver of the exploding debt both in the 1930s and in the present era. The other debt components also rose but not quite as much. All data series are converted into indices, with 1920 as the reference year.
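Converting a series into an index is a single division by the base-year value. A minimal Python sketch, assuming the base year is set to 100 (the chart could equally use 1) and using invented values for one hypothetical series:

```python
def to_index(values_by_year, base_year, base=100.0):
    """Rescale a series so that its value in `base_year` equals `base`."""
    base_value = values_by_year[base_year]
    return {year: base * v / base_value for year, v in values_by_year.items()}

# Hypothetical debt levels keyed by year (invented numbers)
debt = {1920: 50.0, 1930: 90.0, 1940: 120.0}
print(to_index(debt, 1920))
```

After indexing, every component starts at the same height, so the lines compare growth rates rather than absolute sizes.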

A scatter plot with connecting lines sometimes produces a more visual portrayal of "home-coming," although in this case, I am not sure the advantage is clear.

Redo_debt_compo_2

This chart requires more attentive reading. It does make the point that by 2015, the level of government debt had exceeded the previous peak (1950) while the other two debt components were fast approaching their prior peak (1934).

 

 


Finding meaning in Big Blue California

Via Twitter, Pat complained that this Bloomberg graphic is confusing:

Bloomberg_electriccars

The accompanying article is here. The gist of the report is that electric cars are much more popular on the West Coast because the fuel efficiency of such cars goes down dramatically in colder climates. (Well, there are political reasons too, also discussed in the article.)

What makes this chart confusing?

Our eyes are drawn to big blue California, and the big number 25,295. The blue block raises three questions: first, how do we interpret that 25,295 number? How big is it? To what should we compare the number? Second, we notice a blending of labels--California is the only label of a state while all other labels are of regions. Third, the number under West is 31,783, even larger than 25,295 although it gets a smaller font size, a black-and-white treatment, and a seemingly small allocation of space.

It takes a little time to figure out the structure of the graphic: the base layer is a treemap of the regions, and big blue California is a highlight that sits within the West region.
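If, as the layout implies, the West figure of 31,783 includes California's 25,295, a two-line calculation shows how much of the West is really California. This is plain arithmetic on the two numbers printed on the chart, under that assumption:

```python
california = 25_295
west_total = 31_783  # assumption: this regional total includes California

rest_of_west = west_total - california
california_share = california / west_total
print(rest_of_west)                        # 6488
print(round(california_share * 100, 1))    # 79.6 (percent of the West)
```

So roughly four out of five electric cars counted in the West would be Californian, a fact the treemap's uneven typography obscures.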

Tufte would not love the moiré patterns, nor do I. I'd have left the background of the entire right side plain white.

I fail to see why this treemap form is preferred to a simple bar chart.

***

As I played around with the data, mostly trying different ways of stacking it, I found a way to make a more engaging graphic. This new graphic builds off an insight from the data: the number of electric cars sold in California is more than in all other states combined. So here you go:

Redo_bloomberg_electriccars

Since the article attributes the gap in sales to regional temperature, an even better illustration should bring in temperature data.

 


Losing count of money bags

I found this chart in a Munich publication called Süddeutsche Zeitung. It appeared during the most recent Greek/Euro crisis.

IMG_3330_germanyathens_smsm

The bags of money were financial obligations that were coming due from June 2015 to December 2015. There were three creditors, indicated by red, blue and gray.

This graphic answers one question well: the individual debt obligation for a given month and a given creditor. However, by privileging these details, the chart fails to convey cumulative totals; readers have to make calculations in their heads.

In the revision, I wanted to convey two key messages: the total amount of debt that was coming due in those seven months, and the relative proportion of debt owed to the three creditors. An area chart brings this out better.
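One way to build such an area chart, assuming it stacks cumulative amounts owed to each creditor, is to take a running sum per creditor; the top of the stack then reads directly as the total coming due so far. A minimal Python sketch. The monthly amounts are invented (hypothetical millions of euros), and the creditors are labeled only by their colors in the original chart.

```python
from itertools import accumulate

months = ["Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Hypothetical amounts coming due each month (invented numbers).
due = {
    "red":  [1500, 500, 3000, 0, 500, 0, 1000],
    "blue": [300, 400, 200, 300, 200, 400, 300],
    "gray": [0, 1000, 0, 0, 500, 0, 0],
}

# Running total per creditor: the layers of the stacked area chart
cumulative = {name: list(accumulate(amounts)) for name, amounts in due.items()}

# Top edge of the stack: total due so far across all creditors
running_total = [sum(layer[t] for layer in cumulative.values()) for t in range(len(months))]

for name, layer in cumulative.items():
    print(name, "share of total:", round(layer[-1] / running_total[-1], 2))
print(dict(zip(months, running_total)))
```

The final column of the stack gives the seven-month total and each creditor's proportion at a glance, which is exactly what the money bags made readers add up in their heads.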

Redo_athens_debt

Conversely, it is much harder to figure out individual debt obligations by month and creditor from this version.

This points to the importance of determining your key message(s) before choosing a form.

 

 


Hello to St. Louis readers

Stlouismo

I'll be hosting a Data Visualization workshop at the Digital Media Marketing Conference in St. Louis, Missouri on Thursday. Here is the link to their website.

The workshop is organized around three themes: Appreciating, Conceptualizing, and Improving. There will be several hands-on exercises.

If you are a reader in St. Louis, and would like to meet up, email me.

***

Posting this week will be light because of various commitments. I may put something up later this week.

One of my students pointed me to this Medium article about a NYT chart. Well worth reading.

 


Observing Rosling’s Current Visual Style

On the sister blog, I wrote about Hans Rosling’s recent presentation in New York (link). I noted that Rosling has apparently simplified his visual palette.

Rosling is best known as the developer of the Gapminder tool, used to visualize global social statistics data collected by national statistical agencies. I wrote favorably about this tool in a series of posts (link). Gapminder popularized the moving bubble chart, although that is not the only graphical form it offers.

Gapminder_screengrab

These animated bubble charts also made Rosling a YouTube star (See here.)

***

In last week’s presentation, Rosling showed only one moving bubble chart. The rest of his graphics were noticeably simpler, something that anyone can produce in Excel or PowerPoint. Here is one example:

Image1
 

I’m particularly impressed by a simple sequence of charts in which Rosling explains the demographic changes the world is expecting to see in the next 50 to 100 years.

  Image2

This is an enhanced area chart. Each slice of area is subdivided into stick figures so that an axis for population counts becomes unnecessary.

Instead, the reader sees two useful dimensions: region of the world, and age group.

How the population ages as it grows is the feature story and the effect of aging is ingeniously portrayed as layers. This becomes apparent as Rosling lets time roll forward, and the layers literally walk off the page. (Unfortunately, I couldn't capture each step fast enough.)

Image3

 (This photo courtesy of Daniel Vadnais.)

When Rosling showed the 2085 projection, we saw that the entire rectangle had filled up, so the world population will have grown, roughly by 30 percent. The growth comes from the adult layers filling up; the total number of children has not changed. This is one of the key insights from recent demographic data. The first photo above shows something remarkable: the fertility rate in Asian countries has already plunged to about the same level as in developed countries.

***

This set of charts is unusually effective. It represents another level of simplification in visual means. At the same time, the message is sharpened.

As I reported the other day (link), Rosling does not believe modern tools have improved data analysis. This talk, which used simple tools, is a good demonstration of his point.