I made a streamgraph

The folks at FiveThirtyEight were excited about the following dataviz they published two weeks ago, illustrating the progression of vote-counting by state. (link) That progression was indeed the unique and confusing feature of the 2020 Presidential election in the States. For those outside the U.S., what happened (by and large) was that many Americans, skewing toward Biden supporters, voted by mail before Election Day, but their votes were sometimes counted after the same-day votes were tallied.

 

538_votetalliesovertimemap

A number of us kept staring at these charts, hoping for a how-to-read-it explanation. Here is a zoom-in for the state of Michigan:

538leadchanges_michigan

To save you the trouble, here is how.

The key is to fight your urge to look at the brown area. I know, it's pretty hard to ignore the biggest areas of every chart. But try to make them disappear.

Focus on the top edge of the chart. This line gives the total number of votes counted so far. In Michigan, by hour 12, about 2.4 million votes were counted, and by hour 72, 2.8 million votes were on the books. The total here is the sum of the two major parties' votes [since third parties got negligible votes in this election, I'm ignoring them so as to simplify the discussion].

Next, look at the red and blue areas. These represent the gap between the two parties' current vote totals. If the area is red, Trump was leading; if blue, Biden was leading. Each color flip represents a lead change. Suppress the urge to interpret red as the number or share of Trump votes.

***

What have we learned about the vote counting in Michigan?

Counting slowed significantly after the 12th hour. Trump raced to a lead on Election Day. Around hour 20, the race was dead even; after that, Biden overtook Trump and never looked back. Throughout most of this period, the lead was small compared to the total votes cast, although at the end, the Biden lead was noticeable.

If you insist on interpreting the brown area, it is equal to twice the vote total of the second-place candidate, so it really isn't something you want to look at.
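The reading rules above boil down to a bit of arithmetic, sketched here in Python. The function names and the vote counts are made up for illustration (they are not actual returns), and the sketch assumes the brown area is simply whatever remains of the total once the gap is taken out.

```python
def decompose(trump, biden):
    """Split two running vote totals into the quantities the chart encodes."""
    total = trump + biden          # top edge of the chart
    gap = abs(trump - biden)       # height of the red or blue area
    if trump > biden:
        leader = "Trump"           # area is drawn red
    elif biden > trump:
        leader = "Biden"           # area is drawn blue
    else:
        leader = None              # dead even, no colored area
    brown = total - gap            # assumption: brown is the remainder
    return {"total": total, "gap": gap, "leader": leader, "brown": brown}


def lead_changes(series):
    """Count color flips in a sequence of (trump, biden) running totals."""
    flips, previous = 0, None
    for trump, biden in series:
        if trump == biden:
            continue               # a tie has no color; wait for the next leader
        current = "Trump" if trump > biden else "Biden"
        if previous is not None and current != previous:
            flips += 1
        previous = current
    return flips


# Hypothetical snapshot, for illustration only
snapshot = decompose(trump=1_300_000, biden=1_100_000)
print(snapshot)
```

With these made-up numbers, the brown area works out to 2,200,000, which is twice the trailing candidate's 1,100,000. That is why the brown area tells you nothing you would want to read directly.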

Just for contrast, here is the chart for Iowa:

538leadchanges_iowa

Trump led from beginning to end, with his lead widening slightly as more votes were counted.

***

As I was stewing over this chart, an ominous thought overcame me. Would a streamgraph work for this data? You don't hear much about streamgraphs here because I rarely favor them (see this long-ago post) but let's just try one and see.

Junkcharts_redo_538leadchange_mi_ia

(These streamgraphs were made in R using the streamgraph package. Post-processing was applied to customize the labeling.)

This chart conveys all the key points listed before. You can see how the gap evolved over time, the lead flips, which candidate was in the lead, and the total mass of votes counted at different times. The gap is shown in the middle.

I can't say I'm completely happy with the streamgraph - I hope readers don't care about the numbers because it's hard to evaluate a difference when it's split two ways on either side of the middle axis!
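To see why the gap ends up split across the middle axis, here is a minimal Python sketch of the centered stacking that a streamgraph uses. It is not the code behind the charts above (those came from the R streamgraph package). The function name `centered_bands`, the three-layer arrangement (trailing candidate's votes, gap, trailing candidate's votes again), and the vote counts are all assumptions made for illustration.

```python
def centered_bands(layers):
    """Return (lower, upper) edges of each layer, stacked symmetrically around zero.

    `layers` is a list of equal-length lists, one per layer, bottom to top.
    """
    n_points = len(layers[0])
    bands = [[] for _ in layers]
    for t in range(n_points):
        heights = [layer[t] for layer in layers]
        bottom = -sum(heights) / 2     # symmetric baseline: the stack straddles the axis
        for i, h in enumerate(heights):
            bands[i].append((bottom, bottom + h))
            bottom += h
    return bands


# Hypothetical running totals at three points in time (not real returns)
trump = [60, 100, 110]
biden = [40, 100, 130]
trailer = [min(a, b) for a, b in zip(trump, biden)]
gap = [abs(a - b) for a, b in zip(trump, biden)]

# Assumed layering: the gap sits in the middle, flanked by the trailer's votes
bands = centered_bands([trailer, gap, trailer])
for t, (low, high) in enumerate(bands[1]):
    print(f"time {t}: gap band runs from {low} to {high}")
```

In this sketch a gap of 20 is drawn from -10 to 10, so the reader has to add up two half-heights on either side of the axis to recover the difference.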

***

If you come up with a better idea, make sure to leave a comment.


Podcast highlights

Recently, I made a podcast for Ryan Ray, which you can access here. The link sends you to a 14-day free trial of his newsletter, which is where he publishes his podcasts.

Kaiserfung_warroommedia

Ryan contacted me after he read my book Numbers Rule Your World (link). I was happy to learn that he enjoyed the stories, and during the podcast, he gave an example of how he applied the statistical concepts to other situations.

During the podcast, you will hear:

  • I have a line in my course syllabus that reads "after you take this class, you will not be able to look at numbers (in the media) with a straight face ever again." That's a goal of mine, and it also applies to my books.
  • Why most statisticians are skeptics
  • Why figuring out the statistical conclusions is the easy part, while the hardest challenge is finding a way to communicate them to a non-technical audience. I went through many drafts before I landed on the precise language used in those stories.
  • Why "correlation is not causation" is not useful practical advice
  • You can't unsee something you've already seen, and this creates hindsight bias
  • The biggest bang for the buck when improving statistical models is improving data quality
  • Some models, such as polls and election forecasts, can be thought of as thermometers measuring the mood of the respondents at the time of polling

***

To hear the podcast, visit Ryan Ray's website.


This holiday retailers hope it will snow dollars

According to the Conference Board, the pandemic will not deter U.S. consumers from emptying their wallets this holiday season. Here's a chart that shows their expectation (link):

COVID-19-Holiday-Spend-847

 

A few little things make this chart work:

The "More" category is placed on the left, as English is read left to right, and it is also given the deepest green, drawing our attention.

Only the "More" segments have data labels. I'd have omitted the decimals. I suspect they were added because financial analysts may be multiplying these percentages to yield dollar amounts, in which case the extra precision helps.

The categories are sorted in decreasing order of the propensity to spend more this year than last year. (The business community has an optimism bias.)

The choice of three shades of one color instead of three different colors keeps the chart clean.

***

The use of snowflakes surely infuriates a hardcore Tufte fan although I like that they add a festive note to the presentation. The large snowflake isn't randomly positioned but placed exactly where it causes the least interference with the bar chart.

 


Using comparison to enrich a visual story

Just found this beauty deep in my submission pile (from Howie H.):

Iwillvote_texas

What's great about this pie chart is the story it's trying to tell. Almost half of the electorate did not vote in Texas in the 2016 Presidential election. The designer successfully draws my attention to the white sector that makes the point.

There are a few problems.

Showing two decimals is too much precision.

The purple sector is not labeled.

The white area seems exaggerated. The four sectors do not appear to meet at the center of the circle. The distortion is not too much but it's schizophrenic: the pie slices are drawn with low precision while the data labels have high precision.

***

The following fixes those problems, and also adds a second chart to contrast the two ways of thinking:

Redo_junkcharts_iwillvotecomtexas