
I made a streamgraph

The folks at FiveThirtyEight were excited about the following dataviz they published two weeks ago, illustrating the progression of vote-counting by state. (link) The staggered counting was indeed the unique and confusing feature of the 2020 Presidential election in the States. For those outside the U.S., what happened (by and large) was that many Americans, skewing toward Biden supporters, voted by mail before Election Day, but their votes were sometimes counted after the same-day votes were tallied.



A number of us kept staring at these charts, hoping for a how-to-read-it explanation. Here is a zoom-in for the state of Michigan:


To save you the trouble, here is how.

The key is to fight your urge to look at the brown area. I know, it's pretty hard to ignore the biggest areas of every chart. But try to make them disappear.

Focus on the top edge of the chart. This line gives the total number of votes counted so far. In Michigan, by hour 12, about 2.4 million votes were counted, and by hour 72, 2.8 million votes were on the books. Strictly speaking, this line gives the sum of the two major parties' vote totals [since third parties got negligible votes in this election, I'm ignoring them so as to simplify the discussion].

Next, look at the red and blue areas. These represent the gap in the number of votes between the two parties' current vote totals. If the area is red, Trump was leading; if blue, Biden was leading. Each color flip represents a lead change. Suppress the urge to interpret red as the number or share of Trump votes.


What have we learned about the vote counting in Michigan?

Counting slowed significantly after the 12th hour. Trump raced to a lead on Election Day; around hour 20, the race was dead even; after that, Biden overtook Trump and never looked back. Throughout most of this period, the vote lead was small compared to the total votes cast, although at the end, the Biden lead was noticeable.

If you insist on interpreting the brown area, it is equal to twice the vote total of the second-place candidate, so it really isn't something you want to look at.
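The reading rules above boil down to a little arithmetic, sketched here in Python. The function name `decompose` and the vote counts are made up for illustration (they are not actual Michigan figures), and third parties are ignored as in the discussion above.

```python
from typing import NamedTuple


class ChartBands(NamedTuple):
    total: int      # height of the top edge: votes counted so far
    gap: int        # height of the red or blue area: the current lead
    leader: str     # who is ahead, i.e. which color the gap is drawn in
    remainder: int  # height of the brown area


def decompose(trump: int, biden: int) -> ChartBands:
    """Split two running vote totals into the pieces the chart draws."""
    total = trump + biden
    gap = abs(trump - biden)
    if trump > biden:
        leader = "Trump"
    elif biden > trump:
        leader = "Biden"
    else:
        leader = "tie"
    # Whatever is left after removing the gap from the total is the brown area.
    return ChartBands(total, gap, leader, total - gap)


# Hypothetical running counts at some hour of the count
bands = decompose(trump=1_300, biden=1_100)
print(bands)
```

With these made-up numbers, the brown area comes to 2,200, which is twice the trailing candidate's 1,100 votes. That is why it dominates the chart while saying nothing directly about who is ahead.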

Just for contrast, here is the chart for Iowa:


Trump led from beginning to end, with his lead widening slightly as more votes were counted.


As I was stewing over this chart, an ominous thought overcame me. Would a streamgraph work for this data? You don't hear much about streamgraphs here because I rarely favor them (see this long-ago post) but let's just try one and see.


(These streamgraphs were made in R using the streamgraph package. Post-processing was applied to customize the labeling.)

This chart conveys all the key points listed before. You can see how the gap evolved over time, the lead flips, which candidate was in the lead, and the total mass of votes counted at different times. The gap is shown in the middle.

I can't say I'm completely happy with the streamgraph - I hope readers don't care about the numbers because it's hard to evaluate a difference when it's split two ways on either side of the middle axis!
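To see why the difference is hard to read off, here is a rough Python sketch of where the band edges fall when the layers are centered on a middle axis. It is not the R code behind the charts. It assumes the gap band sits in the middle with an equal gray band on either side, and the helper name `stream_bands` and the vote counts are hypothetical.

```python
def stream_bands(trump: float, biden: float) -> tuple[float, float, float, float]:
    """Return (bottom, gap_low, gap_high, top) edges, symmetric about 0.

    Assumed layout: gray band, colored gap band, gray band,
    stacked so that the whole shape is centered on the middle axis.
    """
    total = trump + biden
    gap = abs(trump - biden)
    top = total / 2        # outer envelope sits at +/- half the total
    gap_high = gap / 2     # gap band sits at +/- half the lead
    return (-top, -gap_high, gap_high, top)


# Hypothetical running counts at some hour of the count
bottom, gap_low, gap_high, top = stream_bands(trump=1_300, biden=1_100)
print(bottom, gap_low, gap_high, top)

# Thickness of each gray band under this layout
gray_thickness = top - gap_high
print(gray_thickness)
```

Under this layout, each gray band is as thick as the trailing candidate's vote total, and the lead of 200 shows up as 100 above the axis plus 100 below it. The reader has to add the two halves together to recover the actual gap.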


If you come up with a better idea, make sure to leave a comment.








Instead of light gray on both sides, why not light blue on one side and pale pink on the other? That after all is why the areas are split in two.


Also, I wonder if instead of the heavy coloured part in the middle being symmetrical, perhaps they could have one straight edge, the top edge for one colour and the bottom edge for the other. This would break the mirror symmetry of the outer envelope, but that would not be a bad thing.

I hope I haven't failed to get the idea across: what I'm thinking of is that the centre would make a ruler-straight edge, defined either by the top edge of the heavy blue, OR the bottom edge of the heavy red, at any given moment of the graph (or vice versa, top edge of red or bottom edge of blue)

Kira M

The color scheme of your stream graph is a big improvement, since the grey draws less attention than that dark color. But the original graphs work better to show what’s going on. So how about keeping the original graphs but taking your color scheme?


I think your color scheme is far better. I don't think the stream graph aspect of this adds anything.
I think if you just cut it in half and used the top half like a normal area graph, with the lead series at the bottom and the grey above it, it would be far easier to read than the original, without confounding it by splitting the data in both directions.


Have you considered looking at the current data that is available regarding the reports of voter fraud, large quantities of votes being processed in short times after the polls closed, data hacking, voting machine errors and manipulation in the 2020 presidential election?

Could a comparative analysis be done on all USA state representative and house of Congress races separated by the 2 main political parties to predict how the election would have turned out if there was no fraud found with the president ballots? This would need to assume that the current Senate and house races were reported accurately with also no fraud. The main assumption is that all voters voted only within their preferred political party and that if a voter voted Republican for the house and Senate races, it is assumed that they voted Republican for president.

Thanks for the work that you do. I enjoy reading your blog.


Errol: My take about election fraud detection is here. In short, it's not worth my time because a statistical anomaly is not "proof". Even if there is only 1 percent chance that something could happen, there is still 1 percent chance. Roughly speaking, what you're suggesting is like what the election forecasters (Nate Silver, etc.) did. You'd need to fill in the blanks, i.e. the large group of Independents, the people who did not vote before, etc. The forecasters use polling data to guess at those values - and we're not so sure about that now.
