Election visuals 2: informative and playful
Sep 15, 2020
In yesterday's post, I reviewed one section of 538's visualization of its election forecasting model, specifically the probability plot.
The visualization, technically called a pdf (probability density function), is a mainstay of statistical graphics. While every one of 40,000 scenarios shows up on this chart, it doesn't offer a direct answer to our topline question. What is Nate's call at this point in time? Elsewhere in their post, we learn that the 538 model currently gives Biden a 75% chance of winning, three times Trump's chance.
In graphical terms, the area to the right of the 270-line is three times the size of the left area (on the bottom chart). That's not apparent in the pdf representation. To address this, statisticians may convert the pdf into a cdf (cumulative distribution function), which depicts the cumulative area as we sweep from the left to the right along the horizontal axis.
The cdf visualization rarely leaves the pages of a scientific journal because it's not easy for a novice to understand. One reason is that the relevant probability is 1 minus the cumulative probability. The cdf for the bottom chart will show 25% at the 270-line while the chance of Biden winning is 1 - 25% = 75%.
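To make the arithmetic concrete, here is a minimal Python sketch of reading a win probability off an empirical cdf. The eight electoral-vote totals are made up for illustration (they are not 538's simulations), and `empirical_cdf` is a hypothetical helper name of my own. Because electoral votes are whole numbers, the cumulative share of scenarios falling short of 270 is the cdf evaluated at 269.

```python
from bisect import bisect_right


def empirical_cdf(samples, x):
    """Return the fraction of samples that are less than or equal to x."""
    ordered = sorted(samples)
    return bisect_right(ordered, x) / len(ordered)


# Made-up Biden electoral-vote totals, one per simulated scenario.
# These are illustrative numbers only, not output from the 538 model.
simulated_ev = [255, 268, 275, 290, 310, 330, 350, 380]

# Cumulative probability of finishing short of the 270 votes needed to win.
below_270 = empirical_cdf(simulated_ev, 269)

# The number people care about is the complement of the cdf value.
win_probability = 1 - below_270

print(below_270)        # 0.25
print(win_probability)  # 0.75
```

The sketch evaluates the cdf at a single threshold, which is all the election question needs; a full cdf chart would repeat this calculation for every value on the horizontal axis.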
The cdf presentation is also wasteful for the election scenario. No one cares about any threshold other than the 270 votes needed to win, but the standard cdf shows every possible threshold.
The second graphical concept in the 538 post is an attempt to solve this problem.
If you drop all the dots to an imaginary horizontal baseline, the above dotplot looks like this:
There is a recent trend toward centering dots to produce symmetry. It's actually harder to perceive the differences in heights of the band.
The secret sauce is to put down 100 dots, with a 75-25 blue-red split that conveys the 75% chance of a Biden win. Superimposing the pdf line from the other visualization, I find that the density of dots roughly mimics the probability of outcomes.
It's easier to estimate the blue vs red areas using those dots than the lines.
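One way to arrive at such a set of 100 dots, sketched here in Python, is to sort the simulated outcomes and keep the scenario at the midpoint of each of 100 equal-sized slices. This is an assumption on my part for illustration; the 538 post does not say how its dots were chosen. The outcomes below are synthetic draws from a bell curve, not 538's scenarios, and `representative_dots` is a hypothetical helper name.

```python
import random


def representative_dots(outcomes, n_dots=100):
    """Pick n_dots outcomes at evenly spaced quantiles of the sorted outcomes."""
    ordered = sorted(outcomes)
    n = len(ordered)
    # Index of the midpoint of the i-th of n_dots equal-sized slices.
    return [ordered[(2 * i + 1) * n // (2 * n_dots)] for i in range(n_dots)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for 40,000 simulated Biden electoral-vote totals.
# The mean and spread are arbitrary choices, not estimates from any model.
outcomes = [min(538, max(0, round(rng.gauss(320, 75)))) for _ in range(40_000)]

dots = representative_dots(outcomes)

# Color each dot by which side of the 270-line it falls on.
blue = sum(1 for ev in dots if ev >= 270)
red = len(dots) - blue

print(blue, red)
```

Under this scheme the dots crowd together where outcomes are common, which is why their density tracks the pdf, and the blue count lands within one dot of the simulated win probability expressed out of 100.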
The dots are stuffed toys. Clicking on each dot reveals a map showing one of the 40,000 scenarios. It displays which candidate wins which state. For instance, the most extreme scenario of a Trump win is:
Here is a scenario of a razor-tight election won by Trump:
This presentation has a weakness as well. It gives the impression that each of the dots is equally important because they are the same size. In reality, the importance of each dot is proportional to the height of the band. Since the band is generally taller near the middle, the dots near the middle are more likely scenarios than the dots shown on the two edges.
On balance, I like this visualization, which is both informative and playful.
As before, what strikes me about the simulation result is the flatness of the probability surface. This feature is obscured when we summarize the result as 75% chance of a Biden victory.
Is it CDF or PDF? You use both.
Posted by: Cheryl Renee Thompson Smith | Sep 15, 2020 at 09:41 AM
CRTS: The smoothed line in the third section of the 538 post is a PDF. The dot plot fills out the area below that PDF. The 75% chance of winning is a number that can be read off a CDF, but none of the charts in the 538 post or my post is a CDF.
Posted by: Kaiser | Sep 15, 2020 at 11:57 AM