Today's post is a continuation of my previous post about the two election forecasting models, by FiveThirtyEight and the Economist.
Recall Adam Pearce's visualization of the correlation matrices from both models:
The left matrix lets us evaluate the FiveThirtyEight model on one particular metric: how it projects the correlations of vote shares between states. Intuitively, the results of certain pairs of states should be highly correlated, while other pairs might not be.
The most interesting, and controversial, aspect of that heatmap is the presence of pink, which corresponds to negative correlations. The FiveThirtyEight model suggests that for certain pairs of states, when Trump does better than expected in one state, he does worse than expected in the other. Andrew Gelman thinks the opposite: if Trump does better than expected in a blue state, he also does better than expected in red states. Thus, the Economist model has a different heatmap of correlations from FiveThirtyEight's, with no pink in sight.
I find the heatmap too rich for its own good, so I made some boxplots. (R code at the end.)
I grouped the correlations by state. Each state has 50 correlations, one with each of the other states (DC is treated as a "state" for the purpose of election modeling).
Here is a simple boxplot of the correlations of Biden's vote share between Washington, DC, and each of the 50 states, according to the FiveThirtyEight model:
To map this back to Adam's heatmaps, look at the plot on the left, pick out the column labeled DC (two-thirds of the way across), take that whole column, remove the one dark green cell (which is DC correlated with itself), and make a boxplot out of those 50 correlations.
As a reminder, the box in a boxplot contains the middle 50 percent of the data, here the middle 25 correlations. Dots beyond the whiskers would mark outliers; there are none here. For DC, the FiveThirtyEight model assigns negative correlations with 20 other states (to the left of the brown dashed line).
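The steps just described (take one column, drop the self-correlation, read off the box) can be sketched in R, the language of the code at the end of this post. The matrix below is a hypothetical stand-in built from random data, so its numbers say nothing about either model; only the mechanics carry over. State labels come from R's built-in `state.abb` plus "DC". `boxplot.stats()` applies R's default rule, flagging points more than 1.5 box-lengths beyond the box as outliers.

```r
set.seed(1)
states <- c(state.abb, "DC")
# Hypothetical stand-in for a model's 51 x 51 correlation matrix:
# correlations of random data, not either model's actual output
sims <- matrix(rnorm(500 * 51), nrow = 500, dimnames = list(NULL, states))
corr_mat <- cor(sims)

# Take the DC column and drop the diagonal cell (DC with itself)
dc_corr <- corr_mat[, "DC"]
dc_corr <- dc_corr[names(dc_corr) != "DC"]
length(dc_corr)  # 50 correlations, one per other state

# Boxplot ingredients under R's default 1.5 box-length outlier rule
bs <- boxplot.stats(dc_corr)
bs$stats  # lower whisker end, lower hinge, median, upper hinge, upper whisker end
bs$out    # outliers, drawn as dots beyond the whiskers

# With 50 values the hinges are the 13th and 38th smallest, so the box holds 26 of them
share_in_box <- mean(dc_corr >= bs$stats[2] & dc_corr <= bs$stats[4])

# Negative correlations: the points left of the dashed line at zero
n_negative <- sum(dc_corr < 0)

boxplot(dc_corr, horizontal = TRUE, col = "gray80")
abline(v = 0, col = "brown", lty = 2)
```

Dropping the diagonal by name, rather than by testing for a value of exactly 1, avoids relying on floating-point equality. R's hinges come from `fivenum()` and can differ slightly from the quartiles returned by `quantile()`, but the box still covers essentially the middle half of the data.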
***
Here then are the sets of boxplots for both models for all 51 states. The states are ordered by increasing average correlations in the Economist model so that it's easier to compare the two sides. (Notice that I placed the Economist model on the left, the reverse of Adam's order. Also, I plotted Biden vote shares instead of Trump vote shares.)
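Ordering the states takes one average per column. A minimal sketch on another hypothetical random matrix (not model output): it checks that leaving the diagonal 1 inside the average, as a plain column mean does, leaves the ranking unchanged, because every state gets the same extra term, so (S + 1)/51 and S/50 rank the off-diagonal column sums S identically. Note also that a horizontal boxplot draws its first factor level at the bottom, so sorting from highest to lowest average puts the lowest averages at the top of the chart.

```r
set.seed(3)
states <- c(state.abb, "DC")
# Hypothetical 51 x 51 correlation matrix from random data (not model output)
sims <- matrix(rnorm(500 * 51), nrow = 500, dimnames = list(NULL, states))
corr_mat <- cor(sims)

# Average correlation with the other 50 states, diagonal excluded
avg_off_diag <- (colSums(corr_mat) - diag(corr_mat)) / 50
# Average including the self-correlation, as colMeans() computes it
avg_with_diag <- colMeans(corr_mat)

order_off  <- names(sort(avg_off_diag))
order_with <- names(sort(avg_with_diag))
identical(order_off, order_with)  # same ranking either way
```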
This graph reveals some major differences in how the two models were constructed.
(a) The spread of the boxes, and particularly of the "whiskers", is much wider in the FiveThirtyEight model than in the Economist model. The boxes (the middle 25 correlations) have an average width of about 0.5 for FiveThirtyEight versus 0.3 for the Economist; that's a big difference. The FiveThirtyEight model builds in a lot more variability of this type.
(b) The widths of the boxes vary more in the Economist model than in the FiveThirtyEight model. For the Economist, the boxes range from narrow to wide; for FiveThirtyEight, they range from wide to wider. In the Economist model, the top few boxes are very tightly clustered.
(c) There are no negative correlations in the Economist model at all, nothing to the left of the brown dashed line. By contrast, the FiveThirtyEight model has negative correlations for most states.
(d) The biggest differences occur in the top six boxplots (DC, HI, MD, NM, GA, TX). For all these states, the Economist model shows much less variability in correlations than the FiveThirtyEight model, and it also assigns these states the lowest average correlations.
(e) The medians (the line inside each box) fall in roughly the same order in both models, although some differ drastically. For instance, Texas (TX) has a correlation above 0.5 with half the states in the FiveThirtyEight model, while in the Economist model any correlation above 0.3 is an outlier.
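Observations (a), (b), (c) and (e) each reduce to a per-state summary of the 50 off-diagonal correlations. Here is a sketch under stated assumptions: `make_corr()` and `state_summary()` are hypothetical helpers, and the two matrices are simulated from a one-factor model with made-up loadings, so whatever numbers they produce say nothing about the actual FiveThirtyEight or Economist models. Box edges use `fivenum()` hinges, which is what `boxplot()` draws.

```r
# Hypothetical helper: a 51 x 51 correlation matrix from a one-factor simulation,
# where each state's draws load on a shared factor with the given strength
make_corr <- function(loading, n = 1000) {
  states <- c(state.abb, "DC")
  common <- rnorm(n)
  x <- sapply(seq_along(states), function(i) loading[i] * common + rnorm(n))
  colnames(x) <- states
  cor(x)
}

# Hypothetical helper: per-state box width, median and count of negative correlations
state_summary <- function(corr_mat) {
  t(sapply(colnames(corr_mat), function(s) {
    v <- corr_mat[, s]
    v <- unname(v[names(v) != s])  # drop the self-correlation, then the names
    h <- fivenum(v)                # min, lower hinge, median, upper hinge, max
    c(box_width = h[4] - h[2], median = h[3], n_negative = sum(v < 0))
  }))
}

set.seed(4)
summ_a <- state_summary(make_corr(loading = runif(51, 0.5, 1.5)))
summ_b <- state_summary(make_corr(loading = runif(51, 0, 1.5)))

# (a) average box width in each hypothetical model
c(a = mean(summ_a[, "box_width"]), b = mean(summ_b[, "box_width"]))
# (b) how much the box widths vary from state to state
c(a = sd(summ_a[, "box_width"]), b = sd(summ_b[, "box_width"]))
# (c) number of states with at least one negative correlation
c(a = sum(summ_a[, "n_negative"] > 0), b = sum(summ_b[, "n_negative"] > 0))
# (e) do the two models rank the states' medians similarly?
cor(summ_a[, "median"], summ_b[, "median"], method = "spearman")
```

Run on the two real matrices, the same summaries would put numbers on the visual comparison.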
All these observations relate to how the two models deal with the variability between states. As you can see, there are many moving parts.
***
Let's turn our attention back to DC (the top row). Here are the two relevant boxplots:
The Economist model expresses the view that DC will vote for the Democratic candidate regardless of what happens in other states, so its correlations are low and cluster tightly around 0.15. One can fit almost the entire Economist boxplot (excluding MD) into the right half of the FiveThirtyEight box! The FiveThirtyEight box ranges from -0.1 to 0.3, and 20 states are negatively correlated with DC. That means the better Biden does in DC, the worse he tends to do in those states.
Looking at the individual simulations (from Adam's tool):
Neither model gives DC to Trump in any scenario. FiveThirtyEight thinks that the more Trump overperforms in DC, the lower his vote share in Alabama (one of the states negatively correlated with DC). The Economist model essentially delinks the two: the cloud of dots is almost vertical. [Note: The regression lines shown on Adam's plots may not be correct, although it's hard to tell where the dots lie in the densest areas.]
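What a negative correlation looks like in a cloud of simulations can be sketched with two hypothetical, standardized vote-share swings. This is not how either model generates its simulations: `simulate_pair()` is a made-up helper that draws correlated normal pairs with a chosen correlation `rho` and fits a least-squares line, whose slope equals `rho` in expectation when both variables have unit variance.

```r
# Hypothetical helper: draw n pairs of standardized swings with correlation rho,
# then report the sample correlation and the fitted regression slope of y on x
simulate_pair <- function(rho, n = 40000) {
  x <- rnorm(n)                              # e.g. swing in DC
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # e.g. swing in Alabama
  c(correlation = cor(x, y), slope = unname(coef(lm(y ~ x))[2]))
}

set.seed(5)
neg  <- simulate_pair(rho = -0.3)  # negatively linked pair
zero <- simulate_pair(rho = 0)     # delinked pair
neg   # correlation and slope both near -0.3
zero  # correlation and slope both near 0
```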
***
These boxplots let us dissect the heatmap and describe the differences between the two models more effectively. I have no idea how much variability there should be between states.
P.S. Here is R code that generates the side-by-side boxplots.
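# starting point is bcorrlist: a named list of the two correlation matrices,
# each stored in a data frame with one column per state
# (the list names are used in the chart titles)
# capitalize the first letter (base-R replacement for Hmisc::capitalize)
capitalize = function(s) paste0(toupper(substring(s, 1, 1)), substring(s, 2))
# stack into long form (columns: values, ind), then remove self-correlations
# (assumes only the diagonal cells are exactly 1)
bcorrstackedlist = lapply(bcorrlist, stack)
bcorrstackedlist = lapply(bcorrstackedlist, function(x) x[x$values != 1, ])
# sort order choices: states by average correlation, highest first;
# a horizontal boxplot draws the first level at the bottom, so the lowest averages end up on top
stateorderlist = lapply(bcorrlist, function(x) round(sort(colMeans(x), decreasing=TRUE), 2))
# set parameters for boxplot
myboxplot = function(data, gorder) {
  boxplot(data[[1]]$values ~ factor(data[[1]]$ind, levels = gorder), horizontal=TRUE,
          ylim=c(-0.5,1), col="gray80", axes=FALSE, cex.main=0.6,
          xlab="", ylab="", main=paste(capitalize(names(data)), "Biden-Share Correlations by State"),
          whisklty=7, whisklwd=0.25, outcex=0.5, outpch=16, boxlwd=0.1, medcex=0.6,
          whiskcol="gray80", staplecol="gray80", boxcol="gray80", outcol="gray50", medcol="gray30")
  # correlation axis along the bottom
  axis(1, at=seq(-0.5, 1, by=0.2), col="gray", col.ticks="gray")
  # one label per state on the left
  axis(2, at=seq_along(gorder), labels=gorder, cex.axis=0.6, col="gray", col.ticks="gray")
  # brown dashed line at zero separates negative from positive correlations
  abline(v=0, col="brown", lty=2)
}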
# make charts
par(mar=c(3,3,1,1), mgp=c(1.5,.3,0), las=1, cex.axis=0.6, tcl=-0.2)
# Economist
myboxplot(bcorrstackedlist[1], names(stateorderlist[[1]]))
# FiveThirtyEight, order by FiveThirtyEight
myboxplot(bcorrstackedlist[2], names(stateorderlist[[2]]))
# FiveThirtyEight, order by Economist
myboxplot(bcorrstackedlist[2], names(stateorderlist[[1]]))