The specter of Covid-19 has pushed U.S. elections off the headlines. Nonetheless, political scientist Andrew Gelman has just offered his take on the state of election polling (link to PDF). This article is perfect for those of us who have a keen interest in, but no deep knowledge of, polling, as Andrew covers how such polls are conducted. Andrew, by the way, was the brains behind the Economist's forecasts for the 2020 U.S. presidential election, so he's not merely lecturing from the pulpit.
If you've followed recent U.S. elections, polling inaccuracy has been one of the popular themes. Funnily enough, before Trump's stunning victory over Hillary Clinton, the media was trumpeting the exact opposite: how election polls were so damn accurate. Those were the glory days of Nate Silver. Recently, the narrative has flip-flopped: how pollsters could have given Clinton a 90-99.9% chance of winning, how polls overestimated Biden's victory margin, and so on.
The following are notes from reading Andrew's article.
***
p. 1
Pollsters are aware that "recent pre-election polling errors have been an underrepresentation of choices favored by lower-education white voters" but "adjusting for ethnicity and education... did not solve the problem... this past November".
This gets at something important: making "adjustments" is necessary but may not be sufficient to make the problem go away. This is a very common issue I see all over the place. One has an observational dataset on people. One then shoves a pile of demographic data (age, sex, income groups, etc.) into the statistical model, now known as the adjusted model, and voila, heaven opens up and all our sorrows have gone away! If someone asks about biases in the observational dataset, one points to the adjustment factors and moves on. An analogous situation is peer review in a published journal. Is the research finding credible? In some corners, if it is published in a peer-reviewed journal, there should be no further questions. Reality is much more complicated. Are the right variables in the adjustment? Are the variables structured in the right way? Are there confounding variables that haven't been measured?
p. 2
the moment of truth when poll-based forecasts are compared to election outcomes.
The election prediction problem is unusual in that there is an inescapable, quick, definitive evaluation of the pollster's performance. Election forecasting is the most accountable type of data science. Most data science problems do not have this property. Sometimes, only certain types of errors are visible. In Chapter 4 of Numbers Rule Your World (link), I discussed the anti-doping labs: their false negative errors are unlikely to be discovered, while false positive errors elicit immediate outcry from the harmed athletes.
The key challenges [of polling] are (a) attaining a representative sample of potential voters, and (b) predicting turnout.
The progression is from eligible voters to potential voters to actual voters. There isn't much uncertainty about eligible voters, nor about actual voters once the election is over. But turnout, the link between potential voters and actual ones, is uncertain. Eligible voters are not usually a representative sample of the underlying population.
We cannot expect to eliminate non-sampling errors, because conditions for new elections are always changing. Unlike sampling errors, non-sampling error cannot be reduced simply by conducting more and larger surveys.
The last part debunks a Big Data myth: collecting more data can't solve all problems. It reduces sampling error but does not eliminate non-sampling errors. Another way to think of non-sampling errors is that the people in the sample aren't representative of those in the population. It is helpful to extend our progression: eligible voters -> potential voters -> surveyed voters -> survey respondents -> actual voters. There are two sources of inaccuracy here: some actual voters may not be surveyed, or, having received the survey, may not have responded.
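To make this concrete, here is a minimal simulation sketch; the numbers are made up for illustration, not taken from Andrew's article. Suppose one candidate's supporters respond to surveys at a lower rate than the other's, and watch what happens as the sample grows.

```python
import numpy as np

# Assumed setup (not from the article): candidate A has 52% true support,
# but A's supporters answer the survey at a lower rate than B's supporters
# (differential nonresponse).
rng = np.random.default_rng(0)
true_support_A = 0.52
response_rate_A, response_rate_B = 0.5, 0.6   # assumed response rates

def poll(n_contacted):
    """Contact n people; estimate A's support from those who respond."""
    prefers_A = rng.random(n_contacted) < true_support_A
    rates = np.where(prefers_A, response_rate_A, response_rate_B)
    responds = rng.random(n_contacted) < rates
    return prefers_A[responds].mean()

for n in [1_000, 10_000, 100_000]:
    estimates = [poll(n) for _ in range(400)]
    print(f"n={n:>7,}  mean estimate = {np.mean(estimates):.3f}  "
          f"spread (sd) = {np.std(estimates):.4f}")
# The spread (sampling error) shrinks as n grows, but the mean estimate stays
# stuck near 0.47, below the true 0.52: the non-sampling error never goes away.
```

Bigger samples buy a tighter estimate of the wrong number.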
p. 5
If we are estimating opinion in 4 age categories, 2 categories of sex, 4 ethnicity categories, 5 education levels, and 50 states, that’s 8000 cells. Even if you are analyzing a set of polls with a total of 20,000 respondents, this will still leave with many cells with 0, 1, or 2 respondents, hardly enough to get any sort of estimate.
Each "cell" is a demographic subgroup. Polls rarely have 20,000 respondents so this problem is clearly pervasive. Andrew's statistical approach pools data from "similar" cells so there are enough respondents to come up with an estimate. Such a model is, however, vulnerable to being wrong about "similarity". Nevertheless, any method will be vulnerable to one issue or another because fundamentally, the available data are incomplete. You can, for example, produce just 8 cells based on age and sex alone, and then in each cell, you have enough respondents; but this model misses the effect of all the other variables, and therefore is inaccurate in a different way.
The adjustments only adjust for the variables included in the model.
The beauty of these models is that they are not black boxes. We know exactly what factors are being adjusted for, and the functional form. But if the form is incorrect, the adjusted model will be inaccurate. This goes back to the point I made above: be skeptical when someone claims to have solved all biases by throwing a bunch of standard demographic variables into a model; be doubly skeptical if they only include one factor at a time, or only "main effects"; be triply skeptical if the outcome is affected by non-demographic factors, such as behavior.
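Here is a hypothetical sketch of the functional-form issue, with simulated data and made-up variable names (it is not the pollsters' actual model): if education's effect on vote choice differs by age group but the adjustment only includes main effects, the "adjusted" estimate still carries the bias.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate respondents where the education effect differs by age group
# (an interaction), then adjust two ways.  All names and numbers are invented.
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "age_group": rng.integers(0, 4, n),   # four assumed age groups
    "educ": rng.integers(0, 2, n),        # low / high education
})
# True model: education matters a lot for older groups, little for younger.
logit_p = -0.2 + 0.8 * df["educ"] * (df["age_group"] >= 2) + 0.1 * df["age_group"]
df["vote_rep"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# Main effects only: assumes one education effect shared by all age groups.
main_effects = smf.logit("vote_rep ~ C(age_group) + C(educ)", data=df).fit(disp=0)
# With the interaction: lets the education effect vary by age group.
interaction = smf.logit("vote_rep ~ C(age_group) * C(educ)", data=df).fit(disp=0)

print(main_effects.params["C(educ)[T.1]"])        # one averaged effect
print(interaction.params.filter(like="C(educ)"))  # effects that differ by age
```

The adjustment only fixes what the functional form allows it to fix.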
p. 8
He gets to the crux of the matter:
Why were the polls as imprecise as they were? Some possible explanations are differential nonresponse, differential turnout, changes in opinion, and insincere survey responses. These last two explanations, which can also be phrased as “last-minute swings” and “shy Trump voters,” are natural explanations, but I doubt they are a major part of the story in 2020.
By "shy Trump voters", Andrew specifically meant Trump voters who lied to pollsters and claimed they would vote Democratic. His view that this was a non-factor appears to be commonly held among the political science community. I wonder if there is another kind of shy voter who appears under the heading of "differential nonresponse". What if Trump voters were systematically less likely to respond to polls?
p. 9
Some polls also adjust for partisanship, using recorded party registration, stated party identification, or stated vote in the previous election. But even the surveys that made all these adjustments were off by about 2 percentage points, on average.
There is a philosophical question in there: is the residual inaccuracy evidence that the adjustment was inadequate, or evidence that partisanship isn't the source of the error?
p. 10
An error of 2 or 3 percentage points is a problem for predicting very close elections but otherwise is not so consequential.
This is an important observation. The closeness of recent presidential elections is exactly what turns a 2-3 point error into a problem.
p. 11
He turns this around and praises the unreasonable accuracy of polls. Think about it: surveying only a few thousand people out of a population of 300 million gets the estimates down to a margin of error of only a few percentage points.
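A quick back-of-the-envelope check on that point, using the textbook margin-of-error formula for a proportion under simple random sampling:

```python
import math

# The usual 95% margin of error for a proportion near 50% is roughly
# 1.96 * sqrt(0.25 / n); it depends on the sample size, not on the size
# of the 300-million population.
for n in [500, 1_000, 2_000, 10_000]:
    moe = 1.96 * math.sqrt(0.5 * 0.5 / n)
    print(f"n = {n:>6,}: margin of error ≈ ±{moe:.1%}")
# n = 1,000 already gives roughly ±3 percentage points.  The catch, per the
# discussion above, is that this covers sampling error only.
```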
p. 13
In retrospect I wish that we’d expressed our 2020 Economist forecast conditionally: instead of simply stating a forecast interval and probability of each candidate winning, we could’ve graphed this interval as a function of the national polling error, thus indicating the confidence we had in our forecast at different levels of survey accuracy and making this dependence clear.
Good idea. Not sure if the public will appreciate it.
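Here is a rough sketch of what such a conditional presentation could look like; the numbers are invented for illustration, and this is not the Economist's actual model. The idea is to report the interval and win probability separately for each assumed level of national polling error, rather than averaging over them.

```python
from scipy.stats import norm

# Assumed inputs: polls put a candidate's two-party vote share at 54%, with
# 1.5 points of residual (non-polling-error) uncertainty.  Both numbers are
# made up for illustration.
poll_estimate = 0.54
residual_sd = 0.015

print("polling error | conditional estimate (95% interval) | P(share > 50%)")
for polling_error in [-0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03]:
    conditional_mean = poll_estimate + polling_error
    lo, hi = norm.interval(0.95, loc=conditional_mean, scale=residual_sd)
    p_win = 1 - norm.cdf(0.5, loc=conditional_mean, scale=residual_sd)
    print(f"{polling_error:+.0%} | {conditional_mean:.1%} "
          f"[{lo:.1%}, {hi:.1%}] | {p_win:.0%}")
```

The reader then sees directly how much the headline probability hinges on the polls being right.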
well-funded campaigns and advocacy groups... can do more effective survey adjustment using the voter file, which has information including past turnout history on nearly 200 million Americans.
I wonder how much this costs. Sounds like it could be a fortune.
it would be good to see less focus on political campaigns and more on surveys of attitudes
I really like this suggestion. I don't understand the value of election forecasting (except for the campaigns themselves).