Many pundits make all kinds of interpretations of "exit poll" data. This prompted me to spend some time understanding the data collected in these exit polls. I looked at the data published by CNN, although the source hardly matters because exit polling seems to be a monopoly run by Edison, commissioned by a consortium of mainstream media outlets. The process is completely opaque - I can't even find a description of methodology on Edison's website! Much of what I learned came from older publications (prompted by various exit-poll controversies in the 2004 and 2008 elections) and from Andrew Gelman or his readers.
I'll discuss the various known statistical problems with exit polls in a subsequent post. In this post, I'm assuming that the results from exit polls are trustworthy (they are not!) - it's the selection and interpretation of the data I'm focused on here.
***
One of the biggest obsessions of the media is "electability". I already debunked electability here, and FiveThirtyEight also published an article that does the same thing.
In the aftermath of Super Tuesday, I heard various pundits say "According to the exit polls, of those voters who are most interested in beating Trump, Joe Biden is the top choice."
This interpretation can be traced back to the following exit poll question:
In California and Texas, around 60-70% of exit-poll respondents said they would rather beat Trump when forced to choose between that and voting on issues. Of those, about 30 to 40% voted for Biden, and 60 to 70% voted for someone else. This evidence is surprisingly weak.
Note that Biden won Texas with about 34.5% of the votes. Compare this to the 38% who voted for him among those who said beating Trump is the top goal. That is not a big difference; in fact, it is probably not statistically significant (we can't know for sure as the pollster doesn't release enough data publicly).
In California, Biden got 25% of the votes, and among those who saw beating Trump as more important than voting on issues, he got 28% of the votes. Again, not meaningful.
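To get a feel for why a gap of a few points is hard to read, here is a rough sketch of the usual margin-of-error formula for a sample proportion. The subgroup sizes and the design effect below are hypothetical placeholders, since the pollster does not publish them; the function name is mine, not anything from Edison or CNN.

```python
import math

def margin_of_error(p, n, z=1.96, design_effect=1.0):
    """Approximate 95% margin of error for a sample proportion.

    p: observed proportion (0.38 for Biden among "beat Trump" respondents in Texas)
    n: number of respondents in the subgroup (ASSUMED; not published)
    design_effect: inflation factor for clustered sampling (ASSUMED)
    """
    return z * math.sqrt(design_effect * p * (1 - p) / n)

# Hypothetical subgroup sizes, for illustration only
for n in (300, 600, 1200):
    simple = margin_of_error(0.38, n)
    clustered = margin_of_error(0.38, n, design_effect=2.0)
    print(f"n={n}: +/-{100 * simple:.1f} pts (simple), "
          f"+/-{100 * clustered:.1f} pts (design effect 2)")
```

Whether the 3.5-point gap in Texas clears the margin depends entirely on inputs we don't have, and a proper test would also have to account for the subgroup being part of the overall total.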
***
What troubles me is that the pundits are willfully ignoring another question on the exit poll that gives a direct answer to electability.
Here is the question:
As a data scientist, when I see analysis that uses less direct data while ignoring direct data to make a point, it raises suspicion. In this case, I think the pundits are either being "story-first" (see here) or just lazy.
The problem is that when they look at this table, Biden's number is 79%, which is lower than Sanders's or Warren's, so they look elsewhere. The 79% means that of those who think Joe Biden has the best chance of beating Trump, 79 percent voted for Biden, and 21 percent didn't. For Warren and Sanders, the respective proportions are 80 and 84 percent.
Given the small sample sizes (the total sample is broken up further by candidate), none of the differences between those candidates is meaningful.
All this is saying is that 80 percent of people voted for the candidate they thought had the best chance of winning the election.
But a different tabulation of this data is more useful! Instead of having the columns sum up to 100 percent, have the rows sum up to 100 percent. Here is the same table with the new calculations:
(I had to zero out the candidates who dropped out or received too few respondents since the exit pollster suppressed the tiny percentages. The numbers were re-weighted after taking out their vote shares.)
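Here is a small sketch of the two tabulations, computed from one cross-tab. The counts are made up purely for illustration and are NOT the exit-poll numbers; the function names are mine. Note that with only the published column percentages, one would also need the share of respondents in each column to recover the row percentages.

```python
# MADE-UP counts for illustration only; not the actual exit-poll data.
# counts[voted_for][thought_best_chance] = number of respondents
counts = {
    "Biden":   {"Biden": 360, "Sanders": 30,  "Warren": 10},
    "Sanders": {"Biden": 40,  "Sanders": 250, "Warren": 10},
    "Warren":  {"Biden": 50,  "Sanders": 20,  "Warren": 80},
}
candidates = list(counts)

def column_percentages(counts):
    """Columns sum to 100: of those who think X has the best chance,
    what percent voted for each candidate?"""
    col_totals = {c: sum(counts[v][c] for v in counts) for c in candidates}
    return {v: {c: 100 * counts[v][c] / col_totals[c] for c in candidates}
            for v in counts}

def row_percentages(counts):
    """Rows sum to 100: of those who voted for X,
    what percent think each candidate has the best chance?"""
    return {v: {c: 100 * counts[v][c] / sum(counts[v].values())
                for c in candidates}
            for v in counts}

col_pct = column_percentages(counts)
row_pct = row_percentages(counts)
for name in candidates:
    print(f"{name}: column view {col_pct[name][name]:.0f}%, "
          f"row view {row_pct[name][name]:.0f}%")
```

The diagonal of the column view answers "did believers vote for their pick?", while the diagonal of the row view answers "do voters believe in their own candidate?" - same counts, different question.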
This interpretation is more natural. We can say 90 percent of Joe Biden voters think he has the best chance of beating Trump, and 86-87% of Bloomberg/Sanders voters believe their candidate has the best chance of beating Trump.
This shows the electability question is silly. Tallying up who thinks who will beat Trump yields numbers similar to tallying up who people voted for. The two statistics contain roughly the same information. Candidates who push electability are steering voters away from thinking about why they want to vote for someone.
The Warren line stands out in this table. Only half of Warren voters thought she had the best chance of beating Trump. So, she was a spoiler on Super Tuesday.
***
Before I close, I must comment on the poor design of the first question discussed above. It is an example of a "loaded" question. Voters are given a false binary choice: the Democrats should nominate someone who can beat Trump, or someone who agrees with them on issues. The truth is voters should demand both: someone who beats Trump and executes policies that they agree with.
Beating Trump is synonymous with winning the election, so the question plants in voters' minds the idea that choosing someone who agrees with them on issues means not beating Trump, that is, losing the election. This is the kind of question that fails a survey design exam.
***
tl;dr
Media pundits are obsessed with the meaningless "electability" question. Strangely, in commenting on electability, they don't use the exit-poll question that directly addresses the topic ("Who has the best chance to defeat Trump?"), and instead rely on an indirect analysis based on the question "Should Democrats nominate someone who (a) agrees with you on the issues or (b) can beat Trump?".
The indirect analysis is flawed, an over-interpretation of immaterial small differences, and the underlying question is loaded, forcing respondents to equate agreeing on issues with not beating Trump (i.e. losing the general election).
The direct electability question proves the point that electability is a useless, circular concept. People vote for who they think will win, while who wins is a result of who people vote for. Thus, candidates pushing electability are nudging voters to nominate them on the belief that they will win, instead of on charisma, policies, or anything else. The reality is no voter can predict who will beat Trump.
Whenever a data analysis avoids the obvious, direct data, and resorts to indirect secondary data, one should be suspicious. If the direct data are not even mentioned, there is a good chance that the data do not support the conclusion.