Pollsters, forecasters, and the like were embarrassed by the Bernie Sanders upset in Michigan Tuesday night. Nate Silver called it among the greatest polling errors in primary history. Now they are struggling to explain the big miss. Recall that the polls conducted close to the contest showed a Clinton lead of about 20 percentage points. The actual outcome was a Sanders win of about 1.5 points, with about a million votes cast.
This type of miss is hard to explain because any plausible explanation must deal with these facts:
- The miss of roughly 20 percentage points is a huge gap to explain
- The same methodology used in other states has generally been accurate
Take, for example, the claim that polls missed young voters because of a bias toward landlines. This explanation doesn't fly because pollsters use the same methodology in each state. If they did not poll enough young voters in Michigan, they most likely did not poll enough young voters in other states either. Besides, the math doesn't work.
Here's an excerpt from Huffington Post (link):
The pre-election polls also appear to have underestimated turnout among young voters, who overwhelmingly support Sanders. That same NBC/Wall Street Journal/Marist poll showed 18- to 29-year-olds making up 15 percent of the electorate, whereas the exit poll showed 20 percent of the electorate in that age group. In the pre-election poll, Sanders won the age group 74 percent to 25 percent; in the exit poll, his margin was bigger -- 81 percent to 18 percent.
But a back-of-the-envelope calculation shows that this could account for only about 7 points of the roughly 20-point gap. Based on the data cited, the other 85% of the people in the pre-election poll must have given Clinton a 32-point edge for the overall average to be a 20-point lead. Now assume that 80% of the primary voters gave Clinton a 32-point edge while 20% (i.e. the 18- to 29-year-old group) gave Sanders a 63-point edge. The result is a Clinton edge of 13 points, still far from the actual outcome.
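Here is a minimal sketch of that arithmetic. The shares and margins are the ones cited above; the helper function and variable names are just for illustration:

```python
def weighted_margin(groups):
    """Overall Clinton-minus-Sanders margin, in points, from
    (share of electorate, Clinton margin within group) pairs."""
    return sum(share * margin for share, margin in groups)

# Pre-election poll: 15% young voters at Sanders +49 (74-25), so the
# remaining 85% must sit near Clinton +32 for the average to be Clinton +20.
poll = weighted_margin([(0.15, -49), (0.85, +32)])
print(round(poll, 1))      # ~ +20 (Clinton)

# Swap in the exit-poll turnout and margin for the young group only:
# 20% young voters at Sanders +63 (81-18), everyone else unchanged at +32.
adjusted = weighted_margin([(0.20, -63), (0.80, +32)])
print(round(adjusted, 1))  # ~ +13 (Clinton): only about 7 points of the miss explained
```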
***
Many of the other explanations offered by Huffington Post (link) or FiveThirtyEight (link) have a similar tone. They are plausible, but the numbers don't quite work. Perhaps it is a combination of factors - the perfect storm. Or perhaps voters made up their minds, or changed their minds, late in the game.
Also note that all these explanations assume the existence of statistical bias. This source of error is very different from statistical variability - the argument that the error made by the pre-election polls is a one-in-a-hundred-year storm.
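To make that distinction concrete, here is a rough simulation. The sample size, the 10-point bias in vote share, and the seed are all hypothetical, chosen only to illustrate the point: with a normal-sized poll, sampling variability alone essentially never produces a 20-point miss, whereas a systematic bias shifts every poll by roughly the same amount.

```python
import random

def simulate_poll(true_clinton_share, n=800, bias=0.0):
    """Simulate one two-way poll of n respondents and return the
    Clinton-minus-Sanders margin in points. `bias` shifts each respondent's
    probability of reporting Clinton support (a stand-in for systematic bias)."""
    p = true_clinton_share + bias
    clinton = sum(random.random() < p for _ in range(n))
    return 100 * (2 * clinton / n - 1)

random.seed(1)
TRUE_SHARE = 0.4925   # roughly the actual result: Sanders by ~1.5 points

# Statistical variability only: how often does an unbiased poll show Clinton +20?
unbiased = [simulate_poll(TRUE_SHARE) for _ in range(10_000)]
print(sum(m >= 20 for m in unbiased))       # 0 - variability alone cannot do it

# Statistical bias: every poll overstates Clinton's share by 10 points
biased = [simulate_poll(TRUE_SHARE, bias=0.10) for _ in range(10_000)]
print(round(sum(biased) / len(biased), 1))  # ~ +18.5 - the whole distribution shifts
```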
Maybe what we are seeing is a Trump effect, where he is encouraging more of certain types of voters to be involved in the process.
Posted by: Ken | 03/11/2016 at 01:01 AM
"Also note that all these explanations assume the existence of statistical bias. This source of error is very different from statistical variability - the argument that the error made by the pre-election polls is a one-in-a -hundred year storm."
Well said, Kaiser! I was writing about this on your last post when I read this one.
5-38's explanations (http://fivethirtyeight.com/features/why-the-polls-missed-bernie-sanders-michigan-upset/) are of the type:
Pollsters underestimated this, Pollsters underestimated that, Pollsters underestimated another thing, Pollsters missed... and so on.
The reality is different: 5-38 overestimated the pollsters!
Let's face it: combining several polls is a trivial, mechanical task (who says "meta-analysis"?). It's not the contribution I (and you, I suppose) am expecting from 5-38. I expect 5-38 to take those elements into account _before_ eliciting the prior of their forecast, not after.
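Just to be concrete about the "trivial, mechanical" part, it can be as simple as an inverse-variance-weighted average like the sketch below. The poll numbers are made up, and this is a generic meta-analysis formula, not 538's actual weighting (which also folds in pollster ratings, recency, and so on); the point is that if every input poll carries the same bias, the average simply reproduces it.

```python
def combine_polls(polls):
    """Inverse-variance-weighted average of poll margins.
    Each poll is (margin in points, sample size); the variance of a margin
    is roughly proportional to 1/n, so the weights are proportional to n."""
    total_weight = sum(n for _, n in polls)
    return sum(margin * n for margin, n in polls) / total_weight

# Hypothetical Michigan-style inputs: three polls all showing a big Clinton lead
print(round(combine_polls([(+27, 800), (+13, 650), (+21, 900)]), 1))  # ~ +21: garbage in, garbage out
```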
Furthermore, like you, I think that the comment "Hillary win 99% of the time" that I read on your previous post is inherently wrong.
Here is the curious fact (and a strong potential source of misunderstanding): on other occasions the same pollsters were correct in estimating results. My claim is that they were _right_ for the _wrong_ reason. Since this is not the first time a similar mistake has happened, there is some evidence that the polls are flawed. We should consider the possibility that wrong models can yield acceptable estimates most of the time, rather than that right models can fail a few times.
The difference between the two scenarios could not be greater when it comes to interpreting one of these failures.
Posted by: Antonio | 03/11/2016 at 03:19 AM
Antonio: Food for thought: in statistics, we like to say there are no "right" models, so I think we agree that "wrong models can yield acceptable estimates the majority of the time". I think the question you are raising can be phrased as: how does a 538-type model learn from this big miss? I don't think the structure of their current model allows such learning.
Posted by: Kaiser | 03/11/2016 at 05:01 PM
I called them "wrong" models to be brief.
By "wrong model" I mean one that does not take into account all the biases cited by 5-38 ("Pollsters underestimated...") and that gives the "right" estimates when biases are not present or compensate themselves (because of different directions).
Anyway,
Give me a lever long enough and a fulcrum on which to place it, and... I can move the world.
Give me a representative sample and... I will predict the right poll result (with the right probability).
The next step, in my opinion, is compulsory: to accept that polls are based on non-representative samples, even if this yields inconsistent estimators.
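A tiny illustration of that last point (the turnout share, support rates, and "reach" rate below are entirely hypothetical): when the sampling frame systematically under-reaches one group, taking a larger sample does not pull the estimate toward the truth.

```python
import random

def biased_sample_mean(n, p_young=0.20, reach_young=0.5):
    """Estimate Sanders support when young voters (p_young of the electorate,
    heavily favoring Sanders) are reached at only half the rate of everyone
    else. All rates are hypothetical, for illustration only."""
    support = {"young": 0.81, "other": 0.42}
    sample = []
    while len(sample) < n:
        group = "young" if random.random() < p_young else "other"
        if group == "young" and random.random() > reach_young:
            continue  # young voter not reached: the sample skews old
        sample.append(random.random() < support[group])
    return sum(sample) / len(sample)

random.seed(2)
truth = 0.20 * 0.81 + 0.80 * 0.42                # ~ 0.498
for n in (500, 5_000, 50_000):
    print(n, round(biased_sample_mean(n), 3))    # hovers near 0.46, not 0.498, however large n gets
```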
Posted by: Antonio | 03/12/2016 at 01:55 PM