

Maybe what we are seeing is a Trump effect, where he is encouraging more of certain types of voters to be involved in the process.


"Also note that all these explanations assume the existence of statistical bias. This source of error is very different from statistical variability - the argument that the error made by the pre-election polls is a one-in-a -hundred year storm."

Well said, Kaiser! I was writing about this in the comments on your last post when I read this one.
538's explanation (http://fivethirtyeight.com/features/why-the-polls-missed-bernie-sanders-michigan-upset/) is all of the same type:
Pollsters underestimated this, pollsters underestimated that, pollsters underestimated another thing, pollsters missed... and so on.
The reality is different: 538 overestimated the pollsters!
Let's face it: combining several polls is a trivial, mechanical task (did someone say "meta-analysis"?). That's not the contribution I (and, I suppose, you) expect from 538. I expect 538 to take those elements into account _before_ eliciting the prior of their forecast, not after.
Furthermore, like you, I think the comment "Hillary wins 99% of the time" cited on your previous post is inherently wrong.
Here is the curious fact (and a strong potential source of misunderstanding): at other times the same pollsters estimated results correctly. My claim is that they were _right_ for the _wrong_ reasons. Since this is not the first time such a mistake has happened, there is some evidence that the polls are flawed. We should consider the possibility that wrong models yield acceptable estimates most of the time, not that right models fail a few times.
The difference between the two scenarios could not be greater when interpreting one of these failures.
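The point about combining polls can be shown with a toy simulation (my own sketch, not 538's model; the true share, the bias, and the poll size are all made-up numbers). If every poll shares the same bias - say, a sampling frame that misses one group of voters - then averaging a hundred polls shrinks the random noise but leaves the bias fully intact:

```python
import random

random.seed(42)

TRUE_SUPPORT = 0.48   # hypothetical true vote share
SHARED_BIAS = 0.04    # bias common to all polls (illustrative)
N_RESPONDENTS = 800   # respondents per poll

def one_poll():
    # Every poll samples from the same shifted population,
    # so each one sees apparent support of 0.52, not 0.48.
    p_apparent = TRUE_SUPPORT + SHARED_BIAS
    hits = sum(random.random() < p_apparent for _ in range(N_RESPONDENTS))
    return hits / N_RESPONDENTS

# Averaging 100 polls: the standard error shrinks tenfold,
# but the average still centers on 0.52, not the true 0.48.
avg = sum(one_poll() for _ in range(100)) / 100
print(round(avg, 3))  # lands near 0.52, far from the true 0.48
```

This is the mechanical part of the meta-analysis: averaging buys precision, not accuracy. Any correction for shared bias has to come from modeling choices made before the polls are pooled.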


Antonio: Food for thought: in statistics, we like to say there are no "right" models so I think we agree that "wrong models can yield acceptable estimates the majority of the time". I think the question you are raising can be phrased as: how does a 538-type model learn from this big miss? I don't think the structure of their current model allows such learning.


I called them "wrong" models for brevity.
By a "wrong model" I mean one that does not take into account all the biases cited by 538 ("Pollsters underestimated...") and that gives the "right" estimates only when the biases are absent or cancel each other out (because they point in different directions).
Give me a lever long enough and a fulcrum on which to place it, and... I can move the world.
Give me a representative sample and... I will predict the right poll result (with the right probability).

The next step, in my opinion, is unavoidable: accept that polls are based on non-representative samples, even if this yields inconsistent estimators.
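Inconsistency here means that taking a bigger sample does not help - the estimator converges, but to the wrong value. A minimal sketch (my own illustration, with an assumed response bias of 5 points, not a measured one):

```python
import random

random.seed(0)

TRUE_SUPPORT = 0.48
RESPONSE_BIAS = 0.05  # assumed: one side is slightly more likely to respond

def biased_poll(n):
    # The sample is drawn from the responding population, which is
    # not representative of the electorate - regardless of its size.
    p_apparent = TRUE_SUPPORT + RESPONSE_BIAS
    hits = sum(random.random() < p_apparent for _ in range(n))
    return hits / n

for n in (500, 5000, 50000):
    print(n, round(biased_poll(n), 3))
# The estimate settles near 0.53 as n grows: more data, same wrong answer.
```

With a representative sample, the estimate would converge to 0.48 as n grows; with a non-representative one, it converges just as confidently to 0.53.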


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.