
I don't think the average person realises how complex the problem is. You have sampling that probably isn't good; lots of missing data, as people refuse to answer or answer "don't know"; and people may make incorrect statements (possibly lie) about who they will vote for, or even whether they will vote. In Australia we had a situation where a new polling company used automated polling, and its polls ran about 2% higher for the right-wing party. Having been polled that year by both a traditional friendly woman and an automated system, it was fairly obvious what was happening. People don't like being asked to push 1 if they are voting for X, and it is very easy for them to hang up. The probability of hanging up differed across voting intentions, introducing a bias that needed to be corrected.
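The mechanism described above can be sketched in a small simulation. All the numbers here are hypothetical (the completion rates and the 50/50 electorate are assumptions, not figures from the comment): if one side is even slightly more willing to finish an automated poll, the completed sample overstates that side's support.

```python
import random

random.seed(42)

# Hypothetical electorate: 50% intend to vote Left, 50% Right.
# Assume (illustratively) that Right-leaning voters are slightly more
# likely to complete an automated poll instead of hanging up.
N = 100_000
complete_prob = {"Right": 0.11, "Left": 0.10}

responses = []
for _ in range(N):
    intention = random.choice(["Left", "Right"])
    if random.random() < complete_prob[intention]:
        responses.append(intention)

right_share = responses.count("Right") / len(responses)
# True Right share is 50%; the completed sample runs a couple of
# points higher, purely from differential hang-up rates.
print(f"Polled Right share: {right_share:.1%}")
```

A pollster who knows the hang-up rates by group can reweight this away; one who doesn't just ships the bias.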

Some of the party surveys are probably better because they ask about previous voting, which would allow them to allocate the don't-knows better. Trying to deal with 12% missing data when you have no other information about the subjects is more akin to witchcraft than statistics, and makes prediction to 1 or 2% accuracy impossible.
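The allocation idea can be illustrated with made-up numbers (the 44/44/12 split and the previous-vote breakdown below are hypothetical): instead of dropping the 12% who say "don't know", assign them to candidates based on how they say they voted last time.

```python
# Hypothetical poll: 44% for A, 44% for B, 12% don't know.
poll = {"A": 44, "B": 44, "dont_know": 12}

# Assumed breakdown of the don't-knows by their reported previous vote.
prev_vote_of_dk = {"A": 8, "B": 4}

allocated = {
    "A": poll["A"] + prev_vote_of_dk["A"],
    "B": poll["B"] + prev_vote_of_dk["B"],
}
# Dropping the don't-knows suggests a 50/50 race; allocating them by
# previous vote shifts the estimate to 52/48.
print(allocated)
```

Without the previous-vote question there is nothing to condition on, which is the "witchcraft" situation the comment describes.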

1. The forecasting business relies on polls; most forecasters aggregate and don't do actual polling so they can't get at the errors built into those polls except by extrapolating from history (plus adjustments that may or may not be justified as a fitting tool).

2. It seems that getting n right, meaning the number of votes cast and the votes cast by each group in the electorate, is very hard to do properly. I sympathize: if the numbers are correct and the black vote dropped by 1% of the total electorate, that's 1.3+ million votes. But how do you guess that number from history, given that Obama is black and Hillary is white, given that Bill Clinton was President and passed tough sentencing laws, etc.? I'd say the results were thus within the error range of estimating n, but that is exactly the kind of thing that gets sucked up into the polls, so it's invisible. To be blunt, if people used a wide range for n, their forecast errors would look too big to be acceptable.
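The arithmetic behind the "1.3+ million votes" figure is simple to check, assuming a total turnout on the order of the roughly 137 million votes cast in 2016 (that total is my assumption, not a figure from the comment):

```python
# Approximate total votes cast in 2016 (assumed for illustration).
total_votes = 137_000_000

# A 1-percentage-point shift in any group's share of the electorate:
shift = int(0.01 * total_votes)
print(f"{shift:,} votes")  # 1,370,000 votes
```

Against margins decided by tens of thousands of votes in a few states, an error of that size in estimating who shows up swamps the sampling error the polls actually report.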

Jonathan/Ken: Good points. I don't think the modelers were horrible - and even if they were, it is too early to tell. But the readers were definitely misled and believed the hype.

"Having election forecasts does not advance our democracy." While I lack data to validate it, I've certainly heard more than one talking head propose that the presence of this information might actually change how voters act. That is, if you think your candidate has a really solid lead and you're somewhat less motivated to get to the polls, perhaps you don't show up. Thus polling and the dissemination of the information could have an impact on the outcome of the election. One could argue whether that "advances" our democracy or not, but if you're motivated to inaction (or action) by poll results, it certainly could.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.
Visit my website. Follow my Twitter. See my articles at Daily Beast, 538, HBR, Wired.
