The backlash against the election forecasting industry is coming, and it will be fierce. If you read my previous post from 2016 (here), you know I have doubts about the social value of such work, so this is a strange post for me to write. But I suspect this backlash will be overdone, so I want to bring up an important issue for you to think about before you cancel the pollsters or modelers.
Were the models wrong? Our visceral reaction is yes, a million times yes. But don't jump to that conclusion so quickly. Let's first dissect the problem. I bet we can't even agree on what "wrong" means.
***
On the simplest level, wrong can mean the predicted outcome does not match the actual outcome. Think of the model as making 51 A or B choices (ignoring the two states that split their electoral votes), and from the actual outcome, we can compute an error rate. That type of analysis was Nate Silver's claim to fame. He got "all 50 states right" a few cycles ago. (Here's Mashable's article from 2012.)
That metric is silly. As we all know, only maybe 10 states have truly competitive races. In the other 40 states, everyone who knows anything about US politics can predict the outcomes perfectly. By including those non-competitive races in the error count, we allow forecasters to pad their statistics, and we effectively render the majority of the scoring scale meaningless.
Let me do a back-of-the-envelope analysis. Every forecaster starts with a 40/50 = 80% baseline accuracy rate. Even though theoretically the scale runs from 0/50 to 50/50, i.e. 0 to 100%, the "effective" scale is from 80% to 100%. This situation should remind you of grade inflation at our universities. They claim a nominal scale of A, B, C, D, F but in reality, the scale is basically A and a little bit of B (and maybe a sprinkle of C, D and F). The GPA scale is nominally 0 to 4, but if almost everyone's GPA is between, say, 3 and 4, then the effective scale is from 3 to 4. Back to election forecasting, an accuracy rate of 80% indicates no skill whatsoever. Ninety percent sounds great but it is actually crappy - that means getting only 5 out of the 10 toss-up states correct. On a properly calibrated scale, I'd call that 50% accuracy, not 90%.
[Data scientists: this is not merely politics-talk. Think about how you measure accuracy of your predictive models. Think about comparing your model to random versus comparing it to a next-best model.]
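To make the rescaling concrete, here is a minimal sketch using the hypothetical counts from above (not real forecast results); the helper name is mine, and it simply strips out the safe races and scores the toss-ups only - the election analogue of comparing your model to a sensible baseline rather than to random.

```python
# Minimal sketch with the post's hypothetical numbers, not real results:
# drop the "safe" races and score only the competitive ones.

def effective_accuracy(correct_calls: int, total_races: int, safe_races: int) -> float:
    """Accuracy over competitive races only, assuming every safe race was called correctly."""
    competitive = total_races - safe_races
    correct_competitive = correct_calls - safe_races
    return correct_competitive / competitive

raw = 45 / 50  # "90% accurate" sounds impressive...
rescaled = effective_accuracy(correct_calls=45, total_races=50, safe_races=40)
print(f"raw: {raw:.0%}, effective: {rescaled:.0%}")  # raw: 90%, effective: 50%
```

The same idea generalizes beyond elections: report skill relative to a sensible baseline (here, calling every safe state correctly), not relative to zero.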
***
That isn't the only problem with the accuracy metric. You have heard that the inputs to the model - the polling data - are flawed. Garbage in, garbage out. Maybe the pollsters are the culprits here.
Not so fast. Let me paint a different possibility. The pollsters are excellent; they acquire a representative sample of voters, and the voters tell it like it is. And yet, the actual result turns out to be far from the poll-predicted result.
This can happen without any incompetence or malfeasance. The voters can change their minds. What they intend to do does not have to match what they actually do on election day.
You can ask me what I intend to do before I step into the polling booth, and I can tell you what I have in mind, but I can still change my mind when I vote. In this case, the poll-predicted result ought to differ from the actual result!
No poll or model will be able to capture this flip. Even if you had implanted a chip in my brain and learned about the flip milliseconds before I made it, it would be too late for anyone to act on that piece of information.
What I'm saying is that if the poll does not reflect the actual result, it doesn't always mean the pollster got it wrong. Maybe you've already caught what I'm about to say...
If people don't have clear intentions, there is no such thing as the correct answer. The model merely tells us what the mood is at the moment of measurement. Since the "mood" is not a real thing but an invention, the concept of accuracy doesn't apply.
When a company like Spotify claims it can predict my mood, it faces the same problem. It can guess what my mood is and act accordingly, but there is no easy way to validate such a prediction. My brain does not have a built-in tracker of my "mood" that can act as the "ground truth". So it's pointless to discuss the accuracy of a mood predictor.
[In recent years, these types of problems have invaded the data science space. They are not governed by the scientific method. This doesn't mean they are not worth solving. We just can't measure success like we do with traditional statistical or scientific models.]
***
Here's how I live with polls:
Think of polls as measuring the mood of the country (or state) at the time of data collection. Think of the mood as reflecting what the poll respondents are telling the pollsters at the time. Some people may be lying, some may be confused, some may be indecisive; it doesn't matter. Mood is like the weather.
The actual election outcome may differ from the poll predictions. The gap may just represent a shift in mood. It probably also contains some polling errors - I'm not saying polling is perfect; I'm just saying polls may not be as bad as critics allege.