One of the fun things about election forecasting from a scientific standpoint is that on November 6, the right answer will be revealed, and we can compare the predictions of our models to the actual result. It’s not realistic to expect any model to get exactly the right answer – the world is just too noisy, and the data are too sparse and (sadly) too low quality. But we can still assess whether the errors in a model’s estimates were small enough to warrant confidence in that model and to make applying it worthwhile.
With that said, here are the criteria I’m going to be using to evaluate the performance of my model on Election Day.
- Do the estimates of state opinion trends make sense? Although we won’t ever know exactly what people were thinking during the campaign, the state trendlines should at least pass through the center of the data. This validation also includes checking that the residual variance in the polls matches theoretical expectations, which, so far, it has.
- How large is the average difference between the state vote forecasts and the actual outcomes? And did this error decline gradually over the course of the campaign? In 2008, the average error fell from about 3% in July to 1.4% on Election Day. Anything in this neighborhood would be acceptable.
- What proportion of state winners were correctly predicted? Since what ultimately matters is which candidate receives a plurality in each state, we’d like this to be correct, even if the vote share forecast is a bit off. Obviously, states right at the margin (for example, North Carolina, Florida, and Colorado) are going to be harder to get right.
- Related to this, were the competitive states identified early and accurately? One of the aims of the model is to help us distinguish safe from swing states, to alert us where we should be directing most of our attention.
- Do 90% of state vote outcomes fall within the 90% posterior credible intervals of the state forecasts? This gets at the uncertainty in the model estimates. I use a 90% interval so that there’s room to detect underconfidence as well as overconfidence in the forecasts. In 2008, the model was a bit overconfident, meaning its intervals were somewhat too narrow. For this year, I’ll be fine with 80% coverage; if it’s much lower than that, I’ll want to revisit some of the model assumptions.
- How accurate was the overall electoral vote forecast? And how quickly (if at all) did it narrow in on the actual result? Even if the state-level estimates are good, there might be an error in how the model aggregates those forecasts nationally.
- Was there an appropriate amount of uncertainty in the electoral vote forecasts? Since there is only one electoral vote outcome, this will involve calculating the percentage of the campaign during which the final electoral vote was within the model’s 95% posterior credible interval. Allowing for some overconfidence in the state forecasts, this should not fall below 85% to 90%.
- Finally, how sensitive were the forecasts to the choice of structural prior? Especially if the model is judged to have performed poorly, could a different prior specification have made the difference?
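
Several of the checks above reduce to simple arithmetic once the results are in. Here is a minimal Python sketch of three of them: the average forecast error, the proportion of state winners called correctly, and the coverage of the credible intervals. The function names, the two-party vote share convention (so that 50% decides the winner), and the example numbers are all illustrative assumptions, not output from the actual model.

```python
def mean_absolute_error(forecast, actual):
    """Average absolute gap between forecast and actual vote shares."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def winner_accuracy(forecast, actual, threshold=50.0):
    """Share of states where the forecast and the outcome fall on the same
    side of the threshold (assumes two-party vote share, in percent)."""
    hits = sum((f > threshold) == (a > threshold)
               for f, a in zip(forecast, actual))
    return hits / len(actual)


def interval_coverage(lower, upper, actual):
    """Share of outcomes that land inside their [lower, upper] interval."""
    inside = sum(lo <= a <= hi for lo, hi, a in zip(lower, upper, actual))
    return inside / len(actual)


# Made-up numbers for four hypothetical states (percent of two-party vote).
forecast = [52.0, 48.5, 50.6, 45.0]   # posterior median forecasts
actual = [53.1, 49.7, 49.2, 44.0]     # election results
lower = [50.0, 46.0, 49.0, 43.5]      # lower bounds of 90% intervals
upper = [53.0, 51.0, 52.2, 46.5]      # upper bounds of 90% intervals

print("mean absolute error:", mean_absolute_error(forecast, actual))
print("winners correct:", winner_accuracy(forecast, actual))
print("90% interval coverage:", interval_coverage(lower, upper, actual))
```

The same `interval_coverage` function could also serve for the electoral vote check: pass the daily lower and upper bounds of the 95% interval, and repeat the final electoral vote count once per day as `actual`. The result is then the fraction of the campaign during which the final outcome was inside the interval.
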
If you can think of any I’ve left off, please feel free to add them in the comments.