Model Checking

October 10, 2012 / Drew / 5 Comments

One of the fun things about election forecasting from a scientific standpoint is that on November 6, the right answer will be revealed, and we can compare the predictions of our models to the actual result. It’s not realistic to expect any model to get exactly the right answer – the world is just too noisy, and the data are too sparse and (sadly) too low quality. But we can still assess whether the errors in the model estimates were small enough to warrant confidence in that model, and make its application useful and worthwhile.

With that said, here are the criteria I’m going to be using to evaluate the performance of my model on Election Day.

Do the estimates of state opinion trends make sense? Although we won’t ever know exactly what people were thinking during the campaign, the state trendlines should at least pass through the center of the data. This validation also includes checking that the residual variance in the polls matches theoretical expectations, which, so far, it has.

How large is the average difference between the state vote forecasts and the actual outcomes? And did this error decline in a gradual manner over the course of the campaign? In 2008, the average error fell from about 3% in July to 1.4% on Election Day. Anything in this neighborhood would be acceptable.
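As a rough sketch of this check, the average error can be computed as the mean absolute difference between forecast and actual vote shares across states. The function name, the dictionary layout, and all of the numbers below are hypothetical illustrations, not output from the model.

```python
def mean_absolute_error(forecast, actual):
    """Average absolute gap between forecast and actual vote share (percentage points)."""
    states = forecast.keys() & actual.keys()
    if not states:
        raise ValueError("no states in common")
    return sum(abs(forecast[s] - actual[s]) for s in states) / len(states)


# Hypothetical vote shares (%), for illustration only.
example_forecast = {"NC": 49.0, "FL": 50.5, "CO": 52.0}
example_actual = {"NC": 48.0, "FL": 50.0, "CO": 53.5}

mae = mean_absolute_error(example_forecast, example_actual)
print(f"Mean absolute error: {mae:.1f} points")
```

Running the same calculation on each day's forecast would show whether the error declined gradually as the campaign went on.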

What proportion of state winners were correctly predicted? Since what ultimately matters is which candidate receives a plurality in each state, we’d like this to be correct, even if the vote share forecast is a bit off. Obviously, states right at the margin (for example, North Carolina, Florida, and Colorado) are going to be harder to get right.
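One simple way to score this, assuming the forecasts are expressed as one candidate's share of the two-party vote so that anything above 50% counts as a win (a simplification of "plurality"), is sketched below. The names and numbers are hypothetical.

```python
def share_of_winners_called(forecast, actual, threshold=50.0):
    """Fraction of states where forecast and outcome fall on the same side of the threshold."""
    states = forecast.keys() & actual.keys()
    if not states:
        raise ValueError("no states in common")
    hits = sum((forecast[s] > threshold) == (actual[s] > threshold) for s in states)
    return hits / len(states)


# Hypothetical two-party vote shares (%), for illustration only.
winner_forecast = {"NC": 49.0, "FL": 50.5, "CO": 52.0, "VA": 51.0}
winner_actual = {"NC": 48.0, "FL": 49.8, "CO": 53.5, "VA": 51.5}

called = share_of_winners_called(winner_forecast, winner_actual)
print(f"Winners correctly called: {called:.0%}")
```

In this made-up example the only miss is the state closest to the 50% line, which is the pattern to expect from states right at the margin.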

Related to this, were the competitive states identified early and accurately? One of the aims of the model is to help us distinguish safe from swing states, to alert us where we should be directing most of our attention.

Do 90% of state vote outcomes fall within the 90% posterior credible intervals of the state forecasts? This gets at the uncertainty in the model estimates. I use a 90% interval so that there’s room to detect underconfidence as well as overconfidence in the forecasts. In 2008, the model was a bit overconfident. For this year, I’ll be fine with 80% coverage; if it’s much lower than that, I’ll want to revisit some of the model assumptions.
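The coverage check amounts to counting how many actual outcomes land inside their forecast intervals. Here is a minimal sketch, assuming each state's 90% interval is stored as a (lower, upper) pair; the data layout and the numbers are hypothetical.

```python
def interval_coverage(intervals, actual):
    """Fraction of states whose actual vote share falls inside the forecast interval."""
    states = intervals.keys() & actual.keys()
    if not states:
        raise ValueError("no states in common")
    inside = sum(intervals[s][0] <= actual[s] <= intervals[s][1] for s in states)
    return inside / len(states)


# Hypothetical 90% intervals and outcomes (%), for illustration only.
example_intervals = {
    "NC": (46.5, 51.5),
    "FL": (48.0, 53.0),
    "CO": (49.5, 54.5),
    "VA": (48.5, 53.5),
    "OH": (49.0, 54.0),
}
example_outcomes = {"NC": 48.0, "FL": 50.0, "CO": 55.0, "VA": 51.5, "OH": 51.0}

coverage = interval_coverage(example_intervals, example_outcomes)
print(f"Coverage of 90% intervals: {coverage:.0%}")
```

Coverage well below 90% points to overconfidence (intervals too narrow), while coverage near 100% suggests the intervals are wider than they need to be.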

How accurate was the overall electoral vote forecast? And how quickly (if at all) did it narrow in on the actual result? Even if the state-level estimates are good, there might be an error in how the model aggregates those forecasts nationally.

Was there an appropriate amount of uncertainty in the electoral vote forecasts? Since there is only one electoral vote outcome, this will involve calculating the percentage of the campaign during which the final electoral vote was within the model’s 95% posterior credible interval. Accepting the possibility of overconfidence in the state forecasts, this should not fall below 85%-90%.
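This calculation can be sketched as follows, assuming one 95% interval for the electoral vote is recorded per day of the campaign. The function name and every number here are hypothetical.

```python
def share_of_days_covered(daily_intervals, final_ev):
    """Fraction of days on which the final electoral vote lay inside that day's interval."""
    if not daily_intervals:
        raise ValueError("no daily intervals supplied")
    covered = sum(lower <= final_ev <= upper for lower, upper in daily_intervals)
    return covered / len(daily_intervals)


# Hypothetical daily 95% intervals (electoral votes), for illustration only.
example_daily_intervals = [(250, 330), (280, 350), (300, 360), (310, 347)]
example_final_ev = 332  # hypothetical final electoral vote count

days_covered = share_of_days_covered(example_daily_intervals, example_final_ev)
print(f"Days with final result inside the interval: {days_covered:.0%}")
```

Because the daily forecasts are strongly correlated with one another, this figure is a descriptive summary rather than a formal calibration test.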

Finally, how sensitive were the forecasts to the choice of structural prior? Especially if the model is judged to have performed poorly, could a different prior specification have made the difference?

If you can think of any I’ve left off, please feel free to add them in the comments.


5 Comments

MarkS
October 12, 2012 at 11:46 am

It would be very interesting to see your main “electoral vote forecast” graph with the structural prior replaced by a flat prior. I think it makes sense to use a structural prior, but clearly there is wide latitude in what info should be used in its construction.
William Ockham
October 12, 2012 at 12:01 pm

This may be implicit in your other criteria, but I think that a model that correctly places the states in their relative partisan order (Vermont to Utah) is better, all other things being equal, than one that doesn’t. States don’t “move around” that much, so it is important to identify the “Indiana in 2008” situations. One way to express this would be how well a model predicts a state’s PVI.
P G Vaidya
October 12, 2012 at 1:46 pm

I am not sure I agree with you that on November 6 we will know if the model is right. There are a very large number of models which will give the same set of answers on Nov 6. On the other hand, consider a model which, in the year 2112, we might see as something which remained within a small band of error for a hundred years and yet was a bit wrong this year.
Drew (Post author)
October 12, 2012 at 2:35 pm

MarkS: Yes, I agree. I’ll probably try something like that if it’s not too computationally demanding.

Ockham: Great idea. If the ordering is right, but the overall swing is wrong, at least that’s something.

PG: It’s certainly possible, and I appreciate your instinct to take a longer-term view – it’s what I’m trying to do as well. On the other hand, if I have a model that I’ve tested on two elections, and it only “worked” in one of them, then I think some tinkering and modification would be worth doing.
Nadia Hassan
October 19, 2012 at 9:19 am

Drew, on some occasions (here and twitter), you’ve discussed how your approach ultimately offers its estimate based on the median, whereas Nate looks at the mean. I think a single election is too limited a data point to draw big conclusions about that, but it might be interesting to see whether the median or mean is closer to the final outcome.
