    Evaluating the Forecasting Model

    November 15, 2012

    Since June, I’ve been updating the site with election forecasts and estimates of state-level voter preferences based on a statistical model that combines historical election data with the results of hundreds of state-level opinion polls. As described in the article that lays out my approach, the model worked very well when applied to data from the 2008 presidential election. It now appears to have replicated that success in 2012. The model accurately predicted Obama’s victory margin not only on Election Day, but months in advance of the election as well.

    With the election results (mostly) tallied, it’s possible to do a detailed retrospective evaluation of the performance of my model over the course of the campaign. The aim is as much to see where the model went right as where it might have gone wrong. After all, the modeling approach is still fairly new. If some of its assumptions need to be adjusted, the time to figure that out is before the 2016 campaign begins.

    To keep myself honest, I’ll follow the exact criteria for assessing the model that I laid out back in October.

    1. Do the estimates of the state opinion trends make sense?

      Yes. The estimated trendlines in state-level voter preferences appear to pass through the center of the polling data, even in states with relatively few polls. This suggests that the hierarchical design of the model, which borrows information from the polls across states, worked as intended.

      The residuals of the fitted model (that is, the difference between estimates of the “true” level of support for Obama/Romney in a state and the observed poll results) are also consistent with a pattern of random sampling variation plus minor house effects. In the end, 96% of polls fell within the theoretical 95% margin of error; 93% were within the 90% MOE; and 57% were within the 50% MOE.
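
      As a minimal sketch of this coverage check (not the model’s actual code), suppose we have each poll’s reported Obama share, its sample size, and the model’s estimate of the “true” share for that state and day. The column names below are hypothetical, for illustration only.

          import numpy as np
          import pandas as pd
          from scipy.stats import norm

          def moe_coverage(polls: pd.DataFrame, level: float) -> float:
              """Fraction of polls whose error falls inside the theoretical MOE."""
              z = norm.ppf(0.5 + level / 2)                # e.g. 1.96 for the 95% level
              p = polls["model_estimate"]                  # model's "true" Obama share
              moe = z * np.sqrt(p * (1 - p) / polls["sample_size"])
              error = (polls["obama_share"] - p).abs()
              return float((error <= moe).mean())

          # for level in (0.95, 0.90, 0.50):
          #     print(f"{level:.0%} MOE coverage: {moe_coverage(polls, level):.0%}")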

    2. How close were the state-level vote forecasts to the actual outcomes, over the course of the campaign?

      The forecasts were very close to the truth, even in June. I calculate the mean absolute deviation (MAD) between the state vote forecasts and the election outcomes, on each day of the campaign. In the earliest forecasts, the average error was already as low as 2.2%, and gradually declined to 1.7% by Election Day. (Perfect predictions would produce a MAD of zero.)
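
      Here is a sketch of that calculation, assuming a table of daily forecasts with hypothetical columns ‘date’, ‘state’, and ‘forecast’ (the predicted Obama two-party share), plus a dictionary mapping each state to its actual result.

          import pandas as pd

          def mad_by_day(forecasts: pd.DataFrame, actual: dict) -> pd.Series:
              """Mean absolute deviation of the state forecasts, grouped by forecast date."""
              errors = (forecasts["forecast"] - forecasts["state"].map(actual)).abs()
              return errors.groupby(forecasts["date"]).mean()

          # mad_by_day(forecasts, actual)   # a perfect forecaster would return all zeros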

      By incorporating state-level polls, the model was able to improve upon the baseline forecasts generated by the Abramowitz Time-for-Change model and uniform swing – but by much less than it did in 2008. The MAD of the state-level forecasts based on the Time-for-Change model alone – with no polling factored in at all – is indicated by the dashed line in the figure. It varied a bit over time, as updated Q2 GDP data became available.
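
      For readers unfamiliar with this baseline, here is a hedged sketch of the idea (illustrative numbers only, not the model’s actual code): the Time-for-Change regression supplies a national two-party forecast, and uniform swing shifts every state’s previous-election share by that same national swing.

          def uniform_swing_baseline(tfc_national: float,
                                     prev_national: float,
                                     prev_state_share: dict) -> dict:
              """Shift each state's prior-election vote share by the national swing."""
              swing = tfc_national - prev_national
              return {state: share + swing for state, share in prev_state_share.items()}

          # Example: Time-for-Change predicted 52.2% for Obama in 2012; he won roughly
          # 53.7% of the two-party vote in 2008. The state shares here are illustrative.
          # baseline = uniform_swing_baseline(0.522, 0.537, {"OH": 0.523, "NC": 0.502})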

      Why didn’t all the subsequent polling make much difference? The first reason is that the Time-for-Change forecast was already highly accurate: it predicted that Obama would win 52.2% of the major-party vote; he actually received 51.4%. The successful track record of this model is the main reason I selected it in the first place. The second is that state-level vote swings between 2008 and 2012 were very close to uniform, which again left the forecasts with little room for further refinement.

      But in addition to this, voters’ preferences for Obama or Romney were extremely stable this campaign year. From May to November, opinions in the states varied by no more than 2% to 3%, compared to swings of 5% to 10% in 2008. In fact, by Election Day, estimates of state-level voter preferences weren’t much different from where they started on May 1. My forecasting model is designed to be robust to small, short-term changes in opinion, and these shifts were simply not large enough to alter the model’s predictions about the ultimate outcome. Had the model reacted more strongly to changes in the polls – such as after the first presidential debate – it would have given the mistaken impression that Obama’s chances of reelection were falling, when in fact they were just as high as ever.

    3. What proportion of state winners were correctly predicted?

      As a result of the accuracy of the prior and the relative stability of voter preferences, the model correctly picked the winner of nearly every state for the entire campaign. The only mistake arose during Obama’s rise in support in September, which briefly moved North Carolina into his column. After the first presidential debate, the model returned to its previous prediction that Romney would win North Carolina. On Election Day, the model went 50-for-50.

    4. Were the competitive states identified early and accurately?

      Yes. Let’s define competitive states as those in which the winner is projected to receive under 53% of the two-party vote. On June 23, the model identified twelve such states: Arizona, Colorado, Florida, Indiana, Iowa, Michigan, Missouri, Nevada, North Carolina, Ohio, Virginia, and Wisconsin. That’s a good list.

    5. Do 90% of the actual state vote outcomes fall within the 90% posterior credible intervals of the state vote forecasts?

      This question addresses whether there was a proper amount of uncertainty in the forecasts, at various points in the campaign. As I noted before, in 2008, the forecasts demonstrated a small degree of overconfidence towards the end of the campaign. The results from the 2012 election show the same tendency. Over the summer, the forecasts were actually a bit underconfident, with 95%-100% of states’ estimated 90% posterior intervals containing the true outcome. But by late October, the model produced coverage rates of just 70% for the nominal 90% posterior intervals.
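
      A minimal sketch of that coverage calculation, assuming daily forecasts stored with hypothetical ‘lower90’ and ‘upper90’ columns for the 90% posterior interval bounds:

          import pandas as pd

          def coverage_by_day(forecasts: pd.DataFrame, actual: dict) -> pd.Series:
              """Share of states whose outcome landed inside the 90% interval, by day."""
              outcome = forecasts["state"].map(actual)
              covered = (outcome >= forecasts["lower90"]) & (outcome <= forecasts["upper90"])
              return covered.groupby(forecasts["date"]).mean()   # ideally close to 0.90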

      As in 2008, the culprit for this problem was the limited number of polls in non-competitive states. The forecasts were not overconfident in the key battleground states where many polls were available, as can be seen in the forecast detail. It was only in states with very few polls – and especially where those polls were systematically in error, as in Hawaii or Tennessee – that the model was misled. A simple remedy would be to conduct more polls in non-competitive states, but it’s not realistic to expect this to happen. Fortunately, overconfidence in non-competitive states does not adversely impact the overall electoral vote forecast. Nevertheless, this remains an area for future development and improvement in my model.

      It’s also worth noting that early in the campaign, when the amount of uncertainty in the state-level forecasts was too high, the model was still estimating a greater than 95% chance that Obama would be reelected. In other words, aggregating a series of underconfident state-level forecasts produced a highly confident national-level forecast.
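
      To illustrate how that aggregation works, here is a simplified simulation sketch. It treats states as independent, which the full model does not, and the ‘win_prob’ and ‘ev’ inputs are hypothetical dictionaries keyed by state.

          import numpy as np

          def simulate_electoral_votes(win_prob: dict, ev: dict,
                                       n_sims: int = 10_000, seed: int = 1) -> np.ndarray:
              """Simulate Obama's electoral vote total from state-level win probabilities."""
              rng = np.random.default_rng(seed)
              states = list(win_prob)
              probs = np.array([win_prob[s] for s in states])
              votes = np.array([ev[s] for s in states])
              wins = rng.random((n_sims, len(states))) < probs   # simulated state winners
              return (wins * votes).sum(axis=1)                  # Obama EV in each simulation

          # ev_sims = simulate_electoral_votes(win_prob, ev)
          # print("P(Obama reaches 270):", (ev_sims >= 270).mean())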

    6. How accurate was the overall electoral vote forecast?

      The final electoral vote was Obama 332, Romney 206, with Obama winning all of his 2008 states, minus Indiana and North Carolina. My model first predicted this outcome on June 23, and then remained almost completely stable through Election Day. The accuracy of my early forecast, and its steadiness despite short-term changes in public opinion, are possibly the model’s most significant accomplishments.

      In contrast, the electoral vote forecasts produced by Nate Silver at FiveThirtyEight hovered around 300 through August, peaked at 320 before the first presidential debate, then cratered to 283 before finishing at 313. The electoral vote estimator of Sam Wang at the Princeton Election Consortium demonstrated even more extreme ups and downs in response to the polls.

    7. Was there an appropriate amount of uncertainty in the electoral vote forecasts?

      This is difficult to judge. On one hand, since many of the state-level forecasts were overconfident, it would be reasonable to conclude that the electoral vote forecasts were overconfident as well. On the other hand, the actual outcome – 332 electoral votes for Obama – fell within the model’s 95% posterior credible interval at every single point of the campaign.

    8. Finally, how sensitive were the forecasts to the choice of structural prior?

      Given the overall solid performance of the model – and that testing out different priors would be extremely computationally demanding – I’m going to set this question aside for now. Suffice it to say, Florida, North Carolina, and Virginia were the only three states in which the forecasts were close enough to 50-50 that the prior specification would have made much difference. And even if Obama had lost Florida and Virginia, he still would have won the election. So this isn’t something that I see as an immediate concern, but I do plan on looking into it before 2016.

    Final Result: Obama 332, Romney 206

    November 9, 2012

    The results are in: Obama wins all of his 2008 states, minus Indiana and North Carolina, for 332 electoral votes. This is exactly as I predicted on Tuesday morning – and as I’ve been predicting (albeit with greater uncertainty) since June. Not bad! The Atlantic Wire awarded me a Gold Star for being one of “The Most Correct Pundits In All the Land”. There were also nice write-ups in The Chronicle of Higher Education, BBC News Magazine, Atlanta Journal-Constitution and the LA Times, among others. Thanks to everyone who has visited the site, participated in the comments, and offered their congratulations. I really appreciate it.

    I’m still planning a complete assessment of the performance of the forecasting model, along the lines I described a few weeks ago. But in the meantime, here are a few quick looks at how my Election Day predictions stacked up against the actual state-level vote outcomes. First, a simple scatterplot of my final predictions versus each state’s election result. Perfect predictions will fall along the 45-degree line. If a state is above the 45-degree line, then Obama performed better than expected; if below, he fared worse.

    Interestingly, in most of the battleground states, Obama did indeed outperform the polls, suggesting that a subset of the surveys in those states was tilted in Romney’s favor, just as I’d suspected. Across all 50 states, however, the polls were extremely accurate. The average difference between the actual state vote outcomes and the final predictions of my model was a minuscule 0.03% towards Obama.

    My final estimates predicted 19 states within 1% of the truth, with a mean absolute deviation of 1.7%, and a state-level RMSE of 2.3% (these may change slightly as more votes are counted). Other analysts at the CFAR blog and Margin of Error compared my estimates to those of Nate Silver, Sam Wang, Simon Jackman, and Josh Putnam, and found that my estimates did very well. All in all, a nice round of success for us “quants”.
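
    The summary statistics above can be reproduced with a few lines of code. This is a sketch assuming arrays of final predictions and actual outcomes aligned by state; variable names are illustrative.

        import numpy as np

        def summarize_accuracy(predicted: np.ndarray, actual: np.ndarray) -> dict:
            """Signed bias, MAD, RMSE, and the number of states predicted within 1%."""
            errors = actual - predicted              # positive = Obama beat the forecast
            return {
                "mean_error": errors.mean(),
                "mad": np.abs(errors).mean(),
                "rmse": np.sqrt((errors ** 2).mean()),
                "within_1pt": int((np.abs(errors) < 0.01).sum()),
            }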

    Unsurprisingly, my model made much better predictions where more polls had been fielded! Here I’ll plot the difference between Obama’s share of the two-party vote in each state, and my final prediction, as a function of the number of polls in the state since May 1. Again, positive values indicate states where Obama did better than expected.

    For minimizing the error in my forecasts, the magic number of polls per state appears to be around 25. That’s really not a lot, and I’m hopeful that we can get to at least this number in 2016. It’s a bit concerning, though, that there were about 25% fewer state-level presidential polls this year, compared to 2008.

    Recently there have been some complaints among pollsters – most notably Gallup’s Frank Newport – that survey aggregators (like me) “don’t exist without people who are out there actually doing polls,” and that our work threatens to dissuade survey organizations from gathering these data in the first place. My view is slightly different. I’d say that working together, we’ve proven once again that public opinion research is a valuable source of information for understanding campaign dynamics and predicting election outcomes. There’s no reason why the relationship shouldn’t be one of mutual benefit, rather than competition or rivalry. In a similar manner, our analyses supplement – not replace – more traditional forms of campaign reporting. We should all be seen as moving political expertise forward, in an empirical and evidence-based way.

    Election Day Forecast: Obama 332, Romney 206

    November 6, 2012

    With the last set of polls factored into the model, my final prediction is Obama to win 332 electoral votes, with 206 for Romney. This is both the median and the modal outcome in my electoral vote simulation, and corresponds to Obama winning all of his 2008 states except Indiana and North Carolina.

    The four closest states – and therefore the most difficult to predict – are Florida, North Carolina, Virginia, and Colorado. Of these, my model expects Romney to win only North Carolina, but Florida is a true toss-up, with just a 60% chance of an Obama victory. I would not be surprised if Florida ended up going for Romney. If that happens, Obama would win 303 electoral votes, which is the second-most likely scenario in my simulation. The third-most likely scenario is that Obama wins 347 electoral votes, picking up North Carolina in addition to Florida.
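
    Ranking these scenarios amounts to tabulating the simulated electoral vote totals. A brief sketch, reusing the hypothetical ‘ev_sims’ array from the simulation sketch shown earlier:

        from collections import Counter
        import numpy as np

        def top_scenarios(ev_sims: np.ndarray, k: int = 3) -> list:
            """The k most common electoral vote totals and their simulated frequencies."""
            counts = Counter(int(v) for v in ev_sims)
            return [(ev, n / len(ev_sims)) for ev, n in counts.most_common(k)]

        # top_scenarios(ev_sims)   # e.g. 332 first, then 303, then 347 electoral votes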

    It’s been interesting to watch the forecasts of other poll watchers converge on the 332 estimate. Sam Wang, at the Princeton Election Consortium, also sees 332 as the modal outcome. So does Simon Jackman at the Huffington Post, and Josh Putnam at FHQ. Nate Silver, at his FiveThirtyEight blog, reports the mean of his electoral vote simulation at 313 – effectively splitting the difference on Florida, to which he currently gives Obama a 50.3% chance of winning. But his most likely outcome is still Obama 332, followed by 303 and 347, just as in my simulation. Update: both Wang and Jackman revised their forecasts slightly downward this afternoon, based on late-arriving polls.

    There will be plenty of opportunities to evaluate all of these forecasts once the election results are known. I’ve already laid out the standards I’ll be using to check my own model. This is how quantitative election forecasting can make progress, and hopefully work even better next time.

    I’ll add, though, that on the eve of the election, averaging the polls, or running them through any sort of sensible model, isn’t all that hard. We are all using the same data (more or less) and so it doesn’t surprise me that we’re all reaching similar conclusions. The real challenge is producing meaningful and accurate forecasts early in the campaign. My model is designed to be robust to short-term fluctuations in the polls, and to converge in a stable and gradual manner to the final, Election Day estimates. It appears that in this regard, the model has worked as intended.

    But from a broader perspective, my model has been predicting that Obama will win 332 electoral votes – give or take – since June. If all of us are correct today, the next question to ask is when each model arrived at the ultimate outcome. That’s a big if, though. Let’s start with how the votes come in tonight, and go from there.

    Final Estimates Tomorrow Morning

    November 5, 2012

    I entered nearly 50 new state polls in the most recent model update, posted earlier today. There have been over 30 additional polls released since then. I’ll wait a few more hours to see if any more come out, then run the model one more time, overnight. My final estimates will be ready in the morning.

    In the meantime, you might have noticed that my EV forecast for Obama inched downward for the first time in weeks, from 332 to 326. That’s the median of my election simulations, but it doesn’t correspond to a particularly likely combination of state-level outcomes. Instead, it reflects the declining probability that Obama will win Florida (now essentially a 50-50 proposition), and Obama’s continuing deficit in North Carolina. I’ve updated the title on the chart in the banner to make this clear.

    Depending on how things go with the final run, I’ll keep updating the chart as I have been, using the median. But I’ll also create a more true-to-life forecast, based on assigning each state to its most likely outcome. With Florida (and its 29 electoral votes) right on the knife edge, this will either be Obama 332-206 if the model projects an Obama victory there, or Obama 303-235 if the model shows Obama behind. I’ll also have all sorts of other tables and charts ready to go for comparing the election results as they’re announced.
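
    The two summaries differ only in how the simulations are collapsed. A brief sketch, again using hypothetical ‘win_prob’ and ‘ev’ inputs and an array of simulated electoral vote totals:

        import numpy as np

        def median_ev(ev_sims: np.ndarray) -> float:
            """Median of the simulated Obama electoral vote totals."""
            return float(np.median(ev_sims))

        def most_likely_map_ev(win_prob: dict, ev: dict) -> int:
            """Electoral votes if every state goes to its more probable winner."""
            return sum(ev[s] for s, p in win_prob.items() if p > 0.5)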

    Pollsters May Be Herding

    November 5, 2012

    The accuracy of my election forecasts depends on the accuracy of the presidential polls. As such, a major concern heading into Election Day is the possibility that polling firms, out of fear of being wrong, are looking at the results of other published surveys and weighting or adjusting their own results to match. If pollsters are engaging in this sort of herding behavior – and, as a consequence, converging on the wrong estimates of public opinion – then there is danger of the polls becoming collectively biased.

    To see whether this is happening, I’ll plot the absolute value of the state polls’ error, over time. (The error is the difference between a poll’s reported proportion supporting Obama, and my model’s estimate of the “true” population proportion.) Herding would be indicated by a decline in the average survey error towards zero – representing no difference from the consensus mean – over the course of the campaign. This is exactly what we find. Although there has always been a large amount of variation in the polls, the underlying trend – as shown by the lowess smoother line, in blue – reveals that the average error in the polls started at 1.5% in early May, but is now down to 0.9%.
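
    Here is a sketch of that check, assuming a table of polls with hypothetical columns ‘days_since_may1’ and ‘abs_error’ (the absolute difference between the poll and the model’s estimate); the lowess smoother comes from statsmodels.

        import matplotlib.pyplot as plt
        from statsmodels.nonparametric.smoothers_lowess import lowess

        def plot_herding_trend(polls) -> None:
            """Scatter the absolute poll errors over time and overlay a lowess curve."""
            x = polls["days_since_may1"].to_numpy()
            y = polls["abs_error"].to_numpy()
            smoothed = lowess(y, x, frac=0.5)      # returns sorted (x, fitted y) pairs
            plt.scatter(x, y, s=8, alpha=0.3)
            plt.plot(smoothed[:, 0], smoothed[:, 1], color="blue")
            plt.xlabel("Days since May 1")
            plt.ylabel("Absolute poll error")
            plt.show()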

    How worried do we need to be? Herding around the wrong value is potentially much worse than any one or two firms having an unusual house effect. But even if the variance of the polls is decreasing, they might still have the right average. An alternative explanation for this pattern could be an increase in sample sizes (resulting in lower sampling variability), but this hasn’t been the case. Unfortunately, there weren’t enough polls to tell whether the pattern was stronger in more frequently-polled states, or if particular firms were more prone to follow the pack. Hopefully, this minor trend won’t mean anything, and the estimates will be fine. We’ll know soon.

    Another Look at Survey Bias

    November 2, 2012

    Questions continue to be raised about the accuracy of the polls. Obviously, in just a few more days, we’ll know which polls were right (on average) and which were wrong. But in the meantime, it’s useful to understand how the polls are – at the very least – different from one another, and form a baseline set of expectations to which we can compare the election results on Tuesday. The reason this question takes on special urgency now is that there’s essentially no time left in the campaign for preferences to change any further: if the state polls are right, then Obama is almost certain to be reelected.

    In previous posts, I’ve looked at both house effects and error distributions (twice!), but I want to return to this one more time, because it gets to the heart of the debate between right-leaning and left-leaning commentators over the trustworthiness of the polls.

    A relatively small number of survey firms have conducted a majority of the state polls, and therefore have a larger influence on the trends and forecasts generated by my model. Nobody disputes that there have been evident, systematic differences in the results of these major firms: some leaning more pro-Romney, others leaning more pro-Obama. As I said at the outset, we’ll know on Election Day who’s right and wrong.

    But here’s a simple test. There have been hundreds of smaller organizations that have released fewer than a half-dozen polls each. Most have only released a single poll. We can’t reliably estimate the house effects for all of these firms individually. However, we can probably safely assume that in aggregate they aren’t all ideologically in sync – so that whatever biases they have will all cancel out when pooled together. We can then compare the overall error distribution of the smaller firms’ surveys to the error distributions of the larger firms’ surveys. (The survey error is simply the difference between the proportion supporting Obama in a poll, and my model’s estimate of the “true” proportion in that state on that day.)

    If the smaller firms’ errors are distributed around zero, then the left-leaning firms are probably actually left-leaning, and the right-leaning firms are probably actually right-leaning, and this means that they’ll safely cancel each other out in my results, too. On the other hand, if the smaller firms’ error distribution matches either the left-leaning or the right-leaning firms’ error distribution, then it’s more likely the case that those firms aren’t significantly biased after all, and it’s the other side’s polls that are missing the mark.
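
    A hedged sketch of the comparison (not my actual plotting code), assuming a table of polls with hypothetical ‘firm’ and ‘error’ columns: pool the errors from firms with fewer than six polls, then overlay kernel density estimates for each major firm.

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.stats import gaussian_kde

        def plot_firm_error_densities(polls, major_firms) -> None:
            """Compare each major firm's error density to the pooled smaller firms."""
            counts = polls["firm"].value_counts()
            small = polls.loc[polls["firm"].map(counts) < 6, "error"].to_numpy()
            grid = np.linspace(-0.10, 0.10, 200)
            plt.plot(grid, gaussian_kde(small)(grid), color="grey", lw=3,
                     label="Smaller firms (pooled)")
            for firm in major_firms:
                errs = polls.loc[polls["firm"] == firm, "error"].to_numpy()
                plt.plot(grid, gaussian_kde(errs)(grid), label=firm)
            plt.xlabel("Poll error (Obama share minus model estimate)")
            plt.legend()
            plt.show()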

    What do we find? This set of kernel density plots (smoothed histograms) shows the distribution of survey errors among the seven largest survey organizations, and in grey, the distribution of errors among the set of smaller firms. The smaller firms’ error distribution matches that of Quinnipiac, SurveyUSA, YouGov, and PPP. The right-leaning firms – Rasmussen, Gravis Marketing, and ARG – are clearly set apart on the pro-Romney side of the plot.

    If, on Election Day, the presidential polls by Quinnipiac, SurveyUSA, YouGov, and PPP prove to be accurate, then the polls by Rasmussen, Gravis Marketing, and ARG will all have consistently underestimated Obama’s level of support by 1.5% throughout the campaign. Right now, assuming zero overall bias, Florida is 50-50. The share of Florida polls conducted by Rasmussen, Gravis Marketing, and ARG? 20%. Remove those polls from the dataset, and Obama’s standing improves.

    Four days to go.

    Can We Trust the Polls?

    October 28, 2012

    If you believe the polls, Obama is in good shape for reelection. And my model’s not the only one showing this: you’ll find similar assessments from a range of other poll-watchers, too. The lead is clear enough that The New Republic’s Nate Cohn recently wrote, “If the polls stay where they are, which is the likeliest scenario, Obama would be a heavy favorite on Election Day, with Romney’s odds reduced to the risk of systemic polling failure.”

    What would “systemic” polling failure look like? In this case, it would mean not only that some of the polls are overstating Obama’s level of support, but that most – or even all – of them have been consistently biased in Obama’s favor. If this is happening, we’ll have no way to know until Election Day. (Of course, it’s just as likely that the polls are systematically underestimating Obama’s vote share, but then Democrats have even less to be worried about.)

    A failure of this magnitude would be major news. It would also be a break with recent history. In 2000, 2004, and 2008, presidential polls conducted just before Election Day were highly accurate, according to studies by Michael Traugott, by Pickup and Johnston, and by Costas Panagopoulos. My own model in 2008 produced state-level forecasts based on the polls that were accurate to within 1.4% on Election Day, and 0.4% in the most competitive states.

    Could this year be different? Methodologically, survey response rates have fallen below 10%, but it’s not evident how this necessarily helps Obama. Surveys conducted using automatic dialers (rather than live interviewers) often have even lower response rates, and are prohibited from calling cell phones – but, again, this tends to produce a pro-Republican – not pro-Democratic – lean. And although there are certainly house effects in the results of different polling firms, it seems unlikely that Democratic-leaning pollsters would intentionally distort their results to such an extent that they discredit themselves as reputable survey organizations.

    My analysis has shown that despite these potential concerns, the state polls appear to be behaving almost exactly as we should expect. Using my model as a baseline, 54% of presidential poll outcomes are within the theoretical 50% margin of error; 93% are within the 90% margin of error, and 96% are within the 95% margin of error. This is consistent with a pattern of random sampling plus minor house effects.

    Nevertheless, criticisms of the polls – and those of us who are tracking them – persist. One of the more creative claims about why the polling aggregators might be wrong this year comes from Sean Trende of RealClearPolitics and Jay Cost of The Weekly Standard. Their argument is that the distribution of survey errors has been bimodal – different from the normal distribution of errors produced by simple random sampling. If true, this would suggest that pollsters are envisioning two distinct models of the electorate: one more Republican, the other more Democratic. Presuming one of these models is correct, averaging all the polls together – as I do, and as does the Huffington Post and FiveThirtyEight – would simply contaminate the “good” polls with error from the “bad” ones. Both Trende and Cost contend the “bad” polls are those that favor Obama.

    The problem with this hypothesis is that even if it were true (and the error rates suggest it’s not), there would be no way to observe evidence of bimodality in the polls unless the bias was far larger than anybody is currently claiming. The reason is that most of the error in the polls will still be due to random sampling variation, which no pollster can avoid. To see this, suppose that half the polls were biased 3% in Obama’s favor – a lot! – while half were unbiased. Then we’d have two separate distributions of polls: the unbiased group (red), and the biased group (blue), which we then combine to get the overall distribution (black). The final distribution is pulled to the right, but it still only has one peak.
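
    This is easy to verify with a small simulation, using made-up numbers (a true Obama share of 51% and 800 respondents per poll, not estimates from the actual polls): draw half the polls from an unbiased sampling distribution and half from one shifted three points toward Obama, and the combined distribution is shifted but shows no clear second mode.

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(0)
        true_share, n_respondents, n_polls = 0.51, 800, 500

        # Half the simulated polls are unbiased; half are biased 3 points toward Obama.
        unbiased = rng.binomial(n_respondents, true_share, n_polls) / n_respondents
        biased = rng.binomial(n_respondents, true_share + 0.03, n_polls) / n_respondents

        plt.hist(np.concatenate([unbiased, biased]), bins=30, color="black")
        plt.xlabel("Reported Obama share of the two-party vote")
        plt.title("Half the polls biased +3 points: the mixture still has one peak")
        plt.show()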

    Of course, it’s possible that in any particular state, with small numbers of polls, a histogram of observed survey results might just happen to look bimodal. But this would have to be due to chance alone. To conclude from it that pollsters are systematically cooking the books only proves that apophenia – the experience of seeing meaningful patterns or connections in random or meaningless data – is alive and well this campaign season.

    The election is in a week. We’ll all have a chance to assess the accuracy of the polls then.

    Update: I got a request to show the actual error distributions in the most frequently-polled states. All errors are calculated as a survey’s reported Obama share of the major-party vote, minus my model’s estimate of the “true” value on the day and state of that survey. Positive errors indicate polls that were more pro-Obama; negative errors, polls that were more pro-Romney. To help guide the eye, I’ve overlaid kernel density plots (histogram smoothers) in blue. The number of polls in each state is given in parentheses.

    It may also help to see the overall distribution of errors across the entire set of state polls. After all, if there is “bimodality” then why should it only show up in particular states? The distribution looks fine to me.