Polling Analysis and Election Forecasting

Month: November 2016

The forecasts were wrong. Trump won. What happened?

Cross-posted at Daily Kos Elections

Last week, just before Election Day, we published our final presidential election forecast: Hillary Clinton, 323 electoral votes; Donald Trump, 215. As I wrote when making that prediction, “While it’s possible for Trump to defy the polls and win the election, it is not likely. Our model estimates Trump’s chances at around 12 percent.”

Trump’s Election Day victory, with 306 electoral votes, took us by surprise. My forecast was wrong. It’s time to understand why.

The forecast was based on a statistical model that analyzed nearly 1,400 state-level pre-election public opinion polls, in combination with a set of political and economic “fundamentals” that added information about the election’s historical context. The fundamental factors (which turned out to predict the national vote share very closely) indicated that Clinton faced an uphill climb from the very beginning. In May, I estimated that Clinton’s baseline probability of victory was around 35 percent.

But all summer long, right up to Election Day, the polls told a different story. Pollsters reported that Clinton was consistently ahead, both nationally and in the states, by what were sometimes very large margins. By July, Clinton’s advantage in the polls lifted her chance of winning to 65 percent, and it never fell below that mark. After the first presidential debate, Clinton’s lead over Trump in the state polls was so great that our model gave her a 96 percent chance of victory. And our reading of the polls was not unique: Every other major forecaster also expected Clinton to win (albeit with varying degrees of certainty). It would have taken either a major campaign event, or a major failure of public opinion measurement, for her to lose.

The polling failure was what we got. Late campaign developments like the Comey letter may have affected some voters, but if so, polls still never showed Trump in the lead. In previous elections, the error in the aggregates of the polls typically went both ways, sometimes benefiting the Democrat, and other times benefiting the Republican. This year, the errors were massive, and they almost all went in the direction of Trump.

State-level presidential polls—especially in the swing states—were badly and systematically wrong, by amounts not seen in decades. The polling averages indicated that Clinton would win Florida and North Carolina by 2 percentage points, Pennsylvania and Wisconsin by 5 percentage points, and Michigan by 7 percentage points. Instead, Trump won all five, for a total haul of 90 electoral votes. The state polls were so inaccurate that Trump almost won New Hampshire, where he’d been trailing by 5, and Minnesota, where he’d trailed by 9. Across all states, on average, Trump’s margin of victory was 5 percentage points greater than our polling aggregates expected it to be.
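The electoral arithmetic behind those five states can be checked in a few lines. This is only a sketch: the electoral vote counts are the 2016 apportionment, the 215 is our final forecast for Trump, and the variable names are my own.

```python
# Electoral votes (2016 apportionment) for the five states the polls
# expected Clinton to win but that Trump carried.
electoral_votes = {
    "Florida": 29,
    "North Carolina": 15,
    "Pennsylvania": 20,
    "Wisconsin": 10,
    "Michigan": 16,
}

# Total electoral votes that moved from the Clinton column to the Trump column.
flipped_total = sum(electoral_votes.values())

# Trump's forecast total plus the five flipped states.
forecast_trump = 215
trump_with_flips = forecast_trump + flipped_total

print(flipped_total)      # 90
print(trump_with_flips)   # 305
```

The result is 305, one short of Trump's 306, so at least one more electoral vote must have come from outside these five states.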

Given this data, no reasonable poll-based presidential forecasting model could have predicted a Trump victory. There was no interpretation of the available public polling data that supported that conclusion. This was not a case of confirmation bias or analysts reading into the data conclusions that they wanted to see. The evidence supporting a Trump victory did not exist.

The miss was not confined to the public polls, which are often considered to be of lower quality than the proprietary research commissioned by parties and campaigns, and never released to the public. Reports suggest that neither the Clinton nor the Trump campaign saw this result coming. Neither did the RNC. Going into Election Day, Trump campaign analysts calculated that they had at best a 30 percent chance of winning.

Some forecasting models did give Donald Trump a higher probability of winning, most notably the FiveThirtyEight model at 29 percent. But they rated Trump’s chances higher not because they had a fundamentally more pro-Trump interpretation of the data. Rather, they put less trust in the polls, which increased their uncertainty about the overall outcome of the election in both directions. This widened the range of potential electoral vote outcomes seen as consistent with the data, which pulled their forecast of Clinton’s chance of winning back towards 50 percent. No matter the level of uncertainty in the final outcome, every poll-based model’s best guess was that Clinton would win the same set of states totaling 323 electoral votes, and every model was wrong in the same way.
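To see why less trust in the polls pulls a forecast towards 50 percent without changing who is favored, here is a minimal sketch. It assumes the true margin is normally distributed around the polling lead. The 3-point lead and the error sizes are hypothetical, and this is not how any of the actual forecasting models, including ours, computes its probabilities.

```python
import math


def win_probability(lead_points: float, error_sd: float) -> float:
    """Probability that the true margin is above zero, assuming the margin
    is Normal(lead_points, error_sd). Purely illustrative."""
    if error_sd <= 0:
        raise ValueError("error_sd must be positive")
    z = lead_points / error_sd
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


lead = 3.0  # hypothetical polling lead, in percentage points
for sd in (2.0, 4.0, 6.0):  # hypothetical amounts of polling error
    print(f"error sd = {sd}: P(leader wins) = {win_probability(lead, sd):.3f}")
```

The same lead produces a win probability that shrinks towards one half as the assumed error grows, but it never drops below one half: the favorite stays the favorite.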

It is not yet known why polls underestimated Trump’s vote share so badly. The polls also overestimated Clinton’s vote share, but not by nearly as much. Survey researchers are already busy investigating different theories. One clue, however, is that an unusually large number of survey respondents said, all year, that they were either undecided or supporting a third-party candidate for president. I mentioned this pattern in my final forecast, and you can see it illustrated in the chart below:


When as many as 12 percent of voters are uncommitted going into Election Day, it makes a big difference if they “break” disproportionately towards one candidate or the other. Nobody knows whether this uncommitted bloc included significant numbers of so-called “shy” Trump supporters who were uncomfortable telling pollsters they were backing him. But evidence from the exit polls suggests that many Trump voters “broke late,” and decided to support him only at the very last minute. Allowing for this possibility is something that should have contributed more uncertainty to most forecasters’ projections, including our own.
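A rough calculation shows how much a late break can matter. The sketch below assumes, unrealistically, that every uncommitted voter ends up choosing one of the two major-party candidates and that committed voters do not change their minds. The function name and the example leads are mine, for illustration only.

```python
def break_share_needed(lead_points: float, uncommitted_points: float) -> float:
    """Share of uncommitted voters the trailing candidate must win to tie.

    If the trailing candidate wins a share s of the uncommitted bloc U,
    the gap closes by (2s - 1) * U points. Setting that equal to the
    lead L gives s = 0.5 + L / (2U).
    """
    if uncommitted_points <= 0:
        raise ValueError("uncommitted_points must be positive")
    return 0.5 + lead_points / (2.0 * uncommitted_points)


uncommitted = 12.0  # percent of voters uncommitted, as cited above
for lead in (3.0, 5.0):  # hypothetical polling leads, in points
    share = break_share_needed(lead, uncommitted)
    print(f"lead of {lead} points: trailing candidate needs {share:.1%} of the bloc")
```

Under these assumptions, a 3-point deficit disappears if the uncommitted bloc splits 62.5 to 37.5 for the trailing candidate.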

I checked whether the forecasts might have been wrong because one or two polling firms reported especially inaccurate results. That wasn’t the problem. In our database of state-level presidential polls, the two largest contributors were SurveyMonkey and UPI/CVoter, which together accounted for 29 percent of all polls. In many states, half or more of our data came from one of those two firms. I removed all of those polls from the dataset and re-ran the model. The results did not change in any meaningful way.
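The idea of that check is simple to sketch, even though the real model is far more involved than a polling average. In the toy example below, the margins are made up and "Firm A", "Firm B", and "Firm C" are placeholder names. Only the two excluded firm names come from our database.

```python
from statistics import mean

# Made-up toy records: (pollster, Clinton-minus-Trump margin in points).
# These are NOT real poll results.
polls = [
    ("SurveyMonkey", 4.0),
    ("UPI/CVoter", 3.0),
    ("Firm A", 5.0),
    ("Firm B", 3.0),
    ("SurveyMonkey", 5.0),
    ("Firm C", 4.5),
]


def average_margin(records, exclude=frozenset()):
    """Average margin over all polls whose pollster is not in `exclude`."""
    kept = [margin for firm, margin in records if firm not in exclude]
    if not kept:
        raise ValueError("no polls left after exclusion")
    return mean(kept)


full = average_margin(polls)
trimmed = average_margin(polls, exclude={"SurveyMonkey", "UPI/CVoter"})
print(f"all polls: {full:.2f}, without the two largest firms: {trimmed:.2f}")
```

If the two numbers are close, the estimate is not being driven by those firms. The actual check re-ran the full forecasting model rather than a simple average.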

That so many people were caught off-guard by the election outcome suggests that the polling failure was a symptom of a deeper, industry-wide problem. Survey research is currently struggling through a difficult period of technological change. Fewer people than ever are willing to respond to polls, and those who do respond tend to be non-representative: older and more white than the population as a whole. Differential partisan non-response, in which the partisanship of people agreeing to take polls varies with their level of excitement about the campaign, causes poll results to swing wildly even if opinion is stable. This year, more polls than ever were conducted online, but the quality of online methodologies differs greatly across firms.

Despite these challenges, many media organizations and polling firms chose to undertake the hard work of surveying voters and releasing their results to the public, for free. There isn’t anyone who doesn’t wish the data had been more accurate. But the organizations that made the effort to contribute to public knowledge about the campaign by publishing their results deserve our gratitude and respect. Thank you. What we need in order to avoid a repeat of this surprising outcome in 2020 is not less pre-election polling, but more: by more firms, with different methodologies and different viewpoints.

Final 2016 Presidential Forecast: Clinton 323, Trump 215

Cross-posted at Daily Kos Elections

Over the course of this presidential campaign, Daily Kos Elections has logged 1,371 state-level presidential polls into our database. All signs point to a Hillary Clinton victory.

Our forecasting model indicates that Clinton is highly likely to win key states including Colorado, Pennsylvania, New Hampshire, Virginia, and Wisconsin. In all five of these states, Clinton has never trailed in our average of the polls—and if she carries all of them, she would win the election over Donald Trump with 273 electoral votes, three more than the 270 required for victory. In addition, our model also favors Clinton in Florida, North Carolina, and Nevada. Together, those states contribute another 50 electoral votes.

That gives us our final prediction: Clinton 323 electoral votes, Trump 215.
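The arithmetic behind that total is easy to verify. In this sketch, the 273 is the Clinton total from the previous paragraph, the three state counts are the 2016 apportionment, and the variable names are mine.

```python
TOTAL_ELECTORAL_VOTES = 538
MAJORITY = TOTAL_ELECTORAL_VOTES // 2 + 1  # 270 votes needed to win

# Clinton's total if she carries Colorado, Pennsylvania, New Hampshire,
# Virginia, and Wisconsin on top of her safer states.
base_clinton = 273

# Additional states where the model also favors Clinton.
additional = {"Florida": 29, "North Carolina": 15, "Nevada": 6}

clinton_total = base_clinton + sum(additional.values())
trump_total = TOTAL_ELECTORAL_VOTES - clinton_total

print(sum(additional.values()))   # 50
print(clinton_total, trump_total)  # 323 215
print(base_clinton - MAJORITY)     # 3
```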

Given that the forecast is based almost entirely on public polling data, how much can we trust the accuracy of the polls? As recently as one week ago, Clinton held such a commanding lead that our model placed her chances of winning as high as 96 percent. Since then, the race has tightened, and we currently estimate Clinton’s odds of victory at 88 percent. That’s enough of a change that a large and consistent polling error could make the difference for Trump. But the error would have to be very large, and very consistent. Going into Election Day, Clinton’s average lead in the polls is 3 points in New Hampshire, 4 points in Colorado and Pennsylvania, and 5 points in Wisconsin and Virginia.

Polling is never perfect, but systematic errors across multiple states in the same presidential election are historically not that large, or that common. Instead, the state-level errors form a distribution: In some states, one candidate outperforms the polls, and in other states, the other candidate does better. For example, in 2012, on average, the polls underestimated Obama’s vote share by a small amount; nevertheless, in 22 states, his polling was higher than his eventual vote share. Polling errors are less “correlated” across states than you might expect.

What about the magnitude of the state-level polling errors? Aggregating public polls usually produces forecasts that are very close to the actual outcome, especially in competitive states where pollsters have conducted larger numbers of polls. Again using 2012 as an example, there were 15 close states where a candidate won by 10 points or fewer (counting only the major-party vote). In seven of those states, the polls accurately predicted the margin of victory to within 1 percentage point. In another three states, the polls missed the actual margin of victory by under 2 points, and in four states, the polls were off by between 2 and 3 points. In only one state did the polls miss the margin by more than 3 points. And to reinforce our point above about correlated polling errors, Obama outperformed his polls in eight of the 15 close states; in the other seven states, Romney did better than expected.
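Tallying those 2012 figures makes the pattern easier to see. The sketch below only restates the counts given above. The bucket labels and variable names are mine.

```python
# Number of close 2012 states by size of the polling miss on the margin,
# using the counts quoted above.
error_buckets = [
    ("within 1 point", 7),
    ("under 2 points", 3),
    ("between 2 and 3 points", 4),
    ("more than 3 points", 1),
]

total_states = sum(count for _, count in error_buckets)
within_two = sum(count for _, count in error_buckets[:2])
within_three = sum(count for _, count in error_buckets[:3])

print(total_states)                           # 15
print(f"{within_two / total_states:.0%}")     # 67%
print(f"{within_three / total_states:.0%}")   # 93%

# Direction of the miss: Obama beat his polls in 8 states, Romney in 7.
obama_beat_polls, romney_beat_polls = 8, 7
print(obama_beat_polls + romney_beat_polls == total_states)  # True
```

Two-thirds of the close states were called to within 2 points, and the direction of the errors split almost evenly between the candidates.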

So, while it’s possible for Trump to defy the polls and win the election, it is not likely. Our model estimates Trump’s chances at around 12 percent.

Stepping away from the polling data, there are reasons to think that the probability of a Trump victory isn’t even this high. None of these other factors are formally built into our model, and I haven’t analyzed them in any systematic or historical context, so consider everything below informed conjecture. My Daily Kos colleague Stephen Wolf also examined some of these factors, and others, in a recent post exploring why the polls could be off.

First, our forecasting model takes the public polls essentially at face value: We apply a slight adjustment to polls conducted by partisan pollsters, and we make a few assumptions about how quickly to assimilate new polling data and how much to infer state trends from national trends. But we have no way to account for phenomena like differential partisan nonresponse, which may be responsible for the seemingly large swings in the presidential polls this year. If, contrary to some of the raw polling data, public opinion has been as stable as recent research suggests, then some of the more sophisticated online tracking surveys, like those from YouGov, NBC/SurveyMonkey, and Google—all of which have shown Clinton with a consistent lead—might have it right.

Our model also does not incorporate data on early voting, beyond what is implicitly captured by polls that include respondents who have already voted. Although there is disagreement about how much should be read into early vote totals, one state stands out: Nevada. Heavy Latino turnout in the Nevada early voting period appears to have put a significant dent in Trump’s chances of winning there—a must-win state for him where polls alone suggest he has at least a one-in-three chance of winning.

Related to this, there are a variety of reports indicating a large discrepancy in the size and quality of the Clinton and Trump campaigns’ voter turnout operations. In short, Clinton enjoys a significant advantage. Research suggests that her superior “ground game” could be worth 1 to 3 percent of the vote. This will not be picked up in the polls.

Finally, although the “fundamentals” of the presidential election have long been factored out of our forecast in favor of newer polling data, two key structural factors have actually gotten better for Clinton over the course of the campaign: President Obama’s job approval rating is on an upswing, and the national economy is growing at a faster rate than when we first accounted for these factors back in June.

There is one last caveat that gives me pause: The number of voters telling pollsters that they are still undecided, or are intending to vote for a third-party candidate, remains unusually high. We know that these respondents are disproportionately younger, white voters who would otherwise be likely to support Hillary Clinton. But we have no way of knowing for sure how these individuals will vote, or if they will turn out to vote at all. It’s something that I will be looking out for on Tuesday.

Happy Election Day.

Looking for Presidential Election Forecasts?

Please head over to Daily Kos Elections, where you’ll find the implementation of my forecasting model this year. My forecasts are also part of the comparison table at The Upshot at the New York Times, labeled ‘DK’.

In addition to predictions of the presidential race, we are making forecasts of all of the 2016 Senate races, and we’re calculating a complete set of poll-tracking trendlines for every state. Here’s the North Carolina presidential matchup.

For polling trends in all 50 states at a glance, you can also check out the poll tracker page at this site, or my trend detail page, which gives a zoomed-in look at how voter preferences have shifted during the campaign.

