
The forecasts were wrong. Trump won. What happened?

Cross-posted at Daily Kos Elections

Last week, just before Election Day, we published our final presidential election forecast: Hillary Clinton, 323 electoral votes; Donald Trump, 215. As I wrote when making that prediction, “While it’s possible for Trump to defy the polls and win the election, it is not likely. Our model estimates Trump’s chances at around 12 percent.”

Trump’s Election Day victory, with 306 electoral votes, took us by surprise. My forecast was wrong. It’s time to understand why.

The forecast was based on a statistical model that analyzed nearly 1,400 state-level pre-election public opinion polls, in combination with a set of political and economic “fundamentals” that added information about the election’s historical context. The fundamental factors (which turned out to predict the national vote share very closely) indicated that Clinton faced an uphill climb from the very beginning. In May, I estimated that Clinton’s baseline probability of victory was around 35 percent.
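To illustrate the general idea of blending polls with fundamentals (this is a toy sketch, not the actual forecasting model, and every number in it is hypothetical): if the fundamentals forecast and the polling average are each treated as normal estimates of Clinton's vote margin, they can be combined by precision weighting, with the more certain source getting more influence.

```python
# Toy sketch of blending a fundamentals prior with a polling average.
# Not the actual Votamatic model; all numbers are hypothetical.

def combine(prior_mean, prior_sd, poll_mean, poll_sd):
    """Precision-weighted (conjugate normal) combination of two estimates."""
    w_prior = 1.0 / prior_sd**2
    w_poll = 1.0 / poll_sd**2
    mean = (w_prior * prior_mean + w_poll * poll_mean) / (w_prior + w_poll)
    sd = (w_prior + w_poll) ** -0.5
    return mean, sd

# Fundamentals say roughly a tie (with wide error); polls say Clinton +4.
mean, sd = combine(prior_mean=-1.0, prior_sd=3.0, poll_mean=4.0, poll_sd=1.5)
print(f"combined margin: {mean:+.1f} ± {sd:.1f}")  # the polls dominate: +3.0 ± 1.3
```

As polls accumulate, the polling term's precision grows and the fundamentals matter less, which is why the forecast drifted upward from that 35 percent baseline as the poll-driven signal took over.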

But all summer long, right up to Election Day, the polls told a different story. Pollsters reported that Clinton was consistently ahead, both nationally and in the states, by what were sometimes very large margins. By July, Clinton’s advantage in the polls lifted her chance of winning to 65 percent, and it never fell below that mark. After the first presidential debate, Clinton’s lead over Trump in the state polls was so great that our model gave her a 96 percent chance of victory. And our reading of the polls was not unique: Every other major forecaster also expected Clinton to win (albeit with varying degrees of certainty). It would have taken either a major campaign event, or a major failure of public opinion measurement, for her to lose.

A polling failure is what we got. Late-campaign developments like the Comey letter may have swayed some voters, but even so, the polls never showed Trump in the lead. In previous elections, the errors in polling aggregates typically went both ways, sometimes benefiting the Democrat and other times the Republican. This year, the errors were massive, and they almost all went in Trump's direction.

State-level presidential polls—especially in the swing states—were badly and systematically wrong, by amounts not seen in decades. The polling averages indicated that Clinton would win Florida and North Carolina by 2 percentage points, Pennsylvania and Wisconsin by 5 percentage points, and Michigan by 7 percentage points. Instead, Trump won all five, for a total haul of 90 electoral votes. The state polls were so inaccurate that Trump almost won New Hampshire, where he’d been trailing by 5, and Minnesota, where he’d trailed by 9. Across all states, on average, Trump’s margin of victory was 5 percentage points greater than our polling aggregates expected it to be.
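To make the size and direction of the miss concrete, here is the arithmetic, using the polling margins from the paragraph above; the certified Trump margins are approximate and included only for illustration:

```python
# Polling-average margins from the paragraph above (negative = Clinton lead)
# versus approximate certified Trump margins, in percentage points.
poll_margin = {"FL": -2.0, "NC": -2.0, "PA": -5.0, "WI": -5.0, "MI": -7.0}
trump_margin = {"FL": 1.2, "NC": 3.7, "PA": 0.7, "WI": 0.8, "MI": 0.2}

errors = {s: trump_margin[s] - poll_margin[s] for s in poll_margin}
print(errors)                              # every error favors Trump
print(sum(errors.values()) / len(errors))  # roughly 5.5 points, all one direction
```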

Given this data, no reasonable poll-based presidential forecasting model could have predicted a Trump victory. There was no interpretation of the available public polling data that supported that conclusion. This was not a case of confirmation bias or analysts reading into the data conclusions that they wanted to see. The evidence supporting a Trump victory did not exist.

The miss was not confined to the public polls, which are often considered to be of lower quality than the proprietary research that parties and campaigns commission but never release to the public. Reports suggest that neither the Clinton campaign nor the Trump campaign saw this result coming. Neither did the RNC. Going into Election Day, Trump campaign analysts calculated that they had at best a 30 percent chance of winning.

Some forecasting models did give Donald Trump a higher probability of winning, most notably the FiveThirtyEight model, at 29 percent. But they saw Trump's chances as greater not because they had a fundamentally more pro-Trump interpretation of the data. Rather, they put less trust in the polls, which increased their uncertainty about the election's outcome in both directions. This widened the range of electoral vote outcomes seen as consistent with the data, pulling their estimate of Clinton's chance of winning back toward 50 percent. No matter the level of uncertainty in the final outcome, every poll-based model's best guess was that Clinton would win the same set of states, totaling 323 electoral votes, and every model was wrong in the same way.
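The effect of poll skepticism on a win probability is easy to demonstrate. In the sketch below (all numbers hypothetical), Clinton's expected margin is held fixed at +2 points while the assumed polling error is doubled; her win probability falls toward 50 percent even though the central estimate never moves:

```python
# How wider uncertainty pulls a win probability toward 50 percent.
# The +2-point margin and both error spreads are hypothetical.
from scipy.stats import norm

mean_margin = 2.0  # assumed Clinton lead, in percentage points

for sigma in (3.0, 6.0):  # poll-trusting vs. poll-skeptical error
    p_win = 1 - norm.cdf(0, loc=mean_margin, scale=sigma)
    print(f"sigma = {sigma}: P(Clinton win) = {p_win:.0%}")
# sigma = 3.0: P(Clinton win) = 75%
# sigma = 6.0: P(Clinton win) = 63%
```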

It is not yet known why polls underestimated Trump's vote share so badly. The polls also overestimated Clinton's vote share, but not by nearly as much. Survey researchers are already busy investigating different theories. One clue, however, is that all year, an unusually large number of survey respondents said they were either undecided or supporting a third-party candidate for president. I mentioned this pattern in my final forecast, and you can see it illustrated in the chart below:

[Chart: Share of 2016 poll respondents who were undecided or supporting a third-party candidate]

When as many as 12 percent of voters are uncommitted going into Election Day, it makes a big difference if they “break” disproportionately toward one candidate or the other. Nobody knows whether this uncommitted bloc contained significant numbers of so-called “shy” Trump supporters who were uncomfortable telling pollsters they were backing Trump. But evidence from the exit polls suggests that many Trump voters “broke late,” deciding to support him only at the very last minute. Allowing for this possibility should have contributed more uncertainty to most forecasters’ projections, including our own.
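Some back-of-the-envelope arithmetic shows how much leverage a 12 percent uncommitted bloc has; the topline numbers below are hypothetical:

```python
# How an uncommitted bloc of 12% can erase a lead, depending on how it breaks.
clinton, trump, uncommitted = 46.0, 42.0, 12.0  # hypothetical toplines

for c_share in (0.50, 0.35):  # even split vs. a late break toward Trump
    c = clinton + c_share * uncommitted
    t = trump + (1 - c_share) * uncommitted
    print(f"break {c_share:.0%}/{1 - c_share:.0%}: "
          f"Clinton {c:.1f}, Trump {t:.1f}, margin {c - t:+.1f}")
# break 50%/50%: Clinton 52.0, Trump 48.0, margin +4.0
# break 35%/65%: Clinton 50.2, Trump 49.8, margin +0.4
```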

I checked whether the forecasts might have been wrong because one or two polling firms reported especially inaccurate results. That wasn’t the problem. In our database of state-level presidential polls, the two largest contributors were SurveyMonkey and UPI/CVoter, which together accounted for 29 percent of all polls. In many states, half or more of our data came from one of those two firms. I removed all of those polls from the dataset and re-ran the model. The results did not change in any meaningful way.
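The check itself is simple to express. Here is a sketch of the filtering step, with a tiny inline dataset standing in for the real 1,400-poll database; the records and field names are hypothetical:

```python
# Sketch of the leave-firms-out robustness check. The inline records
# and field names are hypothetical stand-ins for the actual database.
polls = [
    {"state": "PA", "pollster": "SurveyMonkey", "clinton_margin": 5.0},
    {"state": "PA", "pollster": "UPI/CVoter", "clinton_margin": 4.0},
    {"state": "PA", "pollster": "Marist", "clinton_margin": 6.0},
]

EXCLUDE = {"SurveyMonkey", "UPI/CVoter"}
kept = [p for p in polls if p["pollster"] not in EXCLUDE]
print(f"{len(kept)} of {len(polls)} polls remain after the exclusion")
# Re-fit the model on `kept` and compare the state-level estimates;
# in our case they barely moved.
```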

That so many people were caught off guard by the election outcome suggests that the polling failure was a symptom of a deeper, industry-wide problem. Survey research is struggling through a difficult period of technological change. Fewer people than ever are willing to respond to polls, and those who do respond tend to be unrepresentative: older and whiter than the population as a whole. Differential partisan non-response, in which the partisanship of the people who agree to take polls varies with their enthusiasm about the campaign, can cause poll results to swing wildly even when underlying opinion is stable. This year, more polls than ever were conducted online, but the quality of online methodologies differs greatly across firms.
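Differential non-response is worth a quick illustration, because its effect is counterintuitive: in the toy model below, the electorate never changes its mind, yet the measured topline swings by 14 points purely because each side's willingness to respond varies. All response rates are hypothetical:

```python
# Toy model of differential partisan non-response: true opinion is a
# fixed 50/50 split, but the measured Democratic share moves with each
# side's (hypothetical) response rate.
def measured_dem_share(dem_rate, rep_rate, true_dem_share=0.5):
    dem = true_dem_share * dem_rate
    rep = (1 - true_dem_share) * rep_rate
    return dem / (dem + rep)

print(measured_dem_share(0.10, 0.10))  # 0.50: equal response, true picture
print(measured_dem_share(0.12, 0.09))  # 0.57: enthusiastic Democrats respond more
print(measured_dem_share(0.09, 0.12))  # 0.43: the reverse after bad news
```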

Despite these challenges, many media organizations and polling firms undertook the hard work of surveying voters and released their results to the public, for free. Everyone wishes the data had been more accurate. But the organizations that contributed to public knowledge about the campaign by publishing their results deserve our gratitude and respect. Thank you. What we need to avoid a repeat of this surprise in 2020 is not less pre-election polling, but more: more firms, more methodologies, and more viewpoints.

5 Comments

  1. John Drew

    I’m amazed that you don’t even entertain the idea of partisan bias among the pollsters. I saw one of the data guys from the Trump campaign, Brad Parscale, clearly explain how he foresaw the upcoming Trump victory through his tracking of undecided voters. The folks at Rasmussen also saw undecided voters breaking to Trump. This wasn’t some mysterious trend to people watching the election closely…the numbers were there to predict a Trump victory. The issue is that our biased, liberal media was entirely comfortable telling a story regarding an overwhelming Hillary victory.

  2. Peter

    Thank you for the post-mortem. I am afraid I don’t buy it.

    The fact that there were true undecideds who ‘broke late’ so overwhelmingly to become Trump supporters is a very unlikely statistical event. Instead, I suggest that the same reason that prevented these voters from telling pollsters they were voting for Trump, or had a secret sympathy for him, also prevented them from facing that fact themselves. They deferred this self-realization, in the same way that a person may defer getting a medical test because they do not want to face the possibility of what it may reveal about them.
    What would the most rational behavior for such willingly self-delusional individuals be? It would be to claim they made up their minds at the very last minute – this allows them to continue the fiction that they were indeed undecided – a fiction provided to exit pollsters but more importantly, to themselves. It minimizes the amount of time for self-scrutiny and living with this knowledge.
    It would be interesting to analyze why such undecideds broke for Trump. It could be the Comey letter, but the timing of the letter did not coincide with anything remotely like the magnitude of the shift in the undecideds.

    Rather, and here I enter in the realm of speculation, could it be that the reasons they were leaning towards Trump had nothing to do with the major issues of party or economy, but the implicit dog whistles associated with his campaign, the acknowledgement of which to themselves may perhaps tinker with their own self-image?

  3. John Drew

    I think Occam’s razor applies here. It is much simpler to assume that a media establishment that we absolutely know is biased against conservatives and the Republican party behaved in an irresponsible manner.

    At any rate, here is a post from an unbiased analyst who saw substantial undecided voters ready to break for Trump based on the Rasmussen sub-tabs. http://www.americanthinker.com/blog/2016/11/bombshell_break_for_trump_buried_in_the_latest_rasmussen_poll.html

    As I pointed out earlier, the numbers that indicated a Trump victory weren’t all that mysterious or difficult to dig up. The problem is that the leftists who dominate the media and the mainstream polling services had no incentive to accurately report what was going on to the electorate.

  4. Occam's Stubble

    Your own formula says that it is based on the following factors:

    GDP growth
    Incumbent approval rating
    Whether or not the incumbent has been in office for two terms.

    Obama was in office for two terms. That’s a big hit for the Democrats.
    GDP growth was anemic.
    Obama’s popularity, whatever he had, could not overcome the other two.

    Next time, stop relying on polls of the candidates and use your own formula.

  5. Ryan M. Ferris

    Looking at this issue from 18 months on, I don’t think strategic solutions have been offered for the presumed political polling inaccuracies of GE 2016. There have been ongoing developments in storage, hardware, machine learning, and AI. But will our predictions for the 2018 midterms be any more accurate than for GE 2016?

    I believe there are deep and widespread structural problems in electoral analysis. Some of them are being addressed in hindsight. See http://www.pewresearch.org/2018/02/15/political-data-in-voter-files/ . However, the problems of accurately profiling the intent, commitment, and authenticity of (potentially) ~250M ballots will remain with us. So will the problems of disenfranchisement, “red-mapping” (redistricting), and vulnerable, improperly serviced county voting technology.

    I see no “cultural evidence” that America is becoming more politically or culturally homogeneous. Actually, I see individuation and narcissism as the “Weltanschauung” of the body politic in America. The body politic in America is a kind of “balkanized” series of legions swung by hysteric news and social media cycles. Such an environment won’t make elections easier to predict. I predict the same polling disaster for 2018 midterms as for GE2016. I know only this: Those who hire the best mathematicians and data analysts will win. Teach all your children to code in R.
