VOTAMATIC

Announcing the launch of the Civiqs Results Dashboard

Drew — Wed, 14 Mar 2018 17:28:52 +0000

Today, Civiqs is announcing the release of its public opinion Results Dashboard: an interactive website with data, charts, and tables from Civiqs’ daily polling of political attitudes all across America.

Civiqs is a polling and data analytics firm that conducts public opinion research online. Since 2013, Civiqs has fielded over half a million scientific research surveys and collected more than seven million responses to survey questions. (More background on Civiqs, and its relationship to Kos Media, is in this post from Markos Moulitsas. Civiqs has a separate team and is a distinct firm from Daily Kos but is owned by the same parent company.)

For the first time, Civiqs is making the results of its polls open to the public. The results dashboard is updated daily with current measures of President Trump’s job approval rating, the 2018 generic House ballot, favorable ratings of the Democratic and Republican parties, attitudes towards gun control, and much more. All of the results can be filtered and tabulated by age, race, gender, education, and party identification.

I’m the Director and Chief Scientist at Civiqs, and I’m extremely excited to bring this data to the public. Before founding Civiqs, I was a Professor of Political Science at Emory University; and before that, a pollster with firms in Washington, DC, and California. My academic research was largely focused on improving the measurement of public opinion, especially as it trended over time, and on forecasting attitudes into the future. Civiqs applies and builds on that research.

So much of the political debate these days is about how events and actions will play out in public opinion. And there’s not a lot of reliable polling available to answer that question. Often we find ourselves dependent on sporadic survey releases with inconclusive results, or polls put out by interested parties, and reporting that cherry-picks the largest outliers to tell misleading stories about what’s really happening.

The Civiqs research methodology represents what we believe is a better way to track public opinion on the political issues that matter most: with daily polls that detect shifts in attitudes immediately; and methods of data analysis that give us reliable, granular insight into the views of key demographic and partisan subgroups.

Civiqs polls on a wide range of topics, all across the United States, all the time. With results updated daily, the Results Dashboard can reveal shifts in public opinion both large and small, at the topline level and within small subgroups, as they happen. We optimized the Civiqs trendline results model to identify when changes occur—but to not overreact to short-lived, random deviations in the data. Oftentimes, public opinion remaining stable is just as interesting as when it changes—especially when events that seem important at the time have no impact on public perceptions whatsoever.

Here’s an example. Civiqs has been tracking attitudes towards gun control for over three years, starting in January 2015. At that time, Americans were evenly divided over support for stricter gun control laws. Shootings in Charleston in June 2015 and San Bernardino in December 2015 did little to affect public opinion.

But by the Pulse Nightclub shooting in Orlando in June 2016, Civiqs found that attitudes had begun to shift. The immediate reaction to the Orlando shooting was an increase in support for gun control legislation of about 3 percent. The Las Vegas shooting in October 2017 pushed support for gun control even higher. And after last month’s mass shooting at Stoneman Douglas High School in Parkland, Florida, support for stricter gun control measures surged by 8 percentage points more.

Using the “Refine By” options on the results page, you can filter the results even more finely and see that most of the increases in support for gun control after the Parkland shooting occurred among Independents and Republicans, especially those with higher levels of education. Remarkably, since January 2015, Republican women with postgraduate education have shifted from opposing stronger gun control laws by a 65-point margin to favoring them by 3 points, directly following the shooting.

How does Civiqs polling work? Every day, Civiqs surveys thousands of Americans about their opinions on politics, news, and current affairs. The respondents are sampled scientifically from Civiqs’ online, opt-in survey panel. To select panelists for interviewing, Civiqs draws a representative random sample of individuals from a registered voter file. Those people are matched to demographically and geographically similar individuals in the Civiqs panel. To achieve an accurate representation of the population, panel members from groups who are underrepresented in the Civiqs panel are sampled more frequently, and those from groups who are overrepresented in the panel are sampled less frequently.
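The matching step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Civiqs' production code: the `Person` fields, the `distance` function, and its weights are all hypothetical, and real matching would use many more variables. Each random draw from the voter file is paired with the most similar panelist not yet selected. When a group is scarce in the panel, its few members keep getting picked, which is one way underrepresented panelists end up being sampled more often.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record used for both voter-file rows and panelists."""
    id: str
    age: int
    female: bool
    college: bool
    region: str


def distance(a: Person, b: Person) -> float:
    """Hypothetical dissimilarity score: age gap in decades plus penalties for
    mismatched gender, education, and (weighted more heavily) region."""
    d = abs(a.age - b.age) / 10.0
    d += float(a.female != b.female) + float(a.college != b.college)
    d += 2.0 * float(a.region != b.region)
    return d


def matched_sample(voter_file: list[Person], panel: list[Person], n: int, seed: int = 0) -> list[Person]:
    """Draw n voter-file records at random, then pair each with its closest
    not-yet-selected panelist (greedy nearest-neighbor matching)."""
    if n > len(voter_file) or n > len(panel):
        raise ValueError("n exceeds the size of the voter file or the panel")
    rng = random.Random(seed)
    targets = rng.sample(voter_file, n)
    available = list(panel)
    chosen = []
    for target in targets:
        best = min(available, key=lambda p: distance(target, p))
        available.remove(best)  # each panelist is interviewed at most once per draw
        chosen.append(best)
    return chosen
```

Greedy matching is the simplest choice; an optimal assignment across all draws at once would avoid early targets "using up" the best match for later ones.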

Selected panelists are notified by email and complete the surveys in a web browser or on a smartphone at civiqs.com. Civiqs aggregates the results and applies a specialized statistical procedure that calculates trendlines and models the data to be representative of the underlying population. (Technically, it is a Bayesian dynamic linear model with poststratification.) The results are published at civiqs.com/results, and updated daily.
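As a rough illustration of the two ingredients named above, the sketch below runs a local-level Kalman filter (one of the simplest dynamic linear models) over daily counts within each demographic cell, then poststratifies by weighting each cell's estimate by its population share. It is a simplification under stated assumptions, not Civiqs' model: the drift variance `q`, the starting prior, the cells, and the population shares are hypothetical, and a full Bayesian treatment would also smooth backward in time and share information across cells.

```python
from typing import Iterable


def local_level_filter(counts: Iterable[tuple[int, int]], q: float = 1e-4,
                       m0: float = 0.5, c0: float = 0.25) -> list[float]:
    """Forward Kalman filter for a local-level (random-walk) model of a proportion.

    counts: (yes_responses, interviews) per day; interviews may be 0.
    q: assumed day-to-day drift variance of true opinion (hypothetical value).
    m0, c0: mean and variance of the starting prior (hypothetical, weakly informative).
    """
    m, c = m0, c0
    estimates = []
    for yes, n in counts:
        c += q  # predict: opinion may wander by a small random step each day
        if n > 0:
            y = yes / n
            r = max(y * (1.0 - y), 1e-3) / n  # binomial sampling variance, floored
            k = c / (c + r)                    # Kalman gain: how much to trust today's data
            m += k * (y - m)
            c *= (1.0 - k)
        estimates.append(m)
    return estimates


def poststratify(cell_estimates: dict[str, float], pop_shares: dict[str, float]) -> float:
    """Population-weighted average of cell estimates (shares need not sum to 1)."""
    total = sum(pop_shares.values())
    return sum(cell_estimates[cell] * share for cell, share in pop_shares.items()) / total


# Hypothetical usage: two age cells, each filtered separately, then weighted by
# assumed population shares rather than by how many panelists each cell supplied.
young = local_level_filter([(30, 50), (32, 50), (0, 0), (35, 50)])
old = local_level_filter([(120, 200), (118, 200), (125, 200), (121, 200)])
print(poststratify({"18-44": young[-1], "45+": old[-1]}, {"18-44": 0.45, "45+": 0.55}))
```

The small drift variance is what keeps the trendline from overreacting to a single noisy day, while still letting it follow a sustained shift.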

The Civiqs survey panel represents all regions of the United States, all different groups, and every position on the ideological spectrum. Nobody is left out. The Civiqs panel is open to everyone who wants to join—Democrats, Republicans, Independents—everyone. The opinions of every single survey participant—on every side of every issue—are important.

Civiqs is focused above all on accuracy. The survey methodology implemented by Civiqs’ software is scalable, repeatable, and fully automated. Online polling methods have been studied extensively and been found to generate accurate results when researchers follow a series of best practices in sampling, user experience, interview design, and statistical modeling. Every Civiqs survey is conducted and analyzed in exactly the same way, which eliminates many of the errors and biases that arise in traditional polls. Response rates to Civiqs polls are more than double the industry average. Most importantly, Civiqs never does any ad-hoc weighting or adjustment of its results on a question-by-question or day-to-day basis.

I invite you to join in! When you sign up to take polls with Civiqs, you will receive periodic emails, about once every few weeks, when a new poll is ready. Most surveys have fewer than eight questions and take less than a minute to answer. You can even answer a poll quickly on your smartphone. Part of what makes Civiqs polls accurate is how easy they are to complete! Of course, you are welcome to unsubscribe at any time.

Civiqs will update the Results Dashboard each day with the latest survey results. Some questions will provide complete breakdowns of the results by demographic and geographic subgroups, in both the charts and crosstabs. Other questions will display the topline results only, with more information available to Dashboard subscribers. I hope you find the Dashboard to be a useful, informative, and detailed way to follow public opinion.

The forecasts were wrong. Trump won. What happened?

Drew — Wed, 16 Nov 2016 21:37:44 +0000

Cross-posted at Daily Kos Elections

Last week, just before Election Day, we published our final presidential election forecast: Hillary Clinton, 323 electoral votes; Donald Trump, 215. As I wrote when making that prediction, “While it’s possible for Trump to defy the polls and win the election, it is not likely. Our model estimates Trump’s chances at around 12 percent.”

Trump’s Election Day victory, with 306 electoral votes, took us by surprise. My forecast was wrong. It’s time to understand why.

The forecast was based on a statistical model that analyzed nearly 1,400 state-level pre-election public opinion polls, in combination with a set of political and economic “fundamentals” that added information about the election’s historical context. The fundamental factors (which turned out to predict the national vote share very closely) indicated that Clinton faced an uphill climb from the very beginning. In May, I estimated that Clinton’s baseline probability of victory was around 35 percent.
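One common way to blend a fundamentals-based prior with poll data is a precision-weighted average, the normal-normal update from Bayesian statistics. The sketch below shows only that general idea; it is not the forecast model itself, and the numbers in the example are hypothetical. As poll information accumulates, its precision grows and the fundamentals carry less and less weight.

```python
def combine(prior_mean: float, prior_sd: float, poll_mean: float, poll_sd: float) -> tuple[float, float]:
    """Normal-normal update: weight each source by its precision (1 / variance)."""
    w_prior = 1.0 / prior_sd ** 2
    w_poll = 1.0 / poll_sd ** 2
    mean = (w_prior * prior_mean + w_poll * poll_mean) / (w_prior + w_poll)
    sd = (w_prior + w_poll) ** -0.5
    return mean, sd


# Hypothetical Clinton two-party shares: fundamentals say 48, polls say 52.
print(combine(48.0, 2.0, 52.0, 2.0))  # equal precision: lands halfway between them
print(combine(48.0, 2.0, 52.0, 0.5))  # tighter poll estimate pulls the result toward 52
```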

But all summer long, right up to Election Day, the polls told a different story. Pollsters reported that Clinton was consistently ahead, both nationally and in the states, by what were sometimes very large margins. By July, Clinton’s advantage in the polls lifted her chance of winning to 65 percent, and it never fell below that mark. After the first presidential debate, Clinton’s lead over Trump in the state polls was so great that our model gave her a 96 percent chance of victory. And our reading of the polls was not unique: Every other major forecaster also expected Clinton to win (albeit with varying degrees of certainty). It would have taken either a major campaign event, or a major failure of public opinion measurement, for her to lose.

The polling failure was what we got. Late campaign developments like the Comey letter may have affected some voters, but if so, polls still never showed Trump in the lead. In previous elections, the error in the aggregates of the polls typically went both ways, sometimes benefiting the Democrat, and other times benefiting the Republican. This year, the errors were massive, and they almost all went in the direction of Trump.

State-level presidential polls—especially in the swing states—were badly and systematically wrong, by amounts not seen in decades. The polling averages indicated that Clinton would win Florida and North Carolina by 2 percentage points, Pennsylvania and Wisconsin by 5 percentage points, and Michigan by 7 percentage points. Instead, Trump won all five, for a total haul of 90 electoral votes. The state polls were so inaccurate that Trump almost won New Hampshire, where he’d been trailing by 5, and Minnesota, where he’d trailed by 9. Across all states, on average, Trump’s margin of victory was 5 percentage points greater than our polling aggregates expected it to be.
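The 90-electoral-vote figure can be checked directly from the 2016 allocations for the five states named above:

```python
# 2016 electoral votes for the five states where the polling averages favored Clinton but Trump won.
FLIPPED = {"Florida": 29, "North Carolina": 15, "Pennsylvania": 20, "Wisconsin": 10, "Michigan": 16}

print(sum(FLIPPED.values()))  # 90
```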

Given this data, no reasonable poll-based presidential forecasting model could have predicted a Trump victory. There was no interpretation of the available public polling data that supported that conclusion. This was not a case of confirmation bias or analysts reading into the data conclusions that they wanted to see. The evidence supporting a Trump victory did not exist.

The miss was not confined to the public polls, which are often considered to be of lower quality than the proprietary research commissioned by parties and campaigns, and never released to the public. Reports suggest that neither the Clinton nor the Trump campaign saw this result coming. Neither did the RNC. Going into Election Day, Trump campaign analysts calculated that they had at best a 30 percent chance of winning.

Some forecasting models did give Donald Trump a higher probability of winning; most notably the FiveThirtyEight model at 29 percent. But the reason why they saw Trump’s chances as being more likely was not because they had a fundamentally more pro-Trump interpretation of the data. Rather, they put less trust in the polls, which increased their uncertainty in the overall outcome of the election in both directions. This widened the range of potential electoral vote outcomes seen as consistent with the data—resulting in their forecast of Clinton’s chance of winning getting pulled back towards 50 percent. No matter the level of uncertainty in the final outcome, every poll-based model’s best guess was that Clinton would win the same set of states totaling 323 electoral votes, and every model was wrong in the same way.
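A small numerical illustration of that point, assuming purely for simplicity that Clinton's margin is normally distributed around the same best guess in every model, with only the spread differing. The margin and standard deviations below are hypothetical, not any forecaster's actual parameters.

```python
from statistics import NormalDist


def win_probability(expected_margin: float, sd: float) -> float:
    """P(margin > 0) when the margin is Normal(expected_margin, sd)."""
    return 1.0 - NormalDist(mu=expected_margin, sigma=sd).cdf(0.0)


# Same hypothetical best guess (Clinton +3), increasingly wide uncertainty.
for sd in (2.0, 4.0, 8.0):
    print(sd, round(win_probability(3.0, sd), 3))
```

The best guess never changes; only the spread does, and the win probability slides toward 50 percent as the spread widens.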

It is not yet known why polls underestimated Trump’s vote share so badly. The polls also overestimated Clinton’s vote share, but not by nearly as much. Survey researchers are already busy investigating different theories. One clue, however, was that there was an unusually large number of survey respondents, all year, who said that they were either undecided or supporting a third-party candidate for president. I mentioned this pattern in my final forecast, and you can see it illustrated in the chart below:

When as many as 12 percent of voters are uncommitted going into Election Day, it makes a big difference if they “break” disproportionately towards one candidate or the other. Nobody knows whether this uncommitted bloc included significant numbers of so-called “shy” Trump supporters who were uncomfortable telling pollsters they were backing him. But evidence from the exit polls suggests that many Trump voters “broke late” and decided to support him only at the very last minute. Allowing for this possibility is something that should have contributed more uncertainty to most forecasters’ projections, including our own.
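The arithmetic of a late break is simple. The sketch below uses hypothetical inputs and assumes, unrealistically, that every uncommitted voter ends up choosing one of the two major-party candidates: if a share `u` of voters is uncommitted and a fraction `p` of them go to Trump, Clinton's margin changes by `u * (1 - 2p)` points.

```python
def margin_after_break(clinton_margin: float, uncommitted: float, share_to_trump: float) -> float:
    """clinton_margin and uncommitted are in percentage points of all voters;
    share_to_trump is the fraction of the uncommitted bloc that breaks to Trump."""
    return clinton_margin + uncommitted * (1.0 - 2.0 * share_to_trump)


# Hypothetical: Clinton +3 with 12 percent uncommitted, who break 65/35 for Trump.
print(round(margin_after_break(3.0, 12.0, 0.65), 1))  # -0.6
```

An even split leaves the margin untouched; a lopsided one can erase a lead that looked comfortable in the polls.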

I checked whether the forecasts might have been wrong because one or two polling firms reported especially inaccurate results. That wasn’t the problem. In our database of state-level presidential polls, the two largest contributors were SurveyMonkey and UPI/CVoter, which together accounted for 29 percent of all polls. In many states, half or more of our data came from one of those two firms. I removed all of those polls from the dataset and re-ran the model. The results did not change in any meaningful way.
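The robustness check described above amounts to dropping every poll from the named firms and recomputing. The sketch below does this for a plain average of Clinton's margin in a single state; the actual check re-ran the full model, and the firm labels and poll records here are hypothetical.

```python
from statistics import mean


def average_margin(polls: list[dict], exclude: frozenset[str] = frozenset()) -> float:
    """Mean Clinton-minus-Trump margin over polls whose firm is not excluded.
    Raises statistics.StatisticsError if every poll is excluded."""
    return mean(p["margin"] for p in polls if p["firm"] not in exclude)


polls = [  # hypothetical state polls
    {"firm": "Firm A", "margin": 6.0},
    {"firm": "Firm B", "margin": 4.0},
    {"firm": "Firm C", "margin": 5.0},
    {"firm": "Firm D", "margin": 3.0},
]
print(average_margin(polls), average_margin(polls, frozenset({"Firm A", "Firm B"})))
```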

That so many people were caught off-guard by the election outcome suggests that the polling failure was a symptom of a deeper, industry-wide problem. Survey research is currently struggling through a difficult period of technological change. Fewer people than ever are willing to respond to polls, and those who do respond tend to be non-representative: older and whiter than the population as a whole. Differential partisan non-response, in which the partisanship of people agreeing to take polls varies with their level of excitement about the campaign, causes poll results to swing wildly even if opinion is stable. This year, more polls than ever were conducted online, but the quality of online methodologies differs greatly across firms.
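A toy calculation shows how differential partisan non-response can move a poll with no change in opinion. The response rates below are hypothetical: the electorate stays evenly split throughout, but when Democrats become slightly more willing to take surveys, the observed Democratic share jumps.

```python
def observed_dem_share(true_dem_share: float, dem_response_rate: float, rep_response_rate: float) -> float:
    """Democratic share among respondents in a two-party electorate with
    partisan response rates (a deliberately simplified model)."""
    dem = true_dem_share * dem_response_rate
    rep = (1.0 - true_dem_share) * rep_response_rate
    return dem / (dem + rep)


print(round(observed_dem_share(0.50, 0.10, 0.10), 3))  # 0.5
print(round(observed_dem_share(0.50, 0.12, 0.10), 3))  # 0.545
```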

Despite these challenges, many media organizations and polling firms chose to undertake the hard work of surveying voters and releasing their results to the public, for free. There isn’t anyone who doesn’t wish the data had been more accurate. But those organizations who made the effort to contribute to public knowledge about the campaign by publishing their results deserve our gratitude and respect. Thank you. What we need in order to avoid a repeat of this surprising outcome in 2020 is not less pre-election polling, but more—by more firms, with different methodologies, and different viewpoints.

Final 2016 Presidential Forecast: Clinton 323, Trump 215

Drew — Mon, 07 Nov 2016 23:51:49 +0000

Cross-posted at Daily Kos Elections

Over the course of this presidential campaign, Daily Kos Elections has logged 1,371 state-level presidential polls into our database. All signs point to a Hillary Clinton victory.

Our forecasting model indicates that Clinton is highly likely to win key states including Colorado, Pennsylvania, New Hampshire, Virginia, and Wisconsin. In all five of these states, Clinton has never trailed in our average of the polls—and if she carries all of them, she would win the election over Donald Trump with 273 electoral votes, three more than the 270 required for victory. In addition, our model also favors Clinton in Florida, North Carolina, and Nevada. Together, those states contribute another 50 electoral votes.

That gives us our final prediction: Clinton 323 electoral votes, Trump 215.
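The arithmetic behind the 323-215 call can be checked from the 2016 electoral-vote allocations. The 217 figure below is not stated in the post; it is simply what the 273 total implies for Clinton's other favored states.

```python
# 2016 electoral votes for the states named above.
CORE = {"Colorado": 9, "Pennsylvania": 20, "New Hampshire": 4, "Virginia": 13, "Wisconsin": 10}
LEANING = {"Florida": 29, "North Carolina": 15, "Nevada": 6}
TOTAL_EV = 538

implied_other = 273 - sum(CORE.values())  # 217 from Clinton's other favored states
clinton = 273 + sum(LEANING.values())     # 273 + 50 = 323
trump = TOTAL_EV - clinton                # 215
print(implied_other, clinton, trump)
```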

Given that the forecast is based almost entirely on public polling data, how much can we trust the accuracy of the polls? As recently as one week ago, Clinton held such a commanding lead that our model placed her chances of winning as high as 96 percent. Since then, the race has tightened, and we currently estimate Clinton’s odds of victory at 88 percent. That’s enough of a change that a large and consistent polling error could make the difference for Trump. But the error would have to be very large, and very consistent. Going into Election Day, Clinton’s average lead in the polls is 3 points in New Hampshire, 4 points in Colorado and Pennsylvania, and 5 points in Wisconsin and Virginia.

Polling is never perfect, but systematic errors across multiple states in the same presidential election are historically not that large, or that common. Instead, the state-level errors form a distribution: In some states, one candidate outperforms the polls, and in other states, the other candidate does better. For example, in 2012, on average, the polls underestimated Obama’s vote share by a small amount; nevertheless, in 22 states, his polling was higher than his eventual vote share. Polling errors are less “correlated” across states than you might expect.

What about the magnitude of the state-level polling errors? Aggregating public polls usually produces forecasts that are very close to the actual outcome, especially in competitive states where pollsters have conducted larger numbers of polls. Again using 2012 as an example, there were 15 close states where a candidate won by 10 points or fewer (counting only the major-party vote). In seven of those states, the polls accurately predicted the margin of victory to within 1 percentage point. In another three states, the polls missed the actual margin of victory by under 2 points, and in four states, the polls were off by between 2 and 3 points. In only one state did the polls miss the margin by more than 3 points. And to reinforce our point above about correlated polling errors, Obama outperformed his polls in eight of the 15 close states; in the other seven states, Romney did better than expected.

So, while it’s possible for Trump to defy the polls and win the election, it is not likely. Our model estimates Trump’s chances at around 12 percent.

Stepping away from the polling data, there are reasons to think that the probability of a Trump victory isn’t even this high. None of these other factors are formally built into our model, and I haven’t analyzed them in any systematic or historical context, so consider everything below to be informed conjecture. My Daily Kos colleague Stephen Wolf also examined some of these factors, and others, in a recent post exploring why the polls could be off.

First, our forecasting model takes the public polls essentially at face value: We apply a slight adjustment to polls conducted by partisan pollsters, and we make a few assumptions about how quickly to assimilate new polling data and how much to infer state trends from national trends. But we have no way to account for phenomena like differential partisan nonresponse, which may be responsible for the seemingly large swings in the presidential polls this year. If, contrary to some of the raw polling data, public opinion has been as stable as recent research suggests, then some of the more sophisticated online tracking surveys, like those from YouGov, NBC/SurveyMonkey, and Google—all of which have shown Clinton with a consistent lead—might have it right.

Our model also does not incorporate data on early voting, beyond what is implicitly captured by polls that include respondents who have already voted. Although there is disagreement about how much should be read into early vote totals, one state stands out: Nevada. Heavy Latino turnout in the Nevada early voting period appears to have put a significant dent in Trump’s chances of winning there—a must-win state for him where polls alone suggest he has at least a one-in-three chance of winning.

Related to this, a variety of reports indicate a large discrepancy in the size and quality of the Clinton and Trump campaigns’ voter turnout operations. In short, Clinton enjoys a significant advantage. Research suggests that her superior “ground game” could be worth 1 to 3 percent of the vote. This will not be picked up in the polls.

Finally, although the “fundamentals” of the presidential election have long been factored out of our forecast in favor of newer polling data, two key structural factors have actually gotten better for Clinton over the course of the campaign: President Obama’s job approval rating is on an upswing, and the national economy is growing at a faster rate than when we first accounted for these factors back in June.

There is one last caveat that gives me pause: The number of voters telling pollsters that they are still undecided, or are intending to vote for a third-party candidate, remains unusually high. We know that these respondents are disproportionately younger, white voters who would otherwise be likely to support Hillary Clinton. But we have no way of knowing for sure how these individuals will vote, or if they will turn out to vote at all. It’s something that I will be looking out for on Tuesday.

Happy Election Day.

Looking for Presidential Election Forecasts?

Drew — Thu, 03 Nov 2016 01:39:54 +0000

Please head over to Daily Kos Elections, where you’ll find the implementation of my forecasting model this year. My forecasts are also part of the comparison table at The Upshot at the New York Times, labeled ‘DK’.

In addition to predictions of the presidential race, we are making forecasts of all of the 2016 Senate races, and we’re calculating a complete set of poll-tracking trendlines for every state. Here’s the North Carolina presidential matchup.

For polling trends in all 50 states at a glance, you can also check out the poll tracker page at this site, or my trend detail page, which gives a zoomed-in look at how voter preferences have shifted during the campaign.

How bad is it for Donald Trump? Let’s do the math

Drew — Tue, 11 Oct 2016 19:57:29 +0000

Cross-posted at Daily Kos Elections

Even before news broke this weekend about Donald Trump’s 2005 Access Hollywood tapes, he had been receiving some extremely bleak polling numbers. As of today, Trump trails Hillary Clinton by 9 points in Virginia, by 8 points in Pennsylvania, by 6 points in Colorado, by 4 points in Florida, and by 3 points in North Carolina.

When we run all of these polls through our presidential forecasting model, it predicts that Trump has less than a 10 percent chance of winning the presidency.

Those are long odds. But they follow from the data. Here’s why our model is able to make such a strong prediction—and why we’re not the only forecasters to see the race this way.

Our model starts by forecasting the outcome of the presidential election in all 50 states and Washington, D.C., and then aggregates those results up to a national forecast. As expected, the polls show that there is a range of states that are “safe” for either Clinton or Trump, meaning that one candidate has at least a 99 percent chance of winning. But given our uncertainty about what could happen between now and Election Day, there are also states like Nevada, Ohio, and Iowa that could go either way. The full set of probabilities that Clinton or Trump will win each state is in the left sidebar on our presidential election overview page.

The next step is to convert all of these state probabilities into an overall chance that Clinton or Trump will win the election. For the sake of illustration, the simplest way to do this is to randomly simulate each state’s election outcome a large number of times and record the winner. From our current estimates, Clinton would win Nevada in 63 percent of simulations, Ohio in 46 percent of simulations, and so on. Again for ease, assume that the state outcomes are independent, so that whether Clinton wins Nevada has no bearing on whether she also wins Ohio. This isn’t completely realistic—and in fact, it’s not how our model works—but it’s a sufficient approximation. In each simulation, the candidate who wins each state receives all of that state’s electoral votes, which we add across all 50 states and D.C.
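The simplified simulation described above, with independent states, can be sketched in a few lines. The state list is a hypothetical toy: the Colorado, Nevada, North Carolina, Florida, and Ohio probabilities come from this post, but the two blocs standing in for every other state (treated as certain either way) are a simplification, and Maine's and Nebraska's district splits are ignored.

```python
import random


def simulate(states: list[tuple[str, int, float]], n_sims: int = 10_000, seed: int = 1) -> tuple[float, list[int]]:
    """states: (name, electoral_votes, p_clinton). Each state is drawn independently.
    Returns Clinton's share of simulations with >= 270 electoral votes, plus every simulated total."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        # Winner-take-all: Clinton gets a state's votes if a uniform draw falls below her probability.
        totals.append(sum(ev for _, ev, p in states if rng.random() < p))
    wins = sum(1 for t in totals if t >= 270)
    return wins / n_sims, totals


toy_states = [
    ("Clinton bloc", 264, 1.0),  # hypothetical: states she is likelier to win than Colorado
    ("Colorado", 9, 0.94),
    ("Nevada", 6, 0.63),
    ("North Carolina", 15, 0.65),
    ("Florida", 29, 0.80),
    ("Ohio", 18, 0.46),
    ("Trump bloc", 197, 0.0),    # hypothetical: every remaining state, treated as Trump's
]
p_win, totals = simulate(toy_states)
print(p_win)
```

The `totals` list is what an electoral-vote histogram would be drawn from.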

If we follow this procedure with our current set of state probabilities, Clinton comes out ahead in 99 percent of simulations. That is, in only 1 out of every 100 simulated elections does Donald Trump receive 270 or more electoral votes, and win the election. Clinton’s lead is so substantial that if we count up the electoral votes in the states she’s most likely to win, she gets to 273 by winning Colorado—an outcome that our model estimates is 94 percent likely.

On the other hand, finding a combination of states that is consistent with the polling data and that gets Trump to 270 electoral votes is extremely difficult. In his most favorable scenario, Trump would have to win Colorado, where he has only a 6 percent chance, and Florida, where he has a 20 percent chance, and North Carolina, where he has a 35 percent chance, and Nevada, where he has a 37 percent chance, and every other state where his level of support is higher. If Trump loses any single one of these states, Clinton wins the election.
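Under the same independence simplification, the chance that Trump sweeps just the four states named above is the product of his probabilities in each. That is an upper bound for this one path, since he would still need every other state where his support is higher; it is not his overall chance, because other combinations could also reach 270.

```python
from math import prod

# Trump's win probabilities, as quoted above, in the four states he would need.
needed = {"Colorado": 0.06, "Florida": 0.20, "North Carolina": 0.35, "Nevada": 0.37}

p_sweep = prod(needed.values())
print(f"{p_sweep:.4f}")  # 0.0016
```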

The other major forecasting models aren’t any more favorable to Trump’s chances. If we take the probabilities of winning each state currently being forecasted by The Upshot, FiveThirtyEight, The Huffington Post, PredictWise, and the Princeton Election Consortium, and run them through the same simulation, the result is nearly identical: Clinton’s implied chances of winning the national election are close to 100 percent:

FiveThirtyEight: 98 percent
The Upshot: 97 percent
The Huffington Post: 99 percent
Princeton Election Consortium: 98 percent
PredictWise: 99 percent

The distributions of simulated electoral votes for Hillary Clinton under each model—again, by simply taking the state forecasts at face value—reinforce the challenge Trump faces. In every one of the models’ electoral vote histograms, there are almost no outcomes to the left of the blue line at 269 electoral votes, which is what Trump would need to win.

These histograms—and the chances of Clinton winning—are different from what each model is actually reporting as their national-level forecast because, like us, none of the other forecasters assume that state election outcomes are independent. If the polls are wrong, or if there’s a national swing in voter preferences toward Trump, then his odds should increase in many states at once: Nevada, Ohio, Florida, and so forth.

This adds extra uncertainty to the forecast, which widens the plausible range of electoral vote outcomes and lowers Clinton’s chances of winning. The additional assumptions of The Upshot model, for example, bring Clinton’s overall chances down to 87 percent. In the FiveThirtyEight model, Clinton’s chances drop to 84 percent; their histogram in particular looks very different from what I plotted above. (The Upshot recently published a pair of articles that explored these modeling choices more thoroughly.)
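To see why correlated errors lower Clinton's chances, the sketch below adds one national swing, shared by every state in a given simulation, on top of independent state-level noise. It is a generic illustration under stated assumptions (normally distributed errors, hypothetical standard deviations, states described by an expected Clinton margin), not how The Upshot, FiveThirtyEight, or our own model is actually specified.

```python
import random


def simulate_correlated(states: list[tuple[str, int, float]], national_sd: float = 2.0,
                        state_sd: float = 3.0, n_sims: int = 10_000, seed: int = 1) -> float:
    """states: (name, electoral_votes, expected_clinton_margin_in_points).

    In each simulation, one national swing is shared by every state, and each
    state also gets its own independent error. Returns Clinton's share of
    simulations with >= 270 electoral votes.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        swing = rng.gauss(0.0, national_sd)  # common error: moves all states together
        clinton_ev = sum(
            ev for _, ev, margin in states
            if margin + swing + rng.gauss(0.0, state_sd) > 0
        )
        if clinton_ev >= 270:
            wins += 1
    return wins / n_sims


# Hypothetical map: ten roughly equal states, Clinton +3 in each, same total error per state.
toy = [(f"State {i}", 54, 3.0) for i in range(9)] + [("State 9", 52, 3.0)]
print(simulate_correlated(toy, national_sd=0.0, state_sd=4.0))  # errors independent across states
print(simulate_correlated(toy, national_sd=4.0, state_sd=0.0))  # one shared error moves every state
```

Each state's own chance is identical in both runs; only the correlation changes, and with it the national probability.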

What this demonstrates, though, is that at this point in the campaign, the disagreements between the presidential models’ forecasts are primarily due to differences in the way uncertainty is carried through from the state forecasts to the national forecast. It is not that any of the forecasting models have a fundamentally more pro-Trump interpretation of the data. The models are essentially in agreement. Donald Trump is extremely unlikely to win the presidential election.


Forecasting the 2016 Elections

Drew — Tue, 09 Aug 2016 01:05:10 +0000

Welcome to Votamatic for the 2016 presidential election campaign.

For those new to the site, I originally launched Votamatic in 2012 to track and forecast the presidential election between Barack Obama and Mitt Romney, based on some academic research I had been doing at the time. My early prediction of a 332-206 Obama win, using a combination of historical data, state-level public opinion polls, and a Bayesian statistical model, turned out to be exactly correct. All of the data and results from 2012 have been archived, and can be reached from the top navigation bar.

The 2016 version of Votamatic is going to be fairly scaled back compared to 2012. I’ll still have poll tracking charts and the occasional blog post, but my election forecasts will be built into a brand new site at Daily Kos Elections that I’ve been helping to create. In 2014, I worked with the Daily Kos Elections team to forecast the midterm Senate and gubernatorial elections, with continued success. This year, we’re expanding the collaboration.

Over the next few weeks, we’ll be rolling out a bunch of new features, so stay tuned. Starting with presidential forecasts, we’ll soon add forecasts of every Senate and gubernatorial race in the nation (including the chances that the Democrats will retake the Senate), and sophisticated poll tracking charts and trendlines, all built on top of a custom polling database. The site will also feature Daily Kos Elections’ regular campaign reporting and analysis, as well as candidate endorsements and opportunities for getting involved. I hope you’ll find the site interesting, immersive, and accurate — and worth returning to as the campaign evolves.

(Sneak preview: Other election forecasters are giving Hillary Clinton around an 80-85% chance of winning. My interpretation of the polling data and other historical factors makes me a little less confident in a Clinton victory, but not by much; I’ll have more to say on this soon. Either way, the election is still far from a done deal. Flip a coin twice: if you get two heads, that’s President Trump.)

I will update the trendlines on this site every day or two, as new polls come in. Every state that has at least one poll will get a trendline. To see the polling data and trends together, go to the Poll Tracker page. For a zoomed-in view of each state’s trendline, check out the State Trend Detail pages.

The statistical model that I use to produce these trendlines has a set of features that are designed to reveal, as clearly as possible, the underlying voter preferences in each state during the campaign. Looking at the poll tracker in Florida, for example, Clinton (blue) led until mid-July, when she was overtaken by Trump (red). After the Democratic National Convention, however, Clinton’s numbers rebounded to move her back into a slight lead.

States with more polls will have more accurate trendline estimates. But my model produces a complete trendline for each candidate in any state that has at least one poll. To do this, it looks for common patterns in public opinion across multiple states over time, and uses those to infer a national trend. (This works because changes in voter preferences are largely — though certainly not entirely — a response to national-level campaign effects.) The model then applies those trends back into each state, adjusting for each state’s unique polling data. States in which no polls have been conducted are displayed as empty plots, awaiting more data.
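A stripped-down version of that idea: take a national trend (assumed here to have already been estimated by pooling polls across states), measure how far a state's polls sit from it on average, and shift the national trend by that amount. The function name and inputs are hypothetical, and the real model estimates the national and state components jointly rather than in two separate steps.

```python
from typing import Optional


def state_trendline(national_trend: list[float],
                    state_polls: list[tuple[int, float]]) -> Optional[list[float]]:
    """national_trend: Clinton two-party share by day index.
    state_polls: (day_index, clinton_share) pairs for one state.
    Returns a full daily trendline, or None if the state has no polls."""
    if not state_polls:
        return None  # no data yet: shown as an empty plot
    offset = sum(share - national_trend[day] for day, share in state_polls) / len(state_polls)
    return [level + offset for level in national_trend]


national = [50.0, 51.0, 52.0, 51.5]  # hypothetical national trend
print(state_trendline(national, [(0, 53.0), (2, 55.0)]))  # the state's offset applied to every day
```

Even two polls yield a value for every day, because the day-to-day shape is borrowed from the national trend.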

The trendlines that you will see here track Clinton and Trump in a head-to-head matchup only, excluding third-party candidates and voters who say they are undecided. This has the benefit of removing idiosyncrasies from the polling data around question wording, survey methodology, whether a pollster “pushes” respondents who are leaning towards either candidate into making a decision, and so forth. Visually, this explains why the trendlines for Clinton and Trump are mirror images of each other: the Clinton and Trump percents have been rescaled to sum to 100%. On the other hand, this sacrifices a lot of potentially interesting information about each race. The trendlines we’ll have at Daily Kos Elections will include other candidates and undecideds.
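The rescaling itself is a one-liner: divide each candidate's share by the two-candidate total. The poll numbers in the example are hypothetical.

```python
def two_party(clinton: float, trump: float) -> tuple[float, float]:
    """Rescale Clinton and Trump shares so they sum to 100, dropping third-party and undecided respondents."""
    total = clinton + trump
    return 100.0 * clinton / total, 100.0 * trump / total


c, t = two_party(45.0, 40.0)  # hypothetical poll: 45-40 with 15 percent elsewhere
print(round(c, 1), round(t, 1))  # 52.9 47.1
```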

Finally, I account for two other features of each poll. Polls with larger sample sizes are given more weight in fitting the trendlines, relative to polls with smaller sample sizes. And if a poll was conducted by a partisan polling firm, the model subtracts 1.5 percentage points from the reported vote share of the candidate from the respective party. So, for example, if a Democratic pollster reports a race tied at 50%-50%, the model treats the poll as showing a three-point Trump lead, 51.5%-48.5%. Those are the only adjustments I make to the raw polling data; I assume that all other survey errors will cancel each other out as noise.
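Both adjustments can be sketched as follows, working in Clinton's two-party share. The `sponsor` codes are hypothetical labels, and a simple sample-size-weighted average stands in for the trendline fit, in which larger polls similarly count for more.

```python
from typing import Optional


def adjust(clinton_share: float, sponsor: Optional[str] = None, shift: float = 1.5) -> float:
    """Clinton two-party share after the partisan-pollster adjustment.
    sponsor: "D" for a Democratic firm, "R" for a Republican firm, None otherwise (hypothetical codes)."""
    if sponsor == "D":
        return clinton_share - shift  # discount the Democratic candidate
    if sponsor == "R":
        return clinton_share + shift  # equivalently, discount the Republican candidate
    return clinton_share


def weighted_average(polls: list[tuple[float, int, Optional[str]]]) -> float:
    """polls: (clinton_two_party_share, sample_size, sponsor). Larger polls count for more."""
    total_n = sum(n for _, n, _ in polls)
    return sum(adjust(share, sponsor) * n for share, n, sponsor in polls) / total_n


print(adjust(50.0, "D"))  # 48.5, i.e. a 51.5-48.5 Trump lead in two-party terms
print(weighted_average([(50.0, 1000, "D"), (52.0, 500, None)]))
```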

See you soon, over at Daily Kos Elections!

2014 Senate and Governor Forecasts

Drew — Tue, 04 Nov 2014 22:32:39 +0000

This election season, I’ve been doing some work with the Daily Kos Elections team to track and forecast the midterm Senate and Gubernatorial elections. To see our predictions, click over to the Senate Outlook and Governors Outlook. You can also read more about our modeling approach here.

Overall, the polls aren’t looking good for Senate Democrats this year. We predict a 90% chance that Republicans will gain control of the Senate (assuming the public polls can be trusted). The most likely outcome is a 53-seat Republican majority. On the Gubernatorial side, the situation is better for the Democrats, but there are still a lot of close races — and a lot of uncertainty. It’s possible that Democrats could end up controlling anywhere from 16 to 27 states; they currently control 21.

For Election Night resources, I can recommend:

The Daily Kos Elections liveblog
The New York Times Upshot Election Tracker
The Huffington Post Election Dashboard

For more on the similarities and differences between the major midterm election forecasting models, Vox and the Washington Post both had very nice overviews of how Senate forecasts are typically made, how they should be interpreted, and how to judge their predictions after the election.

Evaluating the Forecasting Model

Drew — Fri, 16 Nov 2012 05:40:59 +0000

Since June, I’ve been updating the site with election forecasts and estimates of state-level voter preferences based on a statistical model that combines historical election data with the results of hundreds of state-level opinion polls. As described in the article that lays out my approach, the model worked very well when applied to data from the 2008 presidential election. It now appears to have replicated that success in 2012. The model accurately predicted Obama’s victory margin not only on Election Day – but months in advance of the election as well.

With the election results (mostly) tallied, it’s possible to do a detailed retrospective evaluation of the performance of my model over the course of the campaign. The aim is as much to see where the model went right as where it might have gone wrong. After all, the modeling approach is still fairly new. If some of its assumptions need to be adjusted, the time to figure that out is before the 2016 campaign begins.

To keep myself honest, I’ll follow the exact criteria for assessing the model that I laid out back in October.

Do the estimates of the state opinion trends make sense?
Yes. The estimated trendlines in state-level voter preferences appear to pass through the center of the polling data, even in states with relatively few polls. This suggests that the hierarchical design of the model, which borrows information from the polls across states, worked as intended.

The residuals of the fitted model (that is, the difference between estimates of the “true” level of support for Obama/Romney in a state and the observed poll results) are also consistent with a pattern of random sampling variation plus minor house effects. In the end, 96% of polls fell within the theoretical 95% margin of error; 93% were within the 90% MOE; and 57% were within the 50% MOE.
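The margin-of-error check can be reproduced roughly as follows. This sketch assumes simple random sampling: a poll of size n around a true share p has standard error sqrt(p(1 - p)/n), and the nominal MOE at a given level is the matching two-sided normal quantile times that standard error. The function name and input layout are hypothetical. The "true" shares would come from the model's fitted trendlines.

```python
from math import sqrt
from statistics import NormalDist


def share_within_moe(polls, level):
    """Fraction of polls whose residual falls inside the nominal margin of error.

    polls: iterable of (observed_share, estimated_true_share, sample_size),
           with shares as proportions between 0 and 1.
    level: nominal coverage, e.g. 0.95, 0.90, or 0.50.
    """
    # Two-sided normal quantile: 1.96 for 95%, about 1.645 for 90%, about 0.674 for 50%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    polls = list(polls)
    hits = 0
    for observed, true_share, n in polls:
        moe = z * sqrt(true_share * (1 - true_share) / n)
        if abs(observed - true_share) <= moe:
            hits += 1
    return hits / len(polls)
```

If residuals were pure sampling noise, the result at each level should be close to the level itself. House effects would push the observed shares below the nominal ones.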

How close were the state-level vote forecasts to the actual outcomes, over the course of the campaign?
The forecasts were very close to the truth, even in June. I calculate the mean absolute deviation (MAD) between the state vote forecasts and the election outcomes, on each day of the campaign. In the earliest forecasts, the average error was already as low as 2.2%, and gradually declined to 1.7% by Election Day. (Perfect predictions would produce a MAD of zero.)
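The mean absolute deviation is the average of the absolute state-level errors, recomputed for each day's set of forecasts. A minimal sketch, assuming forecasts are stored in a hypothetical dictionary that maps each date to per-state Obama two-party shares:

```python
from statistics import mean


def mean_absolute_deviation(forecasts, outcomes):
    """Average absolute gap (in points) between forecast and actual vote shares.

    forecasts, outcomes: dicts mapping state -> Obama two-party share (%).
    """
    return mean(abs(forecasts[s] - outcomes[s]) for s in outcomes)


def mad_by_day(daily_forecasts, outcomes):
    """daily_forecasts: dict mapping date string -> {state: forecast}.

    Returns a date-ordered dict of MAD values; a perfect forecast scores zero.
    """
    return {
        day: mean_absolute_deviation(forecasts, outcomes)
        for day, forecasts in sorted(daily_forecasts.items())
    }
```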

By incorporating state-level polls, the model was able to improve upon the baseline forecasts generated by the Abramowitz Time-for-Change model and uniform swing – but by much less than it did in 2008. The MAD of the state-level forecasts based on the Time-for-Change model alone – with no polling factored in at all – is indicated by the dashed line in the figure. It varied a bit over time, as updated Q2 GDP data became available.
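Uniform swing takes each state's vote share from the previous election and shifts it by the same amount the national vote is forecast to move. Here is a minimal sketch with a hypothetical function name. The national forecast, which in this post comes from the Time-for-Change model, is simply passed in as a number, because that regression is not reproduced here.

```python
def uniform_swing(prev_state_shares, prev_national, forecast_national):
    """Shift every state's previous-election share by the same national swing.

    All shares are the incumbent party's percentage of the two-party vote.
    """
    swing = forecast_national - prev_national
    return {state: share + swing for state, share in prev_state_shares.items()}
```

When actual state swings are nearly uniform, as they were between 2008 and 2012, this baseline is already close to the outcome. That leaves polls little room to improve on it.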

Why didn’t all the subsequent polling make much difference? The first reason is that the Time-for-Change forecast was already highly accurate: it predicted that Obama would win 52.2% of the major party vote; he actually received 51.4%. The successful track record of this model is the main reason I selected it in the first place. Secondly, state-level vote swings between 2008 and 2012 were very close to uniform. This again left the forecasts with little room for further refinement.

But in addition to this, voters’ preferences for Obama or Romney were extremely stable this campaign year. From May to November, opinions in the states varied by no more than 2% to 3%, compared to swings of 5% to 10% in 2008. In fact, by Election Day, estimates of state-level voter preferences weren’t much different from where they started on May 1. My forecasting model is designed to be robust to small, short-term changes in opinion, and these shifts were simply not large enough to alter the model’s predictions about the ultimate outcome. Had the model reacted more strongly to changes in the polls, such as the shift following the first presidential debate, it would have given the mistaken impression that Obama’s chances of reelection were falling, when in fact they were just as high as ever.

What proportion of state winners were correctly predicted?
As a result of the accuracy of the prior and the relative stability of voter preferences, the model correctly picked the winner of nearly every state for the entire campaign. The only mistake arose during Obama’s rise in support in September, which briefly moved North Carolina into his column. After the first presidential debate, the model returned to its previous prediction that Romney would win North Carolina. On Election Day, the model went 50-for-50.

Were the competitive states identified early and accurately?
Yes. Let’s define competitive states as those in which the winner is projected to receive under 53% of the two-party vote. On June 23, the model identified twelve such states: Arizona, Colorado, Florida, Indiana, Iowa, Michigan, Missouri, Nevada, North Carolina, Ohio, Virginia, and Wisconsin. That’s a good list.

Do 90% of the actual state vote outcomes fall within the 90% posterior credible intervals of the state vote forecasts?
This question addresses whether there was a proper amount of uncertainty in the forecasts, at various points in the campaign. As I noted before, in 2008, the forecasts demonstrated a small degree of overconfidence towards the end of the campaign. The results from the 2012 election show the same tendency. Over the summer, the forecasts were actually a bit underconfident, with 95%-100% of states’ estimated 90% posterior intervals containing the true outcome. But by late October, the model produced coverage rates of just 70% for the nominal 90% posterior intervals.
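Coverage is counted by checking, for each state, whether the actual outcome falls between the 5th and 95th percentiles of that state's posterior draws. The sketch below assumes the posterior is available as a list of simulated vote shares per state. The helper names are hypothetical. Linear interpolation between order statistics is one common percentile convention and is not necessarily the one used in the model.

```python
def empirical_quantile(sorted_draws, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_draws) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_draws) - 1)
    frac = pos - lower
    return sorted_draws[lower] + frac * (sorted_draws[upper] - sorted_draws[lower])


def interval_coverage(draws_by_state, outcomes, level=0.90):
    """Fraction of states whose actual outcome lies inside the central posterior interval.

    draws_by_state: dict mapping state -> list of simulated vote shares.
    outcomes: dict mapping state -> actual vote share.
    """
    lo_q, hi_q = (1 - level) / 2, (1 + level) / 2  # 0.05 and 0.95 for a 90% interval
    covered = 0
    for state, draws in draws_by_state.items():
        ordered = sorted(draws)
        lower = empirical_quantile(ordered, lo_q)
        upper = empirical_quantile(ordered, hi_q)
        if lower <= outcomes[state] <= upper:
            covered += 1
    return covered / len(draws_by_state)
```

A well-calibrated forecast should return about 0.90 here. Values well below that, like the 70% seen in late October, indicate overconfidence.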

As in 2008, the culprit for this problem was the limited number of polls in non-competitive states. The forecasts were not overconfident in the key battleground states where many polls were available, as can be seen in the forecast detail. It was only in states with very few polls – and especially where those polls were systematically in error, as in Hawaii or Tennessee – that the model became misled. A simple remedy would be to conduct more polls in non-competitive states, but it’s not realistic to expect this to happen. Fortunately, overconfidence in non-competitive states does not adversely impact the overall electoral vote forecast. Nevertheless, this remains an area for future development and improvement in my model.

It’s also worth noting that early in the campaign, when the amount of uncertainty in the state-level forecasts was too high, the model was still estimating a greater than 95% chance that Obama would be reelected. In other words, aggregating a series of underconfident state-level forecasts produced a highly confident national-level forecast.

How accurate was the overall electoral vote forecast?
The final electoral vote was Obama 332, Romney 206, with Obama winning all of his 2008 states, minus Indiana and North Carolina. My model first predicted this outcome on June 23, and then remained almost completely stable through Election Day. The accuracy of my early forecast, and its steadiness despite short-term changes in public opinion, is possibly the model’s most significant accomplishment.

In contrast, the electoral vote forecasts produced by Nate Silver at FiveThirtyEight hovered around 300 through August, peaked at 320 before the first presidential debate, then cratered to 283 before finishing at 313. The electoral vote estimator of Sam Wang at the Princeton Election Consortium demonstrated even more extreme ups and downs in response to the polls.

Was there an appropriate amount of uncertainty in the electoral vote forecasts?
This is difficult to judge. On one hand, since many of the state-level forecasts were overconfident, it would be reasonable to conclude that the electoral vote forecasts were overconfident as well. On the other hand, the actual outcome – 332 electoral votes for Obama – fell within the model’s 95% posterior credible interval at every single point of the campaign.

Finally, how sensitive were the forecasts to the choice of structural prior?
Given the overall solid performance of the model, and given that testing out different priors would be extremely computationally demanding, I’m going to set this question aside for now. Suffice it to say, Florida, North Carolina, and Virginia were the only three states in which the forecasts were close enough to 50-50 that the prior specification would have made much difference. And even if Obama had lost Florida and Virginia, he still would have won the election. So this isn’t something that I see as an immediate concern, but I do plan on looking into it before 2016.

Final Result: Obama 332, Romney 206

Drew — Sat, 10 Nov 2012 00:13:47 +0000

The results are in: Obama wins all of his 2008 states, minus Indiana and North Carolina, for 332 electoral votes. This is exactly as I predicted on Tuesday morning – and as I’ve been predicting (albeit with greater uncertainty) since June. Not bad! The Atlantic Wire awarded me a Gold Star for being one of “The Most Correct Pundits In All the Land”. There were also nice write-ups in The Chronicle of Higher Education, BBC News Magazine, Atlanta Journal-Constitution and the LA Times, among others. Thanks to everyone who has visited the site, participated in the comments, and offered their congratulations. I really appreciate it.

I’m still planning a complete assessment of the performance of the forecasting model, along the lines I described a few weeks ago. In the meantime, here are a few quick looks at how my Election Day predictions stacked up against the actual state-level vote outcomes. First, a simple scatterplot of my final predictions versus each state’s election result. Perfect predictions will fall along the 45-degree line. If a state is above the 45-degree line, then Obama performed better than expected; otherwise he fared worse.

Interestingly, in most of the battleground states, Obama did indeed outperform the polls, suggesting that a subset of the surveys in those states were tilted in Romney’s favor, just as I’d suspected. Across all 50 states, however, the polls were extremely accurate. The average difference between the actual state vote outcomes and the final predictions of my model was a minuscule 0.03% towards Obama.

My final estimates predicted 19 states within 1% of the truth, with a mean absolute deviation of 1.7%, and a state-level RMSE of 2.3% (these may change slightly as more votes are counted). Other analysts at the CFAR blog and Margin of Error compared my estimates to those of Nate Silver, Sam Wang, Simon Jackman, and Josh Putnam, and found they did very well. All in all, a nice round of success for us “quants”.
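Summary statistics like these (the average signed error, the RMSE, and the number of states called within 1 point) come straight from the final predictions and the results. Below is a minimal sketch with a hypothetical function name. Errors are defined as actual minus predicted, so positive values mean Obama did better than forecast, matching the convention used in this post.

```python
from math import sqrt
from statistics import mean


def forecast_error_summary(predicted, actual, tolerance=1.0):
    """Accuracy statistics for final state forecasts (vote shares in %).

    predicted, actual: dicts mapping state -> Obama two-party share.
    """
    errors = [actual[s] - predicted[s] for s in actual]
    return {
        "mean_error": mean(errors),                  # signed bias toward Obama
        "rmse": sqrt(mean(e * e for e in errors)),   # penalizes large misses more
        "within_tolerance": sum(abs(e) <= tolerance for e in errors),
    }
```

Because RMSE squares each error before averaging, it exceeds the mean absolute deviation whenever a few states are missed by more than the rest.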

Unsurprisingly, my model made much better predictions where more polls had been fielded! Here I’ll plot the difference between Obama’s share of the two-party vote in each state, and my final prediction, as a function of the number of polls in the state since May 1. Again, positive values indicate states where Obama did better than expected.

For minimizing the error in my forecasts, the magic number of polls per state appears to be around 25. That’s really not a lot, and I’m hopeful that we can get to at least this number in 2016. It’s a bit concerning, though, that there were about 25% fewer state-level presidential polls this year, compared to 2008.

Recently there have been some complaints among pollsters – most notably Gallup’s Frank Newport – that survey aggregators (like me) “don’t exist without people who are out there actually doing polls,” and that our work threatens to dissuade survey organizations from gathering these data in the first place. My view is slightly different. I’d say that working together, we’ve proven once again that public opinion research is a valuable source of information for understanding campaign dynamics and predicting election outcomes. There’s no reason why the relationship shouldn’t be one of mutual benefit, rather than competition or rivalry. In a similar manner, our analyses supplement – not replace – more traditional forms of campaign reporting. We should all be seen as moving political expertise forward, in an empirical and evidence-based way.

Election Day Forecast: Obama 332, Romney 206

Drew — Tue, 06 Nov 2012 17:53:55 +0000

With the last set of polls factored into the model, my final prediction is Obama to win 332 electoral votes, with 206 for Romney. This is both the median and the modal outcome in my electoral vote simulation, and corresponds to Obama winning all of his 2008 states except Indiana and North Carolina.

The four closest states – and therefore the most difficult to predict – are Florida, North Carolina, Virginia, and Colorado. Of these, my model only expects Romney to win North Carolina; but Florida is a true toss-up, with just a 60% chance of an Obama victory. I would not be surprised if Florida ended up going for Romney. If that happens, Obama would win 303 electoral votes, which is the second-most likely scenario in my simulation. The third-most likely scenario is that Obama wins 347 electoral votes, picking up North Carolina in addition to Florida.
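These scenarios follow from simple electoral-vote arithmetic. Losing Florida's 29 votes takes 332 down to 303, and adding North Carolina's 15 takes it up to 347. A toy simulation shows how a median and a modal outcome fall out of state win probabilities.

Only the 60% Florida figure comes from this post. The probabilities for North Carolina, Virginia, and Colorado are made-up illustrations. The 281 "safe" votes are simply 332 minus Florida, Virginia, and Colorado. Drawing states independently is a simplification, since real forecasts let state outcomes move together.

```python
import random
from collections import Counter
from statistics import median


def simulate_electoral_votes(safe_ev, close_states, n_sims=10_000, seed=0):
    """Toy electoral-vote simulation for Obama (hypothetical inputs).

    safe_ev: electoral votes treated as certain for Obama.
    close_states: dict mapping state -> (electoral_votes, probability Obama wins).
    States are drawn independently, a simplifying assumption.
    Returns the median total and the three most common totals with their counts.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        total = safe_ev
        for ev, p_win in close_states.values():
            if rng.random() < p_win:
                total += ev
        totals.append(total)
    return median(totals), Counter(totals).most_common(3)


# Florida's 60% is from the post; the other probabilities are illustrative only.
close = {"FL": (29, 0.60), "NC": (15, 0.20), "VA": (13, 0.95), "CO": (9, 0.95)}
median_ev, top_three = simulate_electoral_votes(281, close)
```

With these inputs, 332 should come out as both the median and the most common outcome, followed by 303 and 347.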

It’s been interesting to watch the forecasts of other poll watchers converge on the 332 estimate. Sam Wang, at the Princeton Election Consortium, also sees 332 as the modal outcome. So does Simon Jackman at the Huffington Post, and Josh Putnam at FHQ. Nate Silver, at his FiveThirtyEight blog, reports the mean of his electoral vote simulation at 313 – effectively splitting the difference on Florida, which he currently rates a 50.3% chance of an Obama win. But his most likely outcome is still Obama 332, followed by 303 and 347, just like me. Update: both Wang and Jackman revised their forecasts slightly downward this afternoon, based on late-arriving polls.

There will be plenty of opportunities to evaluate all of these forecasts once the election results are known. I’ve already laid out the standards I’ll be using to check my own model. This is how quantitative election forecasting can make progress, and hopefully work even better next time.

I’ll add, though, that on the eve of the election, averaging the polls, or running them through any sort of sensible model, isn’t all that hard. We are all using the same data (more or less) and so it doesn’t surprise me that we’re all reaching similar conclusions. The real challenge is producing meaningful and accurate forecasts early in the campaign. My model is designed to be robust to short-term fluctuations in the polls, and converge in a stable and gradual manner to the final, Election Day estimates. It appears that in this regard, the model has worked as intended.

But from a broader perspective, my model has been predicting that Obama will win 332 electoral votes – give or take – since June. If all of us are correct today, the next question to ask is when each model arrived at the ultimate outcome. That’s a big if, though. Let’s start with how the votes come in tonight, and go from there.