• About Drew

    Drew Linzer was formerly an Assistant Professor of Political Science at Emory University and currently resides in Berkeley, CA. He received a Ph.D. in Political Science from UCLA in 2008. Between 1998 and 2005, Dr. Linzer worked for polling firms in Washington, DC; Santa Monica, CA; and Palo Alto, CA.

    Model Checking

    October 10, 2012

    One of the fun things about election forecasting from a scientific standpoint is that on November 6, the right answer will be revealed, and we can compare the predictions of our models to the actual result. It’s not realistic to expect any model to get exactly the right answer – the world is just too noisy, and the data are too sparse and (sadly) too low quality. But we can still assess whether the errors in the model estimates were small enough to warrant confidence in that model, and make its application useful and worthwhile.

    With that said, here are the criteria I’m going to be using to evaluate the performance of my model on Election Day.

    1. Do the estimates of state opinion trends make sense? Although we won’t ever know exactly what people were thinking during the campaign, the state trendlines should at least pass through the center of the data. This validation also includes checking that the residual variance in the polls matches theoretical expectations, which so far, it has.
    2. How large is the average difference between the state vote forecasts and the actual outcomes? And did this error decline gradually over the course of the campaign? In 2008, the average error fell from about 3% in July to 1.4% on Election Day. Anything in this neighborhood would be acceptable.
    3. What proportion of state winners were correctly predicted? Since what ultimately matters is which candidate receives a plurality in each state, we’d like this to be correct, even if the vote share forecast is a bit off. Obviously, states right at the margin (for example, North Carolina, Florida, and Colorado) are going to be harder to get right.
    4. Related to this, were the competitive states identified early and accurately? One of the aims of the model is to help us distinguish safe from swing states, to alert us where we should be directing most of our attention.
    5. Do 90% of state vote outcomes fall within the 90% posterior credible intervals of the state forecasts? This gets at the uncertainty in the model estimates. I use a 90% interval so that there’s room to detect underconfidence as well as overconfidence in the forecasts. In 2008, the model was a bit overconfident. For this year, I’ll be fine with 80% coverage; if it’s much lower than that, I’ll want to revisit some of the model assumptions.
    6. How accurate was the overall electoral vote forecast? And how quickly (if at all) did it narrow in on the actual result? Even if the state-level estimates are good, there might be an error in how the model aggregates those forecasts nationally.
    7. Was there an appropriate amount of uncertainty in the electoral vote forecasts? Since there is only one electoral vote outcome, this will involve calculating the percentage of the campaign during which the final electoral vote was within the model’s 95% posterior credible interval. Accepting the possibility of overconfidence in the state forecasts, this should not fall below 85%-90%.
    8. Finally, how sensitive were the forecasts to the choice of structural prior? Especially if the model is judged to have performed poorly, could a different prior specification have made the difference?
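    Several of these criteria reduce to simple summary statistics once the results are in. Here's a minimal sketch of criteria 2, 3, and 5 in Python; every number below is invented for illustration (the real inputs would come from the model and the certified returns):

```python
# Hypothetical Obama two-party vote shares by state. `forecasts` are
# model point forecasts, `lo90`/`hi90` the 90% credible interval
# bounds, and `outcomes` the actual results. All values are invented.
forecasts = [0.520, 0.485, 0.531, 0.470]
lo90      = [0.495, 0.460, 0.505, 0.445]
hi90      = [0.545, 0.510, 0.557, 0.495]
outcomes  = [0.514, 0.492, 0.525, 0.462]

n = len(forecasts)

# Criterion 2: mean absolute error of the state vote forecasts
mae = sum(abs(f - o) for f, o in zip(forecasts, outcomes)) / n

# Criterion 3: proportion of state winners called correctly
winners = sum((f > 0.5) == (o > 0.5)
              for f, o in zip(forecasts, outcomes)) / n

# Criterion 5: share of outcomes inside the 90% credible interval
coverage = sum(l <= o <= h
               for l, o, h in zip(lo90, outcomes, hi90)) / n

print(mae, winners, coverage)
```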

    If you can think of any I’ve left off, please feel free to add them in the comments.

    Aftermath of the First Debate

    October 8, 2012

    Polls released since the first presidential debate last week indicate as rapid a shift in voter preferences as we’ve seen all campaign. My model estimates a swing of about 1.5% in Romney’s direction, or a net narrowing of about 3%. Although the polls also suggest that this movement began a few days before the debate, it’s still a large effect.

    What to make of it? First, and most importantly, although Romney may have cut into Obama’s lead, Obama is still comfortably ahead. The most important state to win this year is arguably Ohio – and there Obama holds on to 51.7% of the major-party vote. According to my model, Obama had been outperforming the fundamentals (which point to his reelection) prior to the debate – and now he’s running just slightly behind them. As a result, the model’s long-term forecast continues to show an Obama victory, with 332 electoral votes.

    Second, there’s reason to believe that the initial estimates of Romney’s post-debate surge are going to weaken as more polls are released today and tomorrow. The surveys that made it into my Sunday morning update consisted of a number of one-day samples, which tend to draw in particularly enthusiastic respondents – in this case, Republicans excited about Romney’s debate performance. Moreover, the survey methodology used by these firms – in which interviews are conducted using recorded scripts, to save time and money – also shows a Republican lean. And if anything, my model gives slightly greater weight to automated polls simply because they tend to have larger sample sizes.

    The point isn’t that these polls are “wrong” – only that this is a situation where it would be wise to wait for more information before reaching any strong conclusions. My model treats every poll equally, regardless of how it was fielded, or by whom. The reason I don’t try to “correct” for potential errors in the polls isn’t because I don’t believe they exist – but because I don’t believe those adjustments can be estimated reliably enough to make much of a difference. (Consider how wide the error bars are on Simon Jackman’s house effects estimates, for example.) Instead, I assume there will eventually be enough polling in all 50 states for these errors to cancel out. Usually that is a pretty safe assumption, but I don’t think it’s happened yet.

    I’m going to embed the current trend estimates for Virginia and Florida here in this post, so we can compare them to later estimates, and see if I’m right.

    Finally, making sense of this small batch of post-debate polls highlights the value of using an informative Bayesian prior. If Romney is really experiencing a sudden swing in the polls, then we already have some idea of how quickly that could reasonably happen, based on previous trends. It’s certainly possible that something about public opinion has fundamentally changed within the past week. But if that’s the case, we should require extra evidence to overturn what we previously thought was going on.

    Look for another site update Tuesday morning.

    Where Things Stand

    October 2, 2012

    If anyone tries to tell you the presidential race is close, don’t believe it. It’s just not true. With the debates beginning tomorrow, Obama’s September surge in the polls appears to have finally leveled off – but it has moved him into the lead in every single battleground state, including North Carolina.

    If the election were held today, my model predicts Obama would get 52% of the major-party vote in Florida and 53% in Ohio. If Obama wins Florida, there’s almost no chance Romney can win the election. If Obama loses Florida but wins Ohio, Romney’s chances are only slightly higher.

    Romney has to be hoping for a very large and very consistent swing in opinion across a large number of states. The shift will have to be over 2% – which would be as big a change in voter preferences as we’ve seen during the entire campaign. And it will have to begin immediately. Post-RNC, it took just under one month for Obama to gain 1.5%-2% in the polls. Romney has just over one month to undo that trend, and more.

    Looking for House Effects

    September 27, 2012

    There’s been a lot of talk lately about how the presidential polls might be biased. So let’s look at how well – or poorly – some of the major survey firms are actually performing this year.

    All polls contain error, mainly from the limitations of random sampling. But there are lots of other ways that error can creep into surveys. Pollsters who truly care about getting the right answer go through great pains to minimize these non-sampling errors, but sometimes systematic biases – or house effects – can remain. For whatever reason, some pollsters are consistently too favorable (or not favorable enough) to certain parties or candidates.

    Since May 1, there have been over 400 state polls, conducted by more than 100 different survey organizations. However, a much smaller number of firms have been responsible for a majority of the polls: Rasmussen, PPP, YouGov, Quinnipiac, Purple Strategies, We Ask America, SurveyUSA, and Marist.

    For each poll released by these firms, I’ll calculate the survey error as the difference between their reported level of support for Obama over Romney, and my model’s estimate of the “true” level of support on the day and state of the poll. Then each firm’s house effect is simply the average of these differences. (Note that my model doesn’t preemptively try to adjust for house effects in any way.) If a firm is unbiased, its average error should be zero. Positive house effects are more pro-Obama; negative house effects are more pro-Romney. Here’s what we find.

    Survey Firm         # Polls   House Effect
    PPP                    61        +0.7%
    Marist                 15        +0.5%
    SurveyUSA              22        +0.3%
    Quinnipiac             35        +0.1%
    YouGov                 27         0.0%
    We Ask America         17        -0.2%
    Purple Strategies      18        -0.9%
    Rasmussen              53        -0.9%

    There are a number of pieces of information to take away from this table. First, none of the house effects are all that big. Average deviations are less than 1% in either direction. This is much smaller than the error we observe in the polls due to random sampling alone.

    Second, even if, say, Rasmussen is getting the right numbers on average – so that PPP’s house effect is actually +1.6% – then that +1.6% bias still isn’t that big. It’s certainly not enough to explain Obama’s large – and increasing – lead in the polls. Of course, it’s possible that even Rasmussen is biased pro-Obama, and we just aren’t able to tell. But I don’t believe anyone is suggesting that.

    Finally, the firms with the largest house effects in both directions – PPP and Rasmussen – are also the ones doing the most polls, so their effects cancel out. Just another reason to feel comfortable trusting the polling averages.
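    The house-effect calculation described above is easy to sketch. The firms and margins below are invented stand-ins; in practice, each poll's error would be computed against the model's estimate for that state on that day:

```python
# Each entry is (firm, poll margin, model margin), with margins
# expressed as Obama minus Romney. All numbers here are invented.
polls = [
    ("PPP",        0.040,  0.030),
    ("PPP",        0.010,  0.005),
    ("Rasmussen", -0.020, -0.010),
    ("Rasmussen",  0.000,  0.008),
]

# Collect each firm's poll-level errors (poll margin - model margin).
errors = {}
for firm, poll_margin, model_margin in polls:
    errors.setdefault(firm, []).append(poll_margin - model_margin)

# A firm's house effect is its average error.
# Positive = leans pro-Obama; negative = leans pro-Romney.
house_effects = {firm: sum(e) / len(e) for firm, e in errors.items()}
print(house_effects)
```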

    Here’s a plot highlighting each of the eight firms’ survey errors versus sample sizes. The horizontal lines denote the house effects. Dashed lines indicate theoretical 95% margins of error, assuming perfect random sampling. Again, nothing very extraordinary. We would expect PPP and Rasmussen to “miss” once or twice, simply because of how many polls they’re fielding.

    Just out of curiosity (and no particular feelings of cruelty, I swear), which polls have been the most misleading – or let’s say, unluckiest – of the campaign so far? Rather than look at the raw survey error, which is expected to be larger in small samples, I’ll calculate the p-value for each poll, assuming my model represents the truth. This tells us the probability of getting a survey result with the observed level of error (or greater), at a given sample size, due to chance alone. Smaller p-values reveal more anomalous polls.

    Here are all surveys with a p-value less than 0.01 – meaning we’d expect to see these results in fewer than 1 out of every 100 surveys, if the polling firm is conducting the survey in a proper and unbiased manner.

    p-value   Error   Survey Firm             Date        State   Obama   Romney   Sample Size
    0.001     +0.07   Suffolk                 9/16/2012   MA      64%     31%       600
    0.002     -0.07   InsiderAdvantage        9/18/2012   GA      35%     56%       483
    0.003     -0.03   Gravis Marketing        9/9/2012    VA      44%     49%      2238
    0.003     -0.05   Wenzel Strategies (R)   9/11/2012   MO      38%     57%       850
    0.003     +0.05   Rutgers-Eagleton        6/4/2012    NJ      56%     33%      1065
    0.004     -0.04   FMWB (D)                8/16/2012   MI      44%     48%      1733
    0.005     -0.05   Gravis Marketing        8/23/2012   MO      36%     53%      1057
    0.006     -0.03   Quinnipiac              5/21/2012   FL      41%     47%      1722
    0.009     -0.04   Quinnipiac/NYT/CBS      8/6/2012    CO      45%     50%      1463

    The single most… unusual survey was the 9/16 Suffolk poll in Massachusetts that overestimated Obama’s level of support by 7%. However, of the nine polls on the list, seven erred in the direction of Romney – not Obama. And what to say about Gravis Marketing, which appears twice – strongly favoring Romney – despite conducting only 10 polls. Hm.
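    The p-value calculation described above can be sketched with a normal approximation to the sampling distribution of a poll's two-party share. The inputs in the example call (a 60% true share, a 67% observed share, n = 600) are illustrative, not taken from any actual poll:

```python
from math import erf, sqrt

def poll_p_value(obs_share, true_share, n):
    """Two-sided p-value for a poll's observed two-party share,
    treating the model estimate as the truth and assuming simple
    random sampling (a sketch of the calculation described above)."""
    se = sqrt(true_share * (1 - true_share) / n)
    z = abs(obs_share - true_share) / se
    # 2 * P(Z > z) under the standard normal, via the error function
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# E.g. a poll reading 7 points above a "true" share of 0.60 on n = 600
print(poll_p_value(0.67, 0.60, 600))
```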

    It’s interesting that many of these surveys had relatively large sample sizes. The result is that errors of only 3%-4% appear more suspicious than if the sample had been smaller. It’s sort of a double whammy: firms pay to conduct more interviews, but all they accomplish by reducing their sampling error is to provide sharper focus on the magnitude of their non-sampling error. They’d be better off sticking to samples of 500, where systematic errors wouldn’t be as apparent.
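    To see the "double whammy" numerically, here is the same 3-point error evaluated at two sample sizes under simple random sampling; the 48% true share is an arbitrary stand-in, not a model value:

```python
from math import erf, sqrt

def two_sided_p(err, p, n):
    # Chance of a sampling error at least this large, under simple
    # random sampling with true two-party share p and sample size n.
    z = abs(err) / sqrt(p * (1 - p) / n)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# The same 3-point miss is unremarkable at n = 500, but looks
# systematic at n = 1722.
p500 = two_sided_p(0.03, 0.48, 500)
p1722 = two_sided_p(0.03, 0.48, 1722)
print(p500, p1722)
```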

    Projecting Ahead to Election Day

    September 25, 2012

    There’s a general consensus among poll-watchers that Obama is currently ahead in most – if not all – of the battleground states. How likely is this lead to hold up through Election Day? And what range of outcomes are realistic to expect? Let’s set aside the forecasts being produced by my model (which combine the polls with certain assumptions about long-term election fundamentals), and instead just walk through a few different scenarios starting from where preferences stand today.

    I start by accounting for uncertainty in the current level of support for Obama and Romney in each state. The idea is simply that we have better estimates of public opinion in states that have been polled more frequently. From the model results, I simulate 10,000 plausible “states of the race” for all 50 states.

    Next, we have to make some guesses about how voter preferences might change between now and Election Day. So far, state-level opinion has been fairly stable, varying only within a 1%-2.5% range. Since Obama is ahead right now, the less we believe preferences are going to fluctuate over the next six weeks, the worse for Romney. So let’s generously assume that with 95% probability, voters might swing as much as 4% in either direction from their current spot, with a modal change of zero.

    Here’s what the combination of these two assumptions would look like in Florida. (Recall all percentages exclude third-party candidates and undecideds.) There’s some initial variation around today’s estimate (the square); then the potential future changes are added in. The result is a 77% chance of Obama winning Florida – that is, 77% of the 10,000 simulations result in an Obama two-party vote share above 50%.
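    The two-part simulation can be sketched as follows. The current-estimate mean and its uncertainty below are illustrative stand-ins rather than actual model output, so the resulting probability will land in the neighborhood of, but not exactly at, the 77% quoted above:

```python
import random

random.seed(1)
N = 10_000

today = 0.52           # illustrative current Obama share in Florida
est_sd = 0.01          # illustrative uncertainty in today's estimate
swing_sd = 0.04 / 1.96  # 95% of future swings within +/- 4 points

# Draw a plausible "state of the race" today, then add a possible
# future swing centered on no change.
sims = [today + random.gauss(0, est_sd) + random.gauss(0, swing_sd)
        for _ in range(N)]

# Share of simulations where Obama's two-party vote exceeds 50%
p_obama = sum(s > 0.5 for s in sims) / N
print(p_obama)
```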

    Finally, to extend the simulation to all 50 states, we have to consider that future changes in state opinion are not likely to be independent. In other words, if Romney starts doing better in Florida, he’s probably going to improve in North Carolina, Virginia, Ohio, etc. as well. So we want to build in some correlation between the state trends. Perfect correlation would be equivalent to “uniform swing” in which a constant amount is added to (or subtracted from) each state’s current estimate. The lower the correlation, the more the future state trends differ from one another.

    Let’s try a moderate level of inter-state correlation: 0.8 on the range from 0 to 1. I generate 10,000 hypothetical election outcomes in all 50 states, and add up the number of electoral votes Obama receives in each. The result is a 92% chance of victory for Obama, with a median outcome of 347 electoral votes. This would be Obama winning all of his 2008 states, except for Indiana.
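    The correlated multi-state simulation can be sketched with a shared national shock plus independent state shocks, which induces an equal pairwise correlation of rho between state swings. The state list, vote shares, and "safe" electoral-vote count below are all invented placeholders, so the win probability is only illustrative:

```python
import random
from math import sqrt

random.seed(1)
N = 10_000
rho = 0.8               # inter-state correlation of future swings
swing_sd = 0.04 / 1.96  # 95% of swings within +/- 4 points

# A tiny illustrative map: state -> (Obama share today, electoral
# votes). Shares are invented stand-ins, not model estimates.
states = {"FL": (0.52, 29), "OH": (0.53, 18), "VA": (0.51, 13),
          "NC": (0.50, 15), "CO": (0.52, 9)}
safe_obama_ev = 247     # EVs assumed already locked in (illustrative)
needed = 270

# Equicorrelated swings: swing_i = sqrt(rho) * national shock
#                                + sqrt(1 - rho) * state shock
wins = 0
for _ in range(N):
    national = random.gauss(0, swing_sd)
    ev = safe_obama_ev
    for share, votes in states.values():
        swing = (sqrt(rho) * national
                 + sqrt(1 - rho) * random.gauss(0, swing_sd))
        if share + swing > 0.5:
            ev += votes
    wins += ev >= needed
print(wins / N)
```

Setting rho to 1 removes the state-level shocks entirely, which is exactly the uniform-swing case.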

    If we increase the correlation between states all the way to 1, Romney’s chances of winning are still just 10%. What’s going on? Obama’s lead in the polls is so large right now that he could lose 2.5% of the vote in every single state and still have enough support to clear 270 electoral votes. The chances of him slipping more than that, if current trends continue, are slim.

    One possibility is that the polls are all biased in Obama’s favor, and have been systematically overstating his level of support. Suppose we knock 1% off the model’s current estimates in each state and re-run the simulation, assuming perfect uniform swing. In that case, Romney’s chances improve to 20%.

    If the polls are all biased 2% in Obama’s favor, the simulation moves Romney up to a 37% chance of winning – still not great, but at least better than 8%.

    No wonder the Republicans are starting to challenge the polls. Unfortunately, there’s no serious indication that the polls are behaving strangely this year.