With the pace of polling increasing, there are going to be days when some polls seem to be especially surprising – or even contradictory. For example, a recent Washington Post survey found Obama up 8 points in Virginia, even though other polls indicate a tighter race. It’s pretty safe to say that Obama is not actually winning Virginia by 8 points. But this doesn’t mean the Post poll is biased, or wrong, or should be ignored. I imagine the Post did the best job they could. The likeliest explanation for the finding is simply random sampling error.
Even in a perfectly executed survey, there’s going to be error due to random sampling. A survey only contacts a small group of respondents, and those people won’t always be representative of the broader population. The smaller the sample, the larger the sample-to-sample variability. To see just how large sampling error can be, suppose my model is correct that Obama is currently preferred by 52% of decided, major-party voters in Virginia. Then in different surveys of 750 respondents (which is about the average size of the state polls), it wouldn’t be unusual to see results ranging anywhere from 48% to 56%, because of sampling variation alone. In fact, here’s the expected distribution of all poll results under this scenario: most should be right around 52%, but many won’t.
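To make the 48% to 56% range concrete, here is a minimal sketch of the arithmetic, assuming simple random sampling and the usual normal approximation (z = 1.96 for a 95% range). The function name `sampling_interval` and the small simulation are illustrative only, not part of the forecasting model.

```python
import math
import random

def sampling_interval(p, n, z=1.96):
    """Range expected to contain about 95% of poll results when the true
    share is p and each poll has n respondents (normal approximation)."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of a sample proportion
    return p - z * se, p + z * se

# Scenario from the text: 52% true support, 750 respondents per poll.
low, high = sampling_interval(0.52, 750)
print(f"{low:.1%} to {high:.1%}")  # prints "48.4% to 55.6%"

# Rough check by simulation: draw many hypothetical polls and count how
# many land inside that range. The seed is arbitrary.
random.seed(1)
n_polls = 2000
inside = 0
for _ in range(n_polls):
    supporters = sum(random.random() < 0.52 for _ in range(750))
    if low <= supporters / 750 <= high:
        inside += 1
share_inside = inside / n_polls
print(f"share of simulated polls inside the range: {share_inside:.3f}")
```

The simulated share should come out close to 0.95, which is another way of saying that roughly one poll in twenty will land outside this range even when nothing has gone wrong.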
If we added in other possible sources of survey error (question wording, interviewer effects, mode effects, sample selection, and so forth), the distribution would become even wider. So just imagine two polls on the same day showing Romney with either 52% or 60% of the two-party vote. Astounding, right? No, not really. It happened in Missouri last week.
What is actually astounding about the polls this year is how well they are behaving, compared to theoretical expectations. For a given sample size, the margin of error tells us what share of polls should fall within a certain range of the true population value. I’ll assume my model is correctly estimating the current level of preference for Obama over Romney in each state during the campaign. Then I can subtract from each observed poll result the model estimate on that day. That difference is the survey error. It turns out that most polls have been exactly where they should be – within two or three points of the model estimates. And that’s without any correction in my model for “house effects,” or systematic biases in the results of particular polling organizations.
Plotting each poll’s error versus its sample size (counted after excluding undecided respondents) produces the following graph. The dashed lines correspond to a theoretical 95% margin of error at each sample size, assuming that error arises only from random sampling.
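The dashed lines can be approximated with the standard formula for the margin of error of a proportion. This is a sketch, not necessarily the exact calculation behind the graph: it assumes a true share of 50% (the conservative choice, and close to the values in most competitive states) and z = 1.96. The name `margin_of_error` is illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Theoretical 95% margin of error for a sample proportion under
    simple random sampling. p=0.5 is an assumption made here."""
    return z * math.sqrt(p * (1 - p) / n)

# The margin shrinks with the square root of the sample size, so the
# dashed lines narrow quickly at first and then flatten out.
for n in (250, 500, 750, 1000, 2000):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1%}")
```

Because the margin depends on the square root of the sample size, quadrupling the number of respondents only halves the expected error.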
If the model is fitting properly, and if there are no other sources of error in the polls, then 95% of polls should fall within the dashed lines. The observed proportion is 94%. Certainly some polls are especially misleading – the worst outlier, in the lower right corner, is the large 9/9 Gravis Marketing poll that had Romney ahead in Virginia (and was single-handedly responsible for the brief downward blip in the Virginia forecast last week). But what is most important – and what helps us trust the pollsters as well as their polls – is that the overall distribution of survey errors is very close to what we would expect if pollsters were conducting their surveys in a careful and consistent way.
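The coverage check itself is simple to sketch: subtract the model estimate from each poll result, then count how many errors fall inside the margin for that poll's sample size. The four polls below are made-up numbers for illustration only, not real survey results, and the margin again assumes a 50% true share and z = 1.96.

```python
import math

# HYPOTHETICAL polls: (observed Obama two-party share, model estimate, sample size).
# These values are invented to show the calculation; they are not real data.
polls = [
    (0.54, 0.52, 750),
    (0.50, 0.52, 600),
    (0.47, 0.52, 2000),
    (0.53, 0.51, 1000),
]

def within_margin(observed, estimate, n, z=1.96):
    """True if the survey error (observed minus model estimate) lies inside
    the theoretical 95% margin of error for a sample of size n."""
    error = observed - estimate
    margin = z * math.sqrt(0.25 / n)  # assumes a true share of 50%
    return abs(error) <= margin

coverage = sum(within_margin(*poll) for poll in polls) / len(polls)
print(f"share of polls inside the margin: {coverage:.0%}")  # prints "75%"
```

In this toy set, the large third poll misses by five points, well outside the roughly two-point margin for its size. With real polls and a well-fitting model, the share inside the margin should sit near 95%.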