Pollsters May Be Herding

    November 5, 2012 • Uncategorized • 26 Comments

    The accuracy of my election forecasts depends on the accuracy of the presidential polls. A major concern heading into Election Day is therefore the possibility that polling firms, out of fear of being wrong, are looking at the results of other published surveys and weighting or adjusting their own results to match. If pollsters are engaging in this sort of herding behavior – and, as a consequence, converging on the wrong estimates of public opinion – then there is a danger of the polls becoming collectively biased.

    To see whether this is happening, I’ll plot the absolute value of the state polls’ error, over time. (The error is the difference between a poll’s reported proportion supporting Obama, and my model’s estimate of the “true” population proportion.) Herding would be indicated by a decline in the average survey error towards zero – representing no difference from the consensus mean – over the course of the campaign. This is exactly what we find. Although there has always been a large amount of variation in the polls, the underlying trend – as shown by the lowess smoother line, in blue – reveals that the average error in the polls started at 1.5% in early May, but is now down to 0.9%.
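    The calculation described above can be sketched roughly as follows. This is only an illustration, not the code behind the plot: the poll numbers are made up, the tuple layout and function names are hypothetical, and a simple moving average stands in for the lowess smoother.

```python
from statistics import mean

# Hypothetical data: (day of campaign, poll's Obama share in %, model's estimate in %).
polls = [
    (0, 52.0, 50.0),
    (5, 48.5, 50.0),
    (10, 51.0, 50.0),
    (170, 51.5, 50.5),
    (175, 50.0, 50.5),
    (180, 51.5, 50.5),
]

def absolute_errors(polls):
    """Return (day, |poll - model estimate|) for every poll."""
    return [(day, abs(poll - model)) for day, poll, model in polls]

def moving_average(points, window):
    """Average the errors of all polls taken within +/- `window` days of each poll.

    This is a crude stand-in for a lowess smoother (an assumption of this sketch).
    """
    points = sorted(points)
    trend = []
    for day, _ in points:
        nearby = [err for d, err in points if abs(d - day) <= window]
        trend.append((day, mean(nearby)))
    return trend

errors = absolute_errors(polls)
trend = moving_average(errors, window=7)
for day, avg_err in trend:
    print(f"day {day:3d}: average absolute error {avg_err:.2f} points")
```

    A smoothed average that falls over the course of the campaign is the pattern that would be read as possible herding.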

    How worried do we need to be? Herding around the wrong value is potentially much worse than any one or two firms having an unusual house effect. But even if the variance of the polls is decreasing, they might still have the right average. An alternative explanation for this pattern could be an increase in sample sizes (resulting in lower sampling variability), but this hasn’t been the case. Unfortunately, there weren’t enough polls to tell whether the pattern was stronger in more frequently-polled states, or if particular firms were more prone to follow the pack. Hopefully, this minor trend won’t mean anything, and the estimates will be fine. We’ll know soon.
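    To get a feel for the sample-size explanation, here is a back-of-the-envelope sketch. It assumes simple random sampling, a normal approximation, and a true proportion of 0.5, and it ignores house effects and error in the model's estimate, so it is only a rough guide. Under those assumptions the expected absolute error of a poll of n respondents is sqrt(p(1-p)/n) * sqrt(2/pi). The function names are illustrative.

```python
import math

def expected_abs_error(p, n):
    """Expected |sampling error| of a poll of n respondents (normal approximation)."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return standard_error * math.sqrt(2 / math.pi)

def sample_size_for(p, target_abs_error):
    """Sample size at which the expected absolute sampling error equals the target."""
    return p * (1 - p) * (2 / math.pi) / target_abs_error ** 2

p = 0.5                                # assumed true proportion
n_early = sample_size_for(p, 0.015)    # average error of 1.5% in early May
n_late = sample_size_for(p, 0.009)     # average error of 0.9% now
ratio = n_late / n_early

print(f"n for 1.5% error: {n_early:.0f}")
print(f"n for 0.9% error: {n_late:.0f}")
print(f"required growth in sample size: {ratio:.2f}x")
```

    Because the error shrinks with the square root of n, sampling alone would need sample sizes to grow by a factor of (1.5/0.9)^2, or nearly threefold, to produce a drop of this size.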

    26 Responses to Pollsters May Be Herding

    1. wheeler's cat
      November 5, 2012 at 4:47 am

      Is herding some sort of political science term? Because it implies a herding agent.
      Is there a herding agent?
      Flocking behavior is what we use for terminology in cognitive anthropology.

    2. David
      November 5, 2012 at 5:22 am

      @wheeler: the data is the flock, the pollsters are the shepherds.

    3. November 5, 2012 at 5:49 am

      Might it also reflect an electorate that is less unsettled about which candidate it prefers than was the case earlier? More locked-in minds would seem to indicate less day-to-day variability in poll results.

    4. November 5, 2012 at 6:15 am

      It could be happening without any explicit weighting or adjustment, if some of these firms are withholding results that just seem absurd. It’s what is sometimes called the “file-drawer effect” in scientific publishing: any social force that makes people less willing to actually publish a result (which could be incredulity at results that are seemingly out of line, or the lower prestige of negative results) will bias the aggregate of published results.

    5. Lois Murray
      November 5, 2012 at 6:16 am

      Follow up question: Is it easier to herd pollsters or cats?

    6. Justafed
      November 5, 2012 at 6:16 am

      Two explanations for herding I do not see explicitly mentioned above.

      1) Wouldn’t survey error decline as the number of polls increases if the “new” polls were from smaller firms that had smaller net house effects (as per your previous post)? By eyeball (a potentially bad metric, to be sure), the drift seems to accelerate as more polls are done, with the obvious confound that more polls are done as we get closer to election day.

      2) Nate Silver just pointed out a few minutes ago that we might be seeing some amount of change (which could be similar to convergence in your sense) if likely-voter models are sensitive to the fact that some “unlikely voters” have already voted early, and thus should have become likely (nay, certain) voters. In other words, differences between polls in their likely-voter models should diminish as we become more certain that some people have voted because they have now already done so. I am not sure how large this effect would be, but it should be more obvious in the last couple of weeks as the availability of early voting has increased. And, again, my lying eyes could be lying, but the convergence appears to have strengthened over the past two weeks.

    7. Omi
      November 5, 2012 at 6:37 am

      I think there is another, much simpler explanation – the polls are converging, not herding. I suspect the reason is that the earlier a poll is conducted, the larger the fraction of people polled who just don’t care that much about the election. These people are more likely to make an unconsidered, off-the-cuff answer. Then as the election approaches, some of these people change to a considered position – which in aggregate over a population is less random than earlier unconsidered answers. In particular I bet some people don’t even think about the election until the week of.

      Testing this hypothesis is simple – if the polls are herding, they will converge on “random” results, meaning that many of these averages will have errors, in both directions. Whereas if I am correct, in most or all states the errors will be quite low after this convergence. Calculating the probability thresholds for these two possibilities is something you are better equipped to do than I.

    8. Nik
      November 5, 2012 at 6:46 am

      Couldn’t this be expected to occur as undecided voters break to either candidate, increasing the counts for the candidates and thereby reducing error?

    9. Robin Colgrove
      November 5, 2012 at 6:48 am

      An interesting question would be, if the pollsters _are_ herding/flocking, whom are they converging toward? If they are converging toward the averages of the most prominent poll aggregators, it creates a strange sort of feedback loop.

    10. November 5, 2012 at 7:03 am

      The statistical analysis conducted would only be valid if the polls were sampling the same populations. Because the polls are conducted at different times the populations sampled are not the same — people change their minds. As the election approaches fewer people change their minds and the polls are conducted more often, so the populations sampled are more nearly the same. This would very likely explain the entire effect noted.

    11. Mike Collinge
      November 5, 2012 at 7:05 am

      Your worries are probably overstated. I can think of two causes for the “herding” pattern that, if true, suggest that the polls have actually become more accurate over time:

      (1) As the number of undecided voters has declined, the variance attributable to pollsters’ methods of including “leaners” has declined.

      (2) As the election approaches, pollsters with a partisan agenda who previously distorted their results have stopped distorting their results to preserve their credibility.

    12. Majorajam
      November 5, 2012 at 7:09 am

      Anyone else notice how sane, e.g. Rasmussen polls (at both national and state level) have looked recently? Or how that’s been the pattern of recent elections? That would show up statistically as herding.

      With all these crank partisan firms popping up with polls that show counterintuitive results, with the trend of right-wing pollsters toward opaque methodological choices and a general lack of transparency, and with how central they have become to driving the tone of the media narrative (and how critical they can be in obfuscating voter suppression), am I the only one who is unsurprised by the above?

    13. Tom Miller
      November 5, 2012 at 7:15 am

      Yeah, convergence can happen without manipulation if some of the models give stronger weighting to “I have already voted” as people vote early, or “I intend to vote and know the voting location” when it gets closer to election day.

    14. Jon Lennox
      November 5, 2012 at 7:16 am

      These are likely-voters polls, right? I imagine accurate likely-voter screens get easier the closer you get to the election. Is the same effect happening for registered-voters polls?

    15. Hedgehog
      November 5, 2012 at 7:49 am

      I wondered about this when Rasmussen moved the race to a tie from a Romney advantage, but now they have it back to Romney very slightly. Gallup, the pollster most favorable to Romney, has been offline for a week, so we don’t know if they’d have moved back toward Obama.

    16. Jeff
      November 5, 2012 at 7:51 am

      Is there a way to look back at previous elections and see if a similar herding effect took place?

    17. Allan Marlow
      November 5, 2012 at 7:52 am

      @Drew

      I suppose you looked into this, but I wonder if you have the figures available.

      The estimated sample variance is proportional to the population variance, and if the population variance goes down, so will the sample variance, even if the sample size stays the same.

      Could it be that the population variance is going down because the size and composition of the undecideds is stabilizing? No need to herd if this is the case.

      Just wondering.

      Thanks again for your terrific contribution!

    18. Commentor
      November 5, 2012 at 8:26 am

      Drew,

      Is it possible that this effect can be explained by changes in the likely voter results caused by people reporting that they have already voted?

      In other words, if a pollster’s likely voter screen contained a house effect, that might be nullified by increasing numbers of persons previously omitted reporting as having already voted.

    19. David
      November 5, 2012 at 10:07 am

      Drew, seeing that the variation is quite large, it might be good to also look at the median error instead of the mean error (I’m presuming you did the standard thing here), but also to use the consensus median as your baseline and then plot the median error. I’m curious as to whether there is any significant difference.

      And the points others made about non-constant population variation and change in the sampled population (fewer undecideds) are well taken too. Fascinating stuff! What a fascinating experiment in statistical/probabilistic modeling!

    20. wheeler's cat
      November 5, 2012 at 10:55 am

      David,
      On consideration I think perhaps conservatives do have herd behavior.
      They are often deliberately herded to act against their own economic interests, so there are herding agents involved.
      Perhaps liberals are more like a pack. We have quite a lot of autonomy for self-criticism, it seems.
      There is a biological basis for all behavior, you know.
      :)

    21. Meg
      November 5, 2012 at 11:07 am

      It looks like the sample size of polls is also getting larger over time; wouldn’t that lead to the same observed dynamic? You could bootstrap a subset of polls to overcome that issue.

      If anything, outliers appear to have increased over time.

    22. Ralph
      November 5, 2012 at 11:42 am

      It would be interesting to see the trend of registered voters vs likely voters. If the registered voters are well behaved then the herd of pollsters is herding.

    23. MarkS
      November 5, 2012 at 2:19 pm

      Off topic: the site improvements I would most like to see are (1) disabling of HTML coding in “your name” (to eliminate the big red names), and (2) automatic removal of any comment containing any variant of “PhD”.

    24. mut
      November 5, 2012 at 2:54 pm

      lolwut?

      This is a standard problem. Divide the data into slices of time, narrow enough that the true value is constant within each slice. For each poll in that slice, compute the pull (residual / uncertainty). The pulls should follow a distribution which is Gaussian with mean 0 and an RMS which is calculable and approaches 1 in the limit of large statistics. Without that test, there is no evidence for fudging.
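      The pull test described in this comment might look something like the sketch below. It is only an illustration under stated assumptions: the polls are made up, the "true" value for the slice is taken as given, and each poll's uncertainty is assumed to be pure binomial sampling error (no house effects or weighting). The function names are hypothetical.

```python
import math

def pulls(polls):
    """Compute residual / uncertainty for each poll in one time slice.

    Each poll is (reported share, sample size, assumed true share for the slice).
    """
    result = []
    for share, n, truth in polls:
        sigma = math.sqrt(truth * (1 - truth) / n)  # sampling error only (assumption)
        result.append((share - truth) / sigma)
    return result

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical slice: four polls of 1,000 respondents, true share assumed to be 0.50.
slice_polls = [
    (0.505, 1000, 0.50),
    (0.498, 1000, 0.50),
    (0.503, 1000, 0.50),
    (0.495, 1000, 0.50),
]

slice_pulls = pulls(slice_polls)
slice_rms = rms(slice_pulls)
print(f"mean pull: {sum(slice_pulls) / len(slice_pulls):+.2f}")
print(f"RMS of pulls: {slice_rms:.2f}")
```

      An RMS near 1 is what sampling error alone would produce. An RMS well below 1 across many slices would mean the polls agree more closely than chance allows, which is the signature of herding. With only a handful of polls per slice, the RMS itself is noisy.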

    25. Mike
      November 5, 2012 at 3:17 pm

      Um, stupid question, is the herding toward Romney or Obama? In other words, is the herd trying to get with the Wang/Linzer/Silver program (to name just three)?

    26. Ryan
      November 5, 2012 at 3:32 pm

      Another possibility:

      Many polling firms increase their sample sizes as election day inches closer — Rasmussen is one such firm.

      Would this also not cause reduction in error? Or would increased sampling not reduce error to the level we’ve seen above?
