Polls, Predictions and Why You Shouldn't Believe Them
How accurate are polls at predicting a winner? Not very. As long as the race is within 10 points, most polls shouldn't be relied on to predict who will win. Charles Franklin, a political science professor at the University of Wisconsin, has an interesting post today about just how important the "margin of error" really is.
On a graph, Franklin compares poll results with actual election results and draws several observations, one of which is that polls cannot reliably predict races in which the candidates are less than 10 points apart:
One interesting feature is that a margin of zero (a tied poll) produces a 50-50 split in wins with remarkable accuracy. There is nothing I did statistically to force the black trend line to go through the "crosshairs" at the (0, .5) point in the graph, but it comes awfully close. So a tied poll really does predict a coin-flip outcome.
The probability of a win rises or falls rapidly as the polls move away from a margin of zero. By the time we see a 10 point lead in the poll for the Dem, about 90% of the Dems win. When we see a 10 point margin for the Rep, about 90% of Reps win. That symmetry is also not something I forced with the statistics-- it represents the simple and symmetric pattern in the data.
More practically, it means that polls rarely miss the winner with a 10 point lead, but they DO miss it 10% of the time.
A 5 point lead, on the other hand, turns out to be right only about 60-65% of the time. So bet on a candidate with a 5 point lead, but don't give odds. And for 1 or 2 point leads (as in some of our closer races tomorrow) the polls are only barely better than 50% right in picking the winner. That should be a sobering thought to those enthused by a narrow lead in the polls. Quite a few of those "leaders" will lose. Of course, an equal proportion of those trailing in the polls will win.
So read the polls-- they are a lot better than nothing. But don't take that 2 point lead to the bank. That is a failure to appreciate the practical consequences of the margin for error.
The parties themselves are also the best indicator of which seats are competitive. If you look at the race in as much detail as Jay Cost does here, a picture emerges not of a Democratic sweep but of uncertainty:
Absent reliable polling in each district, I would say that these 35 seats, plus the 4 seats that it fails to capture, are the real battlefield. That would mean that 37 Republican seats and 4 Democratic seats are, in one way or another, up for grabs.
This might seem like a lot for the Republicans to defend, and from a certain perspective it is. However, whether or not the Democrats pick up control of the House by plucking a net of 15 of these districts really depends upon the probability of flipping we assign to each race. If, for instance, both parties have an equal shot in every seat, we should expect the Democrats to net 16 to 17 seats - and the Democrats have a 73% chance of taking the House. If the Democrats have a 40% chance in every seat, we should expect them to net 12 to 13 seats - and they have a 25% chance of taking the House. If the Democrats have a 60% chance in every seat, we should expect them to net 20 to 21 seats - and they have a 97% chance of taking the House.
This is the main reason I am skeptical of the "wave," i.e. a net of 25 or more for the Democrats. Even if we give the Democrats 2/1 odds in each contest, with this battlefield there is still only a 35% chance that they net 25 or more seats.
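To see where numbers like these can come from, here is a minimal sketch in Python. It is a reconstruction, not necessarily Cost's own method: it assumes the 41 races (37 Republican-held, 4 Democratic-held) are independent and that the Democrats have the same chance in every one, so the number of seats they win follows a binomial distribution and their net gain is that number minus the 4 seats they already hold. The function names are made up for illustration.

```python
from math import comb

REP_SEATS = 37  # Republican-held seats in play (count taken from the text)
DEM_SEATS = 4   # Democratic-held seats in play (count taken from the text)


def net_gain_distribution(p_dem):
    """Map each possible net Democratic gain to its probability.

    Assumption: every race is independent and the Democrats win each
    one with the same probability p_dem.
    """
    total = REP_SEATS + DEM_SEATS
    dist = {}
    for wins in range(total + 1):
        prob = comb(total, wins) * p_dem**wins * (1 - p_dem)**(total - wins)
        # Winning a seat they already hold is not a gain, so subtract those 4.
        dist[wins - DEM_SEATS] = prob
    return dist


def expected_net_gain(p_dem):
    """Expected net Democratic gain under the same assumptions."""
    return (REP_SEATS + DEM_SEATS) * p_dem - DEM_SEATS


def chance_of_netting(p_dem, target):
    """Probability that the net Democratic gain is at least `target` seats."""
    dist = net_gain_distribution(p_dem)
    return sum(prob for net, prob in dist.items() if net >= target)


# Scenarios discussed in the text: 40%, 50%, and 60% per seat, target of 15.
for p in (0.4, 0.5, 0.6):
    print(p, round(expected_net_gain(p), 1), round(chance_of_netting(p, 15), 2))

# The "wave" scenario: 2/1 odds per seat, target of 25.
print(round(chance_of_netting(2 / 3, 25), 2))
```

Under these assumptions the even-odds case works out to an expected net of 16.5 seats and roughly a 73% chance of netting 15 or more, in line with the figures quoted above. Real races are not independent (a national swing tends to move them together), so treat this as an illustration of the arithmetic rather than a forecast.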
What are the true probabilities for these races? I honestly have not the foggiest idea beyond some basic intuitions (e.g. the GOP has a less than 50% chance in at least 6 to 7 seats). Of course, lots and lots (and lots and lots) of people are ready and willing to assign very specific probabilities to these races. But are they able? What are the data points we should use in such an endeavor?
Should we use House polls from companies we have never heard of, who are obviously pushing polls to drum up business for themselves after the election, who use samples that have strange origins, who use methods that are unpublished and probably underdetermined, who publish results that contradict other polls?
Should we use the rumors and innuendos we happen to stumble upon, the inside gossip to which almost all of us are not privy, and to which - if we are privy - we hear third, fourth, or fifth hand?
Should we use a favored set of anecdotes, interesting stories told by local news outlets on a given horse race that capture our attention, even if their actual effect on the race is undeterminable?
Assuming we can use any of this data - a condition that I think remains unfulfilled - another question presents itself: how do we use this data to assign odds? What weight should we give each data point? I honestly have no clue. It seems to me to be quite easy, in almost all districts, to write a storyline, a believable storyline, that favors one side over the other - and from that assign a probability. But, at the end of the day, what is the data that is inducing this assignment? It seems to me that it is just a set of questionable polls, unfounded rumors and potentially irrelevant anecdotes to which we have, without any real justification, assigned determinative weight.
Ultimately, only one poll counts: the one on Election Day. So don't let the polls sway you into not voting.