Polls, Predictions and Why You Shouldn't Believe Them

How accurate are polls at predicting a winner? Not very. So long as a candidate is within 10 points, most polls shouldn't be relied on to predict who will win. Charles Franklin, a political science professor at the University of Wisconsin, has an interesting post today about just how important the "margin of error" really is.

On a graph, Franklin compares poll results with actual election results and draws several observations, one of which is that polls cannot reliably predict races that are less than 10 points apart.

One interesting feature is that a margin of zero (a tied poll) produces a 50-50 split in wins with remarkable accuracy. There is nothing I did statistically to force the black trend line to go through the "crosshairs" at the (0, .5) point in the graph, but it comes awfully close. So a tied poll really does predict a coin-flip outcome.

The probability of a win rises or falls rapidly as the polls move away from a margin of zero. By the time we see a 10 point lead in the poll for the Dem, about 90% of the Dems win. When we see a 10 point margin for the Rep, about 90% of Reps win. That symmetry is also not something I forced with the statistics-- it represents the simple and symmetric pattern in the data.

More practically, it means that polls rarely miss the winner with a 10 point lead, but they DO miss it 10% of the time.

A 5 point lead, on the other hand, turns out to be right only about 60-65% of the time. So bet on a candidate with a 5 point lead, but don't give odds. And for 1 or 2 point leads (as in some of our closer races tomorrow) the polls are only barely better than 50% right in picking the winner. That should be a sobering thought to those enthused by a narrow lead in the polls. Quite a few of those "leaders" will lose. Of course, an equal proportion of those trailing in the polls will win.

So read the polls-- they are a lot better than nothing. But don't take that 2 point lead to the bank. That is a failure to appreciate the practical consequences of the margin for error.
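Franklin's numbers amount to an empirical lookup from poll margin to win probability. Here is a rough sketch that interpolates linearly between just the data points he reports (tied → 50%, 5-point lead → ~62%, 10-point lead → ~90%); the intermediate values are my linear guesses, not his fitted trend line:

```python
def win_probability(margin):
    """Rough chance the poll leader actually wins, interpolated linearly
    between the points Franklin reports: tied -> 50%, 5-pt lead -> ~62%,
    10-pt lead -> ~90%. Values between those points are guesses, not
    Franklin's fitted curve."""
    points = [(0, 0.50), (5, 0.625), (10, 0.90)]
    m = min(abs(margin), 10)  # symmetric: works the same for either party's lead
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= m <= x1:
            return y0 + (y1 - y0) * (m - x0) / (x1 - x0)
    return 0.90  # beyond a 10-point lead, at least ~90%

print(win_probability(2))   # a 2-point lead: barely better than a coin flip
print(win_probability(10))  # a 10-point lead: about 90%
```

The 2-point case comes out around 55%, which matches Franklin's point that narrow leads are only barely better than chance at picking the winner.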

The parties themselves are also the biggest indicator of which seats are competitive. If you look at the races in as much detail as Jay Cost does here, a picture emerges not of a Democratic sweep but of uncertainty:

Absent reliable polling in each district, I would say that these 35 seats, plus the 4 seats that it fails to capture, are the real battlefield. That would mean that 37 Republican seats and 4 Democratic seats are, in one way or another, up for grabs.

This might seem like a lot for the Republicans to defend, and from a certain perspective it is. However, whether or not the Democrats pick up control of the House by plucking a net of 15 of these districts really depends upon the probability of flipping we assign to each race. If, for instance, both parties have an equal shot in every seat, we should expect the Democrats to net 16 to 17 seats - and the Democrats have a 73% chance of taking the House. If the Democrats have a 40% chance in every seat, we should expect them to net 12 to 13 seats - and they have a 25% chance of taking the House. If the Democrats have a 60% chance in every seat, we should expect them to net 20 to 21 seats - and they have a 97% chance of taking the House.

This is the main reason I am skeptical of the "wave," i.e. a net of 25 or more for the Democrats. Even if we give the Democrats 2/1 odds in each contest, with this battlefield there is still only a 35% chance that they net 25 or more seats.

What are the true probabilities for these races? I honestly have not the foggiest idea beyond some basic intuitions (e.g. the GOP has a less than 50% chance in at least 6 to 7 seats). Of course, lots and lots (and lots and lots) of people are ready and willing to assign very specific probabilities to these races. But are they able? What are the data points we should use in such an endeavor?

Should we use House polls from companies we have never heard of, who are obviously pushing polls to drum up business for themselves after the election, who use samples that have strange origins, who use methods that are unpublished and probably underdetermined, who publish results that contradict other polls?

Should we use the rumors and innuendos we happen to stumble upon, the inside gossip to which almost all of us are not privy, and to which - if we are privy - we hear third, fourth, or fifth hand?

Should we use a favored set of anecdotes, interesting stories told by local news outlets on a given horse race that capture our attention, even if their actual effect on the race is undeterminable?

Assuming we can use any of this data - a condition that I think remains unfulfilled - another question presents itself: how do we use this data to assign odds? What weight should we give each data point? I honestly have no clue. It seems to me to be quite easy, in almost all districts, to write a storyline, a believable storyline, that favors one side over the other - and from that assign a probability. But, at the end of the day, what is the data that is inducing this assignment? It seems to me that it is just a set of questionable polls, unfounded rumors and potentially irrelevant anecdotes to which we have, without any real justification, assigned determinative weight.
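Cost's arithmetic above follows from a simple model: each contested seat flips independently with the same probability p, so the Democrats' net gain is the difference of two binomials (pickups among the 37 Republican-held seats minus losses among the 4 Democratic-held seats). Here is a minimal sketch of that calculation - my reconstruction of the model his numbers imply, not his published code:

```python
from math import comb

def binom_pmf(n, p):
    """Exact binomial probability mass function, as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def net_gain_stats(p, need=15, r_seats=37, d_seats=4):
    """Expected Democratic net gain and P(net gain >= need), assuming every
    contested race is independent and Democrats win any given seat w.p. p."""
    pickups = binom_pmf(r_seats, p)       # D wins in R-held seats
    losses = binom_pmf(d_seats, 1 - p)    # D losses in D-held seats
    expected = r_seats * p - d_seats * (1 - p)
    prob = sum(pg * pl
               for g, pg in enumerate(pickups)
               for l, pl in enumerate(losses)
               if g - l >= need)
    return expected, prob

for p in (0.4, 0.5, 0.6):
    e, pr = net_gain_stats(p)
    print(f"p={p:.1f}: expected net {e:.1f}, P(net >= 15) = {pr:.0%}")

# Cost's "wave" check: 2/1 odds per seat, net of 25 or more
_, wave = net_gain_stats(2/3, need=25)
print(f"p=2/3: P(net >= 25) = {wave:.0%}")
```

Under this independence assumption the sketch lands roughly on Cost's figures: expected nets of about 12.4, 16.5, and 20.6 seats at per-seat probabilities of 40%, 50%, and 60%, House-takeover chances near 25%, 73%, and 97%, and only about a 35% chance of a 25-seat wave even at 2/1 odds per contest.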

Ultimately, only one poll counts: the one on Election Day. So don't let the others sway you out of taking part in it.

Matthew Sheffield
Matthew Sheffield, creator of NewsBusters and president of Dialog New Media, an internet marketing and design firm, left NewsBusters at the end of 2013.