How to Read a Poll
Every other year, during election campaigns, the American public is polled, surveyed, and canvassed for their opinions, and the news media continuously inform us of the results. The media report polls in the same breathless way that race track announcers describe horse races: "As they round the corner of the convention, the Republican is pulling ahead on the right! Now, they're entering the home stretch and the Democrat is pulling up on the left!" Et cetera.
There is little drama in simply waiting until after the election to report the results. Instead, reporters use polls to add suspense to their coverage, with a leader and an underdog to root for. Moreover, every news outlet is trying to scoop the others by being the first to correctly predict the winner. Unfortunately, much of this coverage sacrifices accuracy for artificial excitement.
This article explains how a layman can read a news report of a poll without being duped by the hype. You don't need to be a statistician to understand enough about polls to not be taken in, because the problems are often not with the polls themselves but with the way that they're reported.
First, please take the following unscientific poll:
Opinion polls, like other surveys, are a way of inferring the characteristics of a large group—called "the population"—from a small sample of it. In order for this inference to be cogent, the sample must accurately represent the population. Thus, the main error to avoid is an unrepresentative sample. For example, the most famous polling fiasco was the Literary Digest poll in the 1936 presidential election. The magazine surveyed over two million people, chosen from the magazine's subscriber list, phone books, and car registrations. Even though the sample was enormous, it was unrepresentative of the population of voters because not everyone could afford a phone or car during the Depression, and those who could tended to vote Republican in greater numbers than those who couldn't. As a result of this biased sample, the poll showed Republican Alf Landon beating the actual winner, Democrat Franklin Roosevelt.
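The point that a huge sample cannot rescue a biased one can be illustrated with a small simulation. This is only a sketch: the ownership share and the support rates below are hypothetical figures chosen for illustration, not historical data from 1936, and the function name `simulate_poll` is invented for this example.

```python
import random

def simulate_poll(sample_size, support_rate, rng):
    """Return the share of a simulated sample that supports the candidate."""
    supporters = sum(rng.random() < support_rate for _ in range(sample_size))
    return supporters / sample_size

rng = random.Random(1936)  # fixed seed so the run is repeatable

# Hypothetical population, for illustration only.
OWNER_SHARE = 0.40         # share of voters who own a phone or car
SUPPORT_OWNERS = 0.45      # candidate's support among owners
SUPPORT_NON_OWNERS = 0.70  # candidate's support among everyone else

# True support across the whole population (weighted average of the groups).
true_support = (OWNER_SHARE * SUPPORT_OWNERS
                + (1 - OWNER_SHARE) * SUPPORT_NON_OWNERS)

# A very large sample drawn only from phone and car owners.
biased_estimate = simulate_poll(200_000, SUPPORT_OWNERS, rng)

# A small sample drawn at random from all voters: each respondent
# supports the candidate with probability equal to the true support.
random_estimate = simulate_poll(1_000, true_support, rng)

print(f"true support:            {true_support:.1%}")
print(f"biased sample (200,000): {biased_estimate:.1%}")
print(f"random sample (1,000):   {random_estimate:.1%}")
```

Under these assumed numbers, the large biased sample settles near the owners' level of support and predicts a loss, while the much smaller random sample lands close to the true figure.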
So, the first question that you should ask of a poll report you read is: "Was the sample chosen scientifically?" If the poll is a scientific one, then an effort has been made to either choose the sample randomly from the population, or to weight it in order to make it representative of the population. Reputable polling organizations always use scientific sampling.
However, many polls are unscientific, such as most online polls you take using a computer, telephone surveys in which you must call a certain number, or mail-in questionnaires in magazines or sent to you by charities. Such surveys suffer from the fault that the sample is self-selected, that is, you decide whether you wish to participate. Self-selected samples are not likely to be representative of the population for various reasons:
- The readers of a particular magazine or the contributors to a specific charity are likely to differ from the rest of the population in other respects.
- Those who take the time and trouble to volunteer for a poll are more motivated than the average person, and probably care more about the survey subject.
- Many such polls allow individuals to vote more than once, thus allowing the results to be skewed by people who stuff the ballot box.
For example, some media outlets sponsor scientific polls but, when the results are reported in their online edition, they are sometimes accompanied by an online poll using a self-selected sample and asking some of the same questions. It is instructive to compare the two, as the results are usually very different.
So, self-selected samples are almost inevitably biased and are, at best, a form of entertainment. They cannot be trusted as a source of information about the population as a whole.
Because polls question only a sample of the population, there is always a chance of sampling error, that is, of drawing a sample that is unrepresentative. For instance, in a political poll, it is possible that a random sample of voters would consist entirely of Democrats, though this is highly unlikely. However, less extreme errors of the same kind are not so unlikely, and this means that every poll has some degree of imprecision or fuzziness. Because the sample may not be precisely representative of the population as a whole, there is some chance that the poll results will be off by a certain amount. Statisticians measure the likely size of this kind of error by the "margin of error", or "MoE" for short.
The MoE takes the form "±N%", where usually N=3 in national polls. This margin determines what is called a "confidence interval": for example, if the percentage of a sample who supports candidate R is 46%, and the MoE is ±3%, then the confidence interval is 43-49%. In turn, the confidence interval and the MoE are determined by the "level of confidence", which is usually set at 95% in national polls. What this means is that one can have confidence that in 19 out of 20 such samples the percentage of the population who support candidate R will fall within the confidence interval. So, the chance of the poll being off by more than the MoE is only 5%.
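For readers curious where a figure like ±3% comes from, here is a minimal sketch using the standard textbook formula for a simple random sample. The article does not give this formula, so treat it as an assumption: it relies on the normal approximation, uses the worst case of a 50/50 split, and ignores the weighting adjustments that real pollsters make. The function names are invented for this example.

```python
import math

Z_95 = 1.96  # standard normal critical value for a 95% confidence level

def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Approximate MoE for a simple random sample (normal approximation)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

def confidence_interval(observed, moe):
    """Return the (low, high) interval around an observed proportion."""
    return observed - moe, observed + moe

# A sample of about 1,000 respondents gives a MoE of roughly 3 points.
moe = margin_of_error(1_000)
print(f"MoE for 1,000 respondents: ±{moe:.1%}")

# The article's example: 46% support with a MoE of ±3%.
low, high = confidence_interval(0.46, 0.03)
print(f"Confidence interval: {low:.0%} to {high:.0%}")
```

Because the MoE shrinks only with the square root of the sample size, quadrupling the sample merely halves the margin, which is one reason national polls tend to stop at around a thousand respondents.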
The MoE is a common source of error in news reports of poll results. Most reputable news sources require their reporters to include the MoE in a report on a poll, at least in a note at the end. However, many reporters ignore the MoE in the body of their articles, perhaps because they don't understand what the number means.
Reporters often use polls for "horse race" reporting by comparing the poll numbers of candidates, or to compare current polls to past ones to see if the results are changing. The MoE needs to be factored into such comparisons. For example, suppose that in one poll with a MoE of ±3%, candidate D polls at 36%, and in a later poll D is at 38%. Many newspapers will report this as a 2% rise in support for D between the two polls, as if 2% of undecided voters or previous supporters of other candidates had decided to vote for D since the previous poll. However, given that the MoE is ±3%, the result in the first poll could be as high as 39%, and in the second one as low as 35%. In other words, D's support could have dropped by as much as 4%! The poll results are simply not precise enough to say that there is a real increase in D's support, let alone that such an increase is exactly 2%.
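The arithmetic in this example can be sketched as a pair of small functions. They follow the article's own rule of thumb (treat each result as anywhere within its MoE) and assume both polls have the same MoE; a formal test would instead use the standard error of the difference. The function names are hypothetical.

```python
def possible_change(first, second, moe):
    """Range of real changes consistent with two poll results.

    All values are in percentage points.
    """
    lowest = (second - moe) - (first + moe)
    highest = (second + moe) - (first - moe)
    return lowest, highest

def change_is_distinguishable(first, second, moe):
    """True only if the reported change exceeds twice the MoE."""
    return abs(second - first) > 2 * moe

# Candidate D moves from 36% to 38% with a MoE of ±3 points.
print(possible_change(36, 38, 3))            # (-4, 8)
print(change_is_distinguishable(36, 38, 3))  # False
```

The reported 2-point rise is consistent with anything from a 4-point drop to an 8-point gain, so it says very little by itself.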
In the previous section, I mentioned the level of confidence (usually 95%) used to determine the MoE and, therefore, the confidence interval. The purpose of a survey is to measure some characteristic, such as support for a candidate, of a sample in order to be able to infer its level in the whole population. A 95% confidence level means that in 19 out of 20 samples, the percentage of the sample with the characteristic should be within the MoE of the percentage of the population with the characteristic.
A 95% confidence level sounds pretty confident, and it is! However, there are a lot of polls done these days. In fact, there are many more than 20 national polls conducted in the U.S. during a presidential election year. This means that even with a confidence level of 95%, we can expect a few polls to be off by more than the MoE as a result of sampling error.
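To put rough numbers on this, the sketch below computes how many polls we would expect to miss by more than the MoE, and the chance that at least one does. It assumes the polls are independent of one another and that sampling error is the only source of error, neither of which is guaranteed in practice. The function names are invented for this example.

```python
def expected_misses(num_polls, confidence=0.95):
    """Expected number of polls that are off by more than the MoE."""
    return num_polls * (1 - confidence)

def chance_of_at_least_one_miss(num_polls, confidence=0.95):
    """Probability that at least one of several independent polls misses."""
    return 1 - confidence ** num_polls

for n in (20, 100):
    print(f"{n} polls: expect about {expected_misses(n):.0f} misses, "
          f"chance of at least one: {chance_of_at_least_one_miss(n):.0%}")
```

With 20 polls the expected number of misses is about one, and the chance of at least one miss is already close to two in three.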
How can we tell when the results of a poll are off by more than the MoE? If a poll gives very different results from others taken around the same time, or shows a sudden and large change from previous polls, this suggests that the unusual result may be due to sampling error. No one can know for sure whether sampling error is responsible for polls with surprising results, but the fact that 1 in 20 polls can be expected to be significantly in error should encourage us to regard such poll results with skepticism. Moreover, it's important to pay attention to all of the polls taken on a given topic at a particular time, otherwise you'll have no way of knowing whether a poll you're looking at is giving wildly different results than comparable polls.
Here's another reason to pay attention to all the comparable polls, as opposed to concentrating on just one. Suppose that five polls are conducted at about the same time showing the following results with a MoE of ±3%:
|Poll||Candidate D||Candidate R||Undecided|
Each of these results is within the MoE so, taken individually, you would have to conclude that neither candidate is really ahead. However, four of the five polls show candidate D with a lead, and the other shows a tie; no poll shows candidate R leading. Of course, it's highly improbable that both candidates have exactly the same level of support, but if they are within a percentage point of each other you would expect the polls showing one candidate ahead to be about evenly divided between the two. Instead, in this example, all of the polls showing one candidate ahead favor candidate D, which is unlikely unless D has a real, albeit small, lead.
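One way to make "unlikely" concrete is a simple sign test, sketched here. It sets the tied poll aside and assumes that, if the candidates were truly even, each remaining poll would be an independent coin flip between showing D ahead and showing R ahead. That independence assumption is mine, not the article's, and the function name is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

polls_with_a_leader = 4  # the tied poll is set aside
polls_favoring_d = 4

# Chance that all four favor D if the race were really even.
one_sided = prob_at_least(polls_favoring_d, polls_with_a_leader)

# Chance that all four favor the same candidate, whichever it is.
either_candidate = 2 * one_sided

print(one_sided)         # 0.0625
print(either_candidate)  # 0.125
```

Four polls is a small number, so this is suggestive rather than conclusive, but it shows why a consistent pattern across polls carries more weight than any one of them.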
Thus, even when individual polls do not show a clear leader, the consensus of all polls may do so. Unfortunately, news stories on polls usually concentrate on one poll at the expense of all others. Many polls are sponsored by newspapers or networks, which get their money's worth by reporting only the results of their own polls, ignoring those sponsored by their competitors. Therefore, it's up to you to check to see whether there are other polls on the same topic, and to compare the results of any comparable polls.
A Checklist of Questions
When you are confronted with a new poll, ask the following questions about it:
- Is the sample scientifically selected or self-selected? If self-selected, the poll is only good for entertainment.
- What is the poll's margin of error? Are any comparisons reported (such as changes in popularity, record highs or lows, or comparisons between candidates) which lie within twice the margin of error, that is, within the width of the confidence interval? If so, they are statistically insignificant.
- Have other polls on this issue been done recently? If so, it is a good idea to compare the new poll with these others:
  - If the results of the new poll are significantly different from those of most other polls (that is, the difference is greater than the width of the confidence interval), then the new poll is probably unreliable.
  - If the results of the new poll are within the width of the confidence interval of those of most other polls, then those results are probably reliable, even if they are not statistically significant individually.
If the poll you are confronted with fails at any step of this checklist, or if you can't find the answer to these questions in the report, then your confidence in the poll should be much less than 95%.
The Poll Results
If you haven't guessed by now, the online poll was bogus, but not much more bogus than most such polls. If you go back and retake the poll having read the entire article, I hope that you will agree to disagree with all of the questions!
- "Planning to Err? Then Do it as Publicly as Possible", Why Files
- Will Oremus, "Minority Opinions", Slate, 5/17/2012
- Charles W. Roll, Jr. & Albert H. Cantril, Polls: Their Use and Misuse in Politics (1972)
- G. Cleveland Wilhoit & David H. Weaver, Newsroom Guide to Polls & Surveys (1990)
- Sheldon R. Gawiser, Ph.D. & G. Evans Witt, "20 Questions a Journalist Should Ask about Poll Results (Third Edition)", National Council on Public Polls. Every journalist who reports poll results should consult this article or, even better, read the book from which it is taken:
- Sheldon R. Gawiser & G. Evans Witt, A Journalist's Guide to Public Opinion Polls (1994).
- Peverill Squire, "Why the 1936 Literary Digest Poll Failed", Public Opinion Quarterly 52, pp. 125-133.