The name "multiple comparisons fallacy" appears to come from the science of epidemiology, where comparisons may be made between a diseased group and a healthy group in order to find a difference between the two that might point to the cause of an epidemic. For instance, if every member of the diseased group drank from a particular well and no member of the healthy group did so, that would suggest that the pathogen might be present in the well water. In order to find the source of an epidemic, multiple comparisons between the groups may be drawn.
…[I]n 1992, a landmark study appeared from Sweden. A huge investigation, it enrolled everyone living within 300 meters of Sweden's high-voltage transmission line system over a 25-year period. They went far beyond all previous studies in their efforts to measure magnetic fields, calculating the fields that the children were exposed to at the time of their cancer diagnosis and before. This study reported an apparently clear association between magnetic field exposure and childhood leukemia, with a risk ratio for the most highly exposed of nearly 4.
In inductive reasoning, there is always some chance that the conclusion will be false even if the evidence is true. In other words, the connection between the premisses and conclusion is never 100%―that's only for deductive reasoning. So, the question arises: what level of probability―called a "confidence level"―are we willing to accept in our reasoning? In scientific contexts, the confidence level is usually set at 95%.
When the confidence level is set at 95%, there is a probability of one in twenty―that is, 5%―that a misleading result will occur simply by chance. This has an important consequence that, when overlooked, leads to the multiple comparisons fallacy: each comparison made in an epidemiological study has a one in twenty chance of showing a statistically significant difference purely by chance, even when no real difference exists. So, if twenty or more comparisons are made in a single study, the study will likely produce a statistically significant result just by chance. Thus, it's necessary to use a higher confidence level in cases of multiple comparisons.
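One common way of using a "higher confidence level" for multiple comparisons―the Bonferroni correction, my choice of adjustment, which the entry itself does not name―is to divide the 5% significance level by the number of comparisons. A minimal sketch:

```python
# Bonferroni correction (an assumption of this sketch, not named in the
# entry): to keep the overall chance of a spurious result at or below 5%,
# test each individual comparison at a stricter significance level of
# 0.05 / m, where m is the number of comparisons.
def bonferroni_alpha(overall_alpha, m):
    return overall_alpha / m

m = 20
per_test = bonferroni_alpha(0.05, m)    # 0.0025 for twenty comparisons
# With independent comparisons, the overall (family-wise) error rate is
# then 1 - (1 - per_test) ** m, which stays below the original 5%.
family_wise = 1 - (1 - per_test) ** m
print(per_test, round(family_wise, 4))  # 0.0025 0.0488
```

With twenty comparisons each tested at the 0.0025 level, the chance of even one spurious result stays just under 5%, restoring the guarantee a single comparison would have had.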
Actually, the situation is worse still: if the things being compared are statistically independent, then it takes only fourteen comparisons before it is more likely than not that at least one will show a statistically significant result by chance. This is a consequence of the multiplication rule of probability theory. (See the entry for Probabilistic Fallacy for the details.)
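The figures above can be checked directly. Assuming independent comparisons, each tested at a 95% confidence level, the multiplication rule gives the chance that all n comparisons come out non-significant as 0.95 raised to the power n; the chance of at least one spurious result is therefore one minus that:

```python
# Chance of at least one statistically significant result arising purely
# by chance, assuming n independent comparisons each tested at a 95%
# confidence level (5% significance level).
def chance_of_false_positive(n, confidence=0.95):
    # By the multiplication rule, all n comparisons are non-significant
    # with probability confidence ** n.
    return 1 - confidence ** n

for n in (1, 13, 14, 20):
    print(n, round(chance_of_false_positive(n), 3))
# 1 0.05
# 13 0.487
# 14 0.512
# 20 0.642
```

At thirteen comparisons the chance of a fluke is still under one half; at fourteen it crosses one half, which is exactly the "more likely than not" threshold the entry describes.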
Another common case of the multiple comparisons fallacy occurs in opinion polling, especially during presidential elections. So-called scientific polls typically use a 95% confidence level to determine their sample sizes and margins of error. During national elections, many more than fourteen polls are usually taken, so it is likely that one or more of them will be misleading. As a consequence, it is important to compare all of the polls taken at about the same time and discount outliers. However, this is seldom done by the news media.
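The same arithmetic shows why a campaign season with dozens of polls almost guarantees at least one misleading result. The poll counts below are hypothetical illustrations of my own, not data from any actual election:

```python
# With a 95% confidence level, each poll has a 5% chance of falling
# outside its stated margin of error by chance alone. The poll counts
# here are hypothetical, chosen only to illustrate the arithmetic.
def expected_misleading(num_polls, alpha=0.05):
    # Expected number of misleading polls, and the chance of at least one.
    return num_polls * alpha, 1 - (1 - alpha) ** num_polls

for num_polls in (14, 40):
    expected, at_least_one = expected_misleading(num_polls)
    print(num_polls, round(expected, 1), round(at_least_one, 2))
# 14 0.7 0.51
# 40 2.0 0.87
```

With forty polls, roughly two can be expected to miss by chance alone, which is why comparing contemporaneous polls and discounting outliers matters.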
The multiple comparisons fallacy is occasionally referred to as "the Texas sharpshooter's fallacy", but I use this name for a different type of mistake. The anecdote that gives rise to the name is that a Texan shoots randomly at the side of a barn, then draws a bullseye around a cluster of bullet holes and claims to be a sharpshooter. This story fits the mistake of jumping to the conclusion that a random cluster of data must be causally related better than it does the multiple comparisons fallacy. A better anecdote for the latter would be a shooter who first draws the bullseye, then randomly shoots twenty times at the barn. Having made one bullseye, the shooter then proceeds to conceal the nineteen misses and claims to be a sharpshooter.
Source: Po Bronson, "A Prayer Before Dying", Wired, 2002
Resource: "How to Read a Poll: The Confidence Game", Fallacy Watch
Acknowledgment: Thanks to David Nichols for a couple of corrections in the Exposition.
When scientists saw how many things they had measured…they began accusing the Swedes of falling into one of the most fundamental errors in epidemiology, sometimes called the multiple comparisons fallacy.