Taxonomy of the Multiple Comparisons Fallacy

Etymology:

The name "multiple comparisons fallacy" appears to come from the science of epidemiology, where comparisons may be made between a diseased group and a healthy group in order to find a difference between the two that might point to the cause of an epidemic. For instance, if every member of the diseased group drank from a particular well and no member of the healthy group did so, that would suggest that the pathogen might be present in the well water. In order to find the source of an epidemic, multiple comparisons between the groups may be drawn.

Example:

…[I]n 1992, a landmark study appeared from Sweden. A huge investigation, it enrolled everyone living within 300 meters of Sweden's high-voltage transmission line system over a 25-year period. They went far beyond all previous studies in their efforts to measure magnetic fields, calculating the fields that the children were exposed to at the time of their cancer diagnosis and before. This study reported an apparently clear association between magnetic field exposure and childhood leukemia, with a risk ratio for the most highly exposed of nearly 4.

…Surely, here was the proof that power lines were dangerous, the proof that even the physicists and biological naysayers would have to accept. But three years after the study was published, the Swedish research no longer looks so unassailable. …[T]he original contractor's report…reveals the remarkable thoroughness of the Swedish team. Unlike the published article, which just summarizes part of the data, the report shows everything they did in great detail, all the things they measured and all the comparisons they made. …[N]early 800 risk ratios are in the report….

Analysis

Exposition:

In inductive reasoning, there is always some chance that the conclusion will be false even if the evidence is true. In other words, the connection between the premisses and conclusion is never 100%―that's only for deductive reasoning. So, the question arises: what level of probability―called a "confidence level"―are we willing to accept in our reasoning? In scientific contexts, the confidence level is usually set at 95%.

When the confidence level is set at 95%, there is a probability of one in twenty (that is, 5%) that a misleading result will occur simply by chance. This has an important consequence which, when overlooked, leads to the multiple comparisons fallacy. For instance, when comparisons are done in epidemiology, each comparison has a one in twenty chance of showing a statistically significant difference even if no real difference exists. So, if twenty or more comparisons are made in a single study, it will likely get at least one statistically significant result just by chance. Thus, it's necessary to use a higher confidence level in cases of multiple comparisons.
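The arithmetic behind this point can be sketched in a few lines of Python. The first function gives the number of chance "findings" to expect from a batch of comparisons when nothing real is going on. The second shows one common way of raising the confidence level, usually called the Bonferroni correction, which divides the allowed 5% error rate evenly among the comparisons. The correction is my illustration of what "a higher confidence level" might mean, not something taken from the sources cited here, and the function names are hypothetical.

```python
def expected_chance_findings(n_comparisons, alpha=0.05):
    """Average number of comparisons that come out statistically
    significant by chance alone, when no real difference exists."""
    return n_comparisons * alpha


def adjusted_confidence_level(n_comparisons, overall_alpha=0.05):
    """Per-comparison confidence level under a Bonferroni-style
    correction (an assumed choice, not from the source): the overall
    error rate is split evenly across all the comparisons."""
    per_comparison_alpha = overall_alpha / n_comparisons
    return 1 - per_comparison_alpha


if __name__ == "__main__":
    # Twenty comparisons at the usual 95% level.
    print(expected_chance_findings(20))
    # Confidence level each of the twenty comparisons would need.
    print(adjusted_confidence_level(20))
```

With twenty comparisons, one chance finding is expected on average at the 95% level, and the adjusted level for each comparison works out to 99.75%.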

Actually, the situation is worse still: if the things being compared are statistically independent, then it takes only fourteen comparisons for a statistically significant result by chance to be more likely than not. This is a result of the multiplication rule of probability theory: the chance that all fourteen comparisons avoid a chance result is 0.95 multiplied by itself fourteen times, or about 49%, so the chance that at least one does not is about 51%. (See the entry for Probabilistic Fallacy for the details.)
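The figure of fourteen can be checked with a short calculation. The following Python sketch assumes, as the text does, that the comparisons are statistically independent and that each has a 5% chance of a misleading result. The function names are my own.

```python
def prob_at_least_one_chance_result(n_comparisons, alpha=0.05):
    """Probability that at least one of n independent comparisons is
    statistically significant by chance alone.

    By the multiplication rule, the chance that every comparison avoids
    a chance result is (1 - alpha) ** n; this is the complement."""
    return 1 - (1 - alpha) ** n_comparisons


def smallest_n_more_likely_than_not(alpha=0.05):
    """Smallest number of independent comparisons for which a chance
    result somewhere among them is more likely than not."""
    n = 1
    while prob_at_least_one_chance_result(n, alpha) <= 0.5:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 13, 14, 20):
        print(n, round(prob_at_least_one_chance_result(n), 3))
    print(smallest_n_more_likely_than_not())
```

At the 5% level the second function returns 14: thirteen comparisons leave the probability just under one half, and the fourteenth pushes it over. With twenty comparisons the probability is about 64%.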

Another common case of the multiple comparisons fallacy occurs in opinion polling, especially during presidential elections. So-called scientific polls typically use a 95% confidence level to determine the sizes of their samples and their margins of error. During national elections there are usually many more than fourteen polls taken, so it is likely that one or more of them will be misleading. As a consequence, it is important to compare all of the polls taken at about the same time, and to discount outliers. However, this is seldom done by the news media.

Exposure:

The multiple comparisons fallacy is occasionally referred to as "the Texas sharpshooter's fallacy", but I use this name for a different type of mistake. The anecdote that gives rise to the name is that a Texan shoots randomly at the side of a barn, then draws a bullseye around a cluster of bullet holes and claims to be a sharpshooter. This story is a better fit for the mistake of jumping to the conclusion that a random cluster of data must be causally related than it is for the multiple comparisons fallacy. A better anecdote for the latter would be a shooter who first draws the bullseye, then randomly shoots twenty times at the barn. Having hit the bullseye once, the shooter then conceals the nineteen misses and claims to be a sharpshooter.

Source: Po Bronson, "A Prayer Before Dying", Wired, 2002

Resource: "How to Read a Poll: The Confidence Game", Fallacy Watch

Acknowledgment: Thanks to David Nichols for a couple of corrections in the Exposition.


Analysis of the Example:

When scientists saw how many things they had measured…they began accusing the Swedes of falling into one of the most fundamental errors in epidemiology, sometimes called the multiple comparisons fallacy.

John Moulder: The problem is, when you do as they did, hundreds and hundreds of comparisons, something in the neighborhood of 800 different comparisons, by the standard way we do statistics, we would expect 5 percent of those to be statistically elevated and 5 percent to be statistically decreased. And now you have a problem. If you find, by one measure of exposure, that leukemia is up in a group of kids, is that real, or is that the result of just random noise in the system?

Narrator: … Even if nothing is going on due to power lines, if you measure hundreds of risk ratios, they will scatter by random chance around a mean of one. Some will be above, and some below. Risk ratios below one suggest that EMFs protect against cancer, above one, that they increase the cancer rate. But the published article focused only on the strongest positive risk ratios. The summary highlights a nearly fourfold increase in risk of childhood leukemia. This is what the press picks up and the public hears.

John Moulder: It is not scientifically reasonable to do all the measurements, but then only pick out the ones that give you the answer you want for publication. If I dredge through their original report, I can find situations which, looked at in isolation, without looking at the rest of the report, that if that was the only data I gave you, I could claim that that proved that power lines protected children against childhood leukemia.

Source: "Currents of Fear", Frontline, 1995 (Transcript).
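The point made in the transcript, that hundreds of risk ratios will scatter by chance even if nothing is going on, can be illustrated with a small simulation. This is only a sketch under assumptions of my own: each of 800 comparisons is treated as an independent test statistic drawn from a standard normal distribution (that is, no real effect), and a result counts as statistically significant when it falls beyond a cutoff of 1.96, the usual two-sided 95% level. None of these modelling choices, and none of the names in the code, comes from the Swedish report.

```python
import random


def simulate_null_comparisons(n_comparisons=800, z_cutoff=1.96, seed=0):
    """Simulate comparisons in which there is no real effect.

    Returns (elevated, decreased): how many comparisons look
    significantly raised and how many look significantly lowered,
    purely by chance. Each test statistic is an independent draw
    from a standard normal distribution (an assumption made for
    illustration)."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    elevated = 0
    decreased = 0
    for _ in range(n_comparisons):
        z = rng.gauss(0.0, 1.0)
        if z > z_cutoff:
            elevated += 1
        elif z < -z_cutoff:
            decreased += 1
    return elevated, decreased


if __name__ == "__main__":
    up, down = simulate_null_comparisons()
    print("apparently elevated risk:", up)
    print("apparently reduced risk:", down)
```

Under these assumptions, about 5% of the 800 comparisons, roughly forty, should come out statistically significant in one direction or the other. Reporting only the strongest "elevated" results from such a run, and leaving out the rest, is the concealment of misses described above.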

