The Regression Fallacy

Alias: The Regressive Fallacy [1]

Taxonomy: Logical Fallacy > Informal Fallacy > Non Causa Pro Causa > The Regression Fallacy

Etymology:

To "regress" is to go back, or revert to an earlier or more primitive state. The statistical term "regression" seems to have been first used by Francis Galton [2], Charles Darwin's cousin, to refer to the phenomenon on which the regression fallacy is based. Galton noticed that the children of tall parents tended to be tall themselves, but not as tall as their parents. Galton called this "regression to mediocrity", but nowadays it is usually referred to as "regression to (or towards) the mean" (see the Exposition below). [3]

Example:

KUALA LUMPUR: Prime Minister Datuk Seri Dr Mahathir Mohamad congratulated Malaysian shuttler Mohd Hafiz Hashim for his achievement but warned that he should not be "spoilt" with gifts like previous champions.

"Very good and congratulations, but now I would like to request everybody not to spoil him," he said when asked to comment on Hafiz's victory in the men's singles final of the All-England Badminton Championships on Sunday.

Dr Mahathir said people should remember what had happened to previous champions when they were spoilt with gifts of land, money and other items.

"I hope the states will not start giving acres of land and money in the millions, because they all seem not to be able to play badminton after that," he said after taking part in the last dry run and dress rehearsal for the 13th NAM Summit at the PWTC yesterday. [4]

Exposition:

The Regression Fallacy arises from misreading a statistical phenomenon known as "regression to the mean". The "mean" is the arithmetic average of some variable in a population, that is, what we usually mean by "average". "Regression" refers to the value of the variable tending to move closer to the mean, away from extreme values. So, "regression to the mean" is the tendency of an unusually high or low measurement to be followed by one nearer the average value.

Consider a sample taken from a population. The average value of the variable in the sample will be some distance from the population mean. For instance, we could take a sample of people (it could be just one person), measure their heights, and then determine the average height of the sample. This value will be some distance away from the average height of the entire population of people, though the distance might be zero.

Suppose, further, that we take a second sample of the population. If the value for the first sample is an extreme one—that is, far away from the mean—then it is likely that the value of the variable for the second sample will be closer to average than the first one. The farther away from the mean the first sample was, the more likely that the second will be closer to it. This is regression to the mean.
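The claim about the second sample can be checked with a small simulation. The following is only a sketch under assumptions that are not in the text above: each "sample" is a single independent draw from a standard normal distribution (mean 0, standard deviation 1), and the function name and its parameters are made up for illustration.

```python
import random

def fraction_closer(threshold, trials=100_000, seed=0):
    """Among first draws at least `threshold` away from the mean (0),
    return the fraction whose second, independent draw is closer to the mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    extreme = closer = 0
    for _ in range(trials):
        first = rng.gauss(0.0, 1.0)   # assumed population: standard normal
        second = rng.gauss(0.0, 1.0)  # drawn independently of the first
        if abs(first) >= threshold:
            extreme += 1
            if abs(second) < abs(first):
                closer += 1
    return closer / extreme

if __name__ == "__main__":
    for threshold in (0.0, 1.0, 2.0):
        print(threshold, round(fraction_closer(threshold), 3))
```

With no restriction on the first draw (threshold 0), the second is closer to the mean about half the time. As the threshold rises, so does that fraction, which is the sense in which a more extreme first value makes it more likely that the second will be closer to the mean.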

For example, the children of tall parents tend to be tall themselves, but not as tall as their parents. The fact that the children tend to be taller than average is probably the result of genetics, but the fact that they tend not to be as tall as their parents is the result of regression to the mean. The Regression Fallacy occurs when one mistakes regression to the mean, which is a statistical phenomenon, for a causal relationship. For example, if a tall father were to conclude that his tall wife committed adultery because their children were shorter than their parents, he would be committing the Regression Fallacy.
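The arithmetic behind the height example can be sketched as follows. This is an illustration under assumptions that are not Galton's data: parent height P and child height C are taken to follow a bivariate normal distribution with the same mean \mu and the same standard deviation, with a correlation \rho strictly between 0 and 1. The numbers in the worked line are invented.

```latex
% Expected child height given a parent of height p (assumed model)
\[
  \mathbb{E}[\,C \mid P = p\,] \;=\; \mu + \rho\,(p - \mu), \qquad 0 < \rho < 1 .
\]
% Illustrative numbers only: \mu = 170 cm, \rho = 0.5, p = 190 cm
\[
  \mathbb{E}[\,C \mid P = 190\,] \;=\; 170 + 0.5\,(190 - 170) \;=\; 180 \text{ cm}.
\]
```

On these assumed figures the child is expected to be taller than average but shorter than the parent, and nothing more than the imperfect correlation is needed to account for the difference.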

Exposure:

One of the most common occasions for the Regression Fallacy is illness. People are most likely to seek treatment for an illness, especially experimental treatment, when they are at their sickest, that is, when their condition is an extreme one. They take a remedy and then get better, as regression to the mean would lead one to expect, but they attribute their regained health to the effect of the remedy. This is one reason why some people will swear by such bizarre treatments as drinking urine, or psychic surgery.

"It worked for me", they say, when all they really know is that they took the remedy and they got better. Because of regression to the mean, many people will get better no matter what treatment they take, or even if they take none at all. Some will die, luckily for the snake oil salesmen, since the dead won't be around to badmouth the snake oil that they took before dying.

Regression to the mean is one reason why it is difficult to determine whether a potential remedy is really effective; one cannot tell simply by taking it when ill and then seeing whether one recovers.
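The illness scenario can be illustrated with a toy simulation. Everything in it is an assumption made for the sketch: symptom severity is modelled as a stable personal baseline plus a random day-to-day fluctuation, only people whose severity reaches a threshold take the remedy, and the remedy is deliberately given no effect at all. The function name, parameters, and numbers are hypothetical.

```python
import random

def inert_remedy_trial(patients=50_000, threshold=2.0, seed=1):
    """Simulate an inert remedy taken only by people at their sickest.

    Returns (mean severity when the remedy is taken,
             mean severity afterwards,
             fraction of takers who improved).
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(patients):
        baseline = rng.gauss(0.0, 1.0)          # assumed stable component
        today = baseline + rng.gauss(0.0, 1.0)  # plus a random fluctuation
        if today >= threshold:                  # only the sickest seek treatment
            # The remedy does nothing: later severity is just a fresh fluctuation.
            later = baseline + rng.gauss(0.0, 1.0)
            before.append(today)
            after.append(later)
    improved = sum(l < t for t, l in zip(before, after)) / len(before)
    return sum(before) / len(before), sum(after) / len(after), improved

if __name__ == "__main__":
    print(inert_remedy_trial())
```

In this model most of those who took the worthless remedy feel better afterwards, though on average they remain sicker than the general population. Each of them could truthfully say that they took the remedy and then improved.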

A reader makes the following objection to the Example:

I noticed something I disagree with: you give, as an example, a tennis player who wins tournaments and then doesn't win. However, the "regression to the mean" occurs because one assumes that luck was involved in the first observation being so far away from the mean. This is, strictly speaking, true in all circumstances―but the proportion of luck to skill involved in winning a tennis championship, I suspect, is greatly tipped towards skill. Thus, I think it improper to suggest that what "caused" the tennis players to not win again was solely regression to the mean, when the Malaysian prime minister may, in fact, have been right in believing that the winners got spoiled―logically, nothing favors one explanation over the other.

For regression to the mean to occur it isn't necessary that a phenomenon be entirely, or even primarily, random. So, even though skill is the primary factor in winning at badminton, the sport in the Example, as long as there is some degree of luck involved, regression to the mean may occur. The prime minister might be right about what is causing the champions' playing to deteriorate, but he gives no evidence that the decline is due to the players being spoiled, other than the fact that they declined. That's what makes this an example of the fallacy: he concludes that the players are spoiled on no other evidence than that their play got worse, which is to be expected from regression to the mean.
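The point that luck need not dominate can be sketched in a simulation. The model is an assumption for illustration only: each player's performance is a fixed skill plus a much smaller random luck term, the best performance in a field wins, and the champion then plays again with the same skill and fresh luck. No gifts or spoiling appear anywhere in the model, and all names and numbers are hypothetical.

```python
import random

def champion_follow_up(tournaments=2_000, players=64, luck_sd=0.3, seed=2):
    """Return (mean winning performance, mean next performance of the same champion).

    Skill has standard deviation 1.0 and luck only `luck_sd`, so skill dominates.
    """
    rng = random.Random(seed)
    win_total = next_total = 0.0
    for _ in range(tournaments):
        skills = [rng.gauss(0.0, 1.0) for _ in range(players)]
        first = [s + rng.gauss(0.0, luck_sd) for s in skills]
        champ = max(range(players), key=first.__getitem__)  # best performer wins
        win_total += first[champ]
        # Same skill, fresh luck, and nothing else changes for the champion.
        next_total += skills[champ] + rng.gauss(0.0, luck_sd)
    return win_total / tournaments, next_total / tournaments

if __name__ == "__main__":
    print(champion_follow_up())
```

Under these assumptions the champions remain far above the average player the next time out, since their skill is real, but on average they perform somewhat worse than they did when they won, because winning tends to select for good luck as well as skill.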

This is not a claim about what "caused" the badminton players' subsequent losses, since regression to the mean is a statistical, not a causal, phenomenon. Regression to the mean was first described by Francis Galton, who was recording the heights of parents and their children. As mentioned above in the Etymology, the children of tall parents tend to be tall, but shorter than their parents; similarly, the children of short parents tend to be short, but taller than their parents. Of course, height is not a purely random factor; otherwise, there would be no tendency for the height of children to be similar to the height of their parents. Causally, height is partly inherited from our parents, but there is also an element of chance in it; otherwise, the height of a child would be precisely predictable from the heights of the parents. It is this random element that leads to regression to the mean.

How could the prime minister have nonfallaciously reasoned that the badminton players were indeed spoiled? He could have done so by comparing a group of champions who were lavished with gifts to an otherwise similar group that was not: if the pampered group declined more than the unpampered, that would go some distance to showing that the gifts really were spoiling the players.
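The comparison described above can be sketched as a simulation. It is a toy model under stated assumptions, not a description of any real study: champions are assigned to the gifted or ungifted group at random, their winning performance is modelled crudely as skill plus a non-negative luck term, and `gift_effect` is a hypothetical parameter for how much gifts change later performance (0.0 means no effect).

```python
import random

def _mean(values):
    return sum(values) / len(values)

def mean_decline(gift_effect, champions=20_000, luck_sd=0.5, seed=3):
    """Return (average decline of gifted champions, average decline of ungifted ones)."""
    rng = random.Random(seed)
    declines = {True: [], False: []}
    for _ in range(champions):
        skill = rng.gauss(2.0, 0.5)                     # assumed: champions are skilled
        winning = skill + abs(rng.gauss(0.0, luck_sd))  # assumed: winners had good luck
        gifted = rng.random() < 0.5                     # random assignment to groups
        later = skill + rng.gauss(0.0, luck_sd)
        if gifted:
            later += gift_effect                        # hypothetical effect of gifts
        declines[gifted].append(winning - later)
    return _mean(declines[True]), _mean(declines[False])

if __name__ == "__main__":
    print(mean_decline(gift_effect=0.0))   # gifts do nothing
    print(mean_decline(gift_effect=-0.3))  # gifts hurt later play
```

When gifts have no effect, both groups decline by about the same amount, which is regression to the mean alone. Only when gifts really do hurt does the gifted group decline noticeably more. That difference between the groups, not the decline itself, is the evidence the prime minister would need.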

Notes:

1. Robert Todd Carroll, "Regressive Fallacy", The Skeptic's Dictionary
2. J. J. O'Connor & E. F. Robertson, "Francis Galton", MacTutor History of Mathematics Archive, 10/2003
3. Martin Bland, "Regression towards the mean, or: Why was Terminator III such a disappointment?", University of York, 1/2004
4. "Mahathir asks states not to 'spoil' Hafiz", The Star Online, 2/18/2003; via: Tim van Gelder, "Critical Reflections", 2/18/2003