The Base Rate Fallacy
Alias: Neglecting Base Rates
Suppose that the rate of disease D is three times higher among homosexuals than among heterosexuals, that is, the percentage of homosexuals who have D is three times the percentage of heterosexuals who have it. Suppose, further, that Pat is diagnosed with the disease, and this is all that you know about Pat. In particular, you don't know anything about Pat's sexual orientation; in fact, you don't even know whether Pat is male or female. What is the likelihood that Pat is homosexual?
When judging the probability of an event―for instance, diagnosing a patient's disease―there are two types of information that may be available:
- Generic information about the frequency of events of that type. In the case of diagnosing a disease, this would be information about the prevalence of the disease.
- Specific information about the case in question. In the case of diagnosis, this would be information about the patient revealed by an examination or tests.
When contrasted with specific information, generic information is called "base rate" information. For example, if a doctor is considering whether a patient has a certain rare disease, the rarity of the disease is its base rate. In other words, the base rate is the frequency of a generic type of event, leaving aside any information about the specific case at hand.
People who have only generic information tend to use it to judge probabilities, which is the rational thing to do since that's all that they have to go on. In contrast, when people have both types of information, they tend to make judgments of probability based entirely upon specific information, leaving out the base rate. This is the base rate fallacy.
When one has both generic and specific information, it might seem reasonable to ignore the generic information in favor of the more specific. This would indeed be the right thing to do if one had to choose only one type of information, but one should instead use all of the information that one has. There is always some possibility that an observation or test may be wrong, and the probability that it is wrong is affected by the base rate.
Source: Amos Tversky & Daniel Kahneman, "Evidential Impact of Base Rates", in Judgment Under Uncertainty: Heuristics and Biases, Kahneman, Paul Slovic, and Tversky, editors (1985), pp. 153-160.
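To illustrate how the base rate affects the chance that a test result is wrong, here is a minimal sketch in Python. The function name and all of the numbers (a test that detects 99% of true cases and gives a false positive 5% of the time, and the two prevalence figures) are hypothetical values chosen only for illustration; they do not come from the sources cited here.

```python
def posterior_given_positive(base_rate, sensitivity, false_positive_rate):
    """Probability of having the disease given a positive test (Bayes' Theorem)."""
    # Share of the whole population that is sick AND tests positive.
    true_positives = sensitivity * base_rate
    # Share of the whole population that is healthy AND tests positive.
    false_positives = false_positive_rate * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: identical accuracy in both scenarios, only prevalence differs.
rare = posterior_given_positive(base_rate=0.001, sensitivity=0.99,
                                false_positive_rate=0.05)
common = posterior_given_positive(base_rate=0.2, sensitivity=0.99,
                                  false_positive_rate=0.05)

print(f"Rare disease (0.1% prevalence):  {rare:.1%}")
print(f"Common disease (20% prevalence): {common:.1%}")
```

Under these assumed numbers, the same positive result is far more likely to be a false alarm when the disease is rare, which is why the specific information cannot be evaluated without the base rate.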
The exact answer to this problem depends upon what percentage of the population is homosexual. We don't know that exactly, but let's suppose that it is 10%. We don't need to be precise since this is a "back of the envelope" calculation designed to check that our intuitive judgments are in the ballpark. So, suppose that we have a population of 100 people, 10 of whom are homosexuals. Suppose, further, that three of the homosexuals have disease D, which means that the rate of the disease among the homosexuals is 3 out of 10, or 30%. Since we are given that the rate of the disease among heterosexuals is one-third of that among homosexuals, we must suppose that 10% of the heterosexuals in the population have D, which means that 9 of the 90 heterosexuals have D. So, the total number of persons with the disease in our population is 12, three of whom are homosexuals. Now, all that we know about Pat is that he or she has D, so Pat is one of the unlucky twelve. Therefore, the chance that Pat is homosexual is 3 in 12, or 25%.
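The head count in this calculation can be reproduced with a short Python sketch. The function and parameter names are hypothetical, and the inputs are the same supposed figures used in the text (a population of 100, a 10% base rate, and a 30% disease rate in the smaller group); exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def share_of_cases_in_minority(population, base_rate, minority_rate, rate_ratio):
    """Fraction of all disease cases that fall in the minority group."""
    minority = population * base_rate          # 10 of 100 people
    majority = population - minority           # the other 90
    minority_cases = minority * minority_rate  # 3 cases
    # The majority's disease rate is the minority's rate divided by rate_ratio.
    majority_cases = majority * minority_rate / rate_ratio  # 9 cases
    return minority_cases / (minority_cases + majority_cases)


result = share_of_cases_in_minority(
    population=100,
    base_rate=Fraction(1, 10),
    minority_rate=Fraction(3, 10),
    rate_ratio=3,
)
print(result)  # 1/4
```

Changing `population` or `minority_rate` leaves the result unchanged, which suggests that only the base rate and the ratio between the two disease rates matter here.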
If you're like most people, you probably estimated that the likelihood that Pat is homosexual is much higher than 25%. If you thought it was 75%, then you were probably basing your estimate on the fact that the rate of the disease is three times higher among homosexuals. In doing this, you neglected to take into consideration the base rate of homosexuality in the population. You might not have had any precise information on this rate, but it is common knowledge that homosexuals are a small minority. For this reason, even though the rate of the disease is three times higher in the homosexual sub-population, it is still more likely that a randomly chosen person with the disease is a heterosexual, simply because they are the vast majority of the population.
To prove that the example above is correct, use Bayes' Theorem from probability theory. Let "h" represent the proposition that Pat is homosexual and "d" the proposition that Pat has disease D. We assumed that the base rate of homosexuality is 10%, so P(h) = .1. Therefore, the probability of Pat not being homosexual is 90%, that is, P(not-h) = .9. We don't know the exact rates of disease D among homosexuals or non-homosexuals, but we are given that the rate among the former is three times the rate among the latter, so P(d | h) = 3P(d | not-h). Bayes' Theorem states that P(h | d) = [P(d | h) × P(h)] / [P(d | h) × P(h) + P(d | not-h) × P(not-h)]. If we plug this information into the theorem, we get the following equation:
P(h | d) = [3P(d | not-h) × .1] / [3P(d | not-h) × .1 + .9P(d | not-h)]
After multiplying and adding, we get:
P(h | d) = [.3P(d | not-h)] / [1.2P(d | not-h)]
The "P(d | not-h)"s in both the numerator and denominator cancel out, giving us the answer:
P(h | d) = 3/12 = .25, that is, the probability that Pat is homosexual given that he/she has disease D is 25%.
Source: Maya Bar-Hillel, "The Base Rate Fallacy in Probability Judgments", Acta Psychologica 44, pp. 211-233. The thought experiment is a variation of the "suicide problem" discussed on pages 221-223.
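As a check on the Bayes' Theorem derivation above, the following Python sketch evaluates the formula for several values of P(d | not-h) and confirms that the unknown rate cancels out. The function name and the trial values are hypothetical; the 10% base rate is the same supposition made in the text, and the other base rates are included only to show how strongly the answer depends on that supposition.

```python
from fractions import Fraction


def posterior_h_given_d(prior_h, p_d_given_not_h, ratio=3):
    """P(h | d) by Bayes' Theorem, given that P(d | h) = ratio * P(d | not-h)."""
    p_d_given_h = ratio * p_d_given_not_h
    numerator = p_d_given_h * prior_h
    denominator = numerator + p_d_given_not_h * (1 - prior_h)
    return numerator / denominator


# The unknown disease rate cancels: every trial value gives the same answer.
for rate in (Fraction(1, 100), Fraction(1, 10), Fraction(3, 10)):
    assert posterior_h_given_d(Fraction(1, 10), rate) == Fraction(1, 4)

# The answer does depend on the assumed base rate P(h).
for prior in (Fraction(1, 50), Fraction(1, 10), Fraction(1, 2)):
    print(prior, "->", posterior_h_given_d(prior, Fraction(1, 10)))
```

In this sketch the intuitive estimate of 75% comes out only when the base rate is set to 50%, that is, when the two groups are assumed to be the same size.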