The Texas Sharpshooter Fallacy
Sibling Fallacy: Cum Hoc, Ergo Propter Hoc
There have been several dramatic time-space clusters of leukemia reported in which, following an initial observation of two or more cases in a locality, a time unit and geographical area are selected so as to best define a time-space cluster. Such a posteriori clusters are analogous to the story of the Texas sharpshooter who would shoot his rifle at the side of a barn and then carefully draw a target around each bullet-hole so that each bullet-hole passed exactly through the center of the “bull’s-eye.” Although a posteriori clusters do serve to demonstrate that cases can cluster in time and space, they do not allow for determining whether this is more than a chance occurrence.5
As the story is told in the quote above, the Texas sharpshooter is a fabled marksman who fires his gun randomly at the side of a barn, then paints a bullseye around the spot where the most bullet holes cluster. The story of this Lone Star State shooter gave its name to a fallacy first described in epidemiology, the field that studies how disease spreads through populations. The story seems to have first appeared in print in 19775, and awareness of it and its connection to the fallacy spread in the 1990s6.
Each year…epidemiologists regularly hear from people worried that their town has been plagued with an unusually large visitation [of cancer cases]. … The Erin Brockovich incident, one of the most famous, is among the many that have been debunked. Hexavalent chromium in the water supply of a small California town was blamed for causing cancer, resulting in a $333 million legal settlement and a movie starring Julia Roberts. But an epidemiological study ultimately showed that the cancer rate was no greater than that of the general population. The rate was actually slightly less.1
This fallacy occurs when someone jumps to the conclusion that a cluster in data must be the result of some cause, usually one located near the cluster. A "cluster" in data is a higher-than-expected number of data points in one location. Data points can cluster in space―such as a higher number of cases of a disease than expected in a particular place―or in time―such as a higher number during a certain timespan―and often in both. The supposed cause is usually something in the same location.
There are two reasons why this kind of reasoning is fallacious:
- The data cluster may well be the result of chance, in which case it was not caused by anything.
- Even if the cluster is not the result of chance, there are possible explanations other than the chosen cause, and testing must be done to identify the actual cause among the many possibilities.
At best, the occurrence of a cluster in the data is the basis not for a causal conclusion, but for the formation of a causal hypothesis which needs to be tested. Patterns in data can be useful for forming hypotheses, but they are not themselves sufficient evidence of a causal connection. Clustering is a form of correlation, and correlation alone does not prove causation.
- This fallacy lives up to its striking name because the Texas sharpshooter took a random cluster of bullet holes and, by drawing a target around it, made it appear to be the result of a cause, namely, his marksmanship. As long as one is familiar with the vivid tale of the Texan, it's an easily remembered name, and the story itself reminds us of the nature of the mistake.
- One source of this fallacy is that people tend to underestimate how clumpy random processes can be7. For instance, using a random number generator, I just produced the following series of heads (H) and tails (T) "coin flips": HTHTTTTHTH8. Notice the run of four tails in the middle, and that's in only ten flips: a longer series is likely to produce even longer runs of heads or tails. Any random process will produce similar clusters.
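The clumpiness of random flips is easy to check for yourself. Here is a minimal sketch in Python; the simulation and the `longest_run` helper are my own illustration, not part of the original entry:

```python
import random

def longest_run(flips: str) -> int:
    """Length of the longest run of identical consecutive outcomes."""
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# The ten-flip series from the text has a run of four tails.
print(longest_run("HTHTTTTHTH"))  # 4

random.seed(0)  # fixed seed so the demonstration is repeatable
# Flip a fair coin 100 times: a genuinely random process.
flips = "".join(random.choice("HT") for _ in range(100))
print(flips)
print("Longest run:", longest_run(flips))
```

Running this with different seeds shows that long runs―apparent "clusters"―appear routinely even though no cause favors heads or tails at any point in the series.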
- That said, it should be kept in mind that it is not fallacious to form a hypothesis based on a cluster of data points. In fact, in 19th-century London, the early epidemiologist John Snow formed the hypothesis that cholera was a waterborne disease―as opposed to an airborne one, as was thought at the time―from an unusually large number of cases clustered around a particular water source. However, Snow didn't simply jump to the conclusion that the water caused cholera; he tested the hypothesis in various ways9.
- While this fallacy was first identified in reference to clusters of disease cases, it is not limited to epidemiology. For instance, during World War II, Germany fired hundreds of V-1 and V-2 missiles at London, England. Londoners noticed that some areas of the city seemed to be hit at a higher rate than others; in other words, the missile strikes formed clusters. As a result, a rumor spread that the areas hit by fewer missiles were being deliberately spared because they were home to German spies. However, neither type of missile was capable of such precise targeting, and they frequently missed the city entirely. So, the clustering was simply the result of random chance.10
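The wartime episode is a textbook case of chance clustering, and the effect is easy to reproduce. The sketch below is my own illustration, with a hypothetical 10×10 grid of equal-sized districts and a hypothetical strike count: it scatters strikes uniformly at random―no targeting at all―and shows how uneven the per-district counts come out anyway:

```python
import random
from collections import Counter

random.seed(1)  # repeatable demonstration

GRID = 10      # hypothetical 10x10 grid of equal-sized districts
STRIKES = 200  # hypothetical number of strikes

# Each strike lands in a uniformly random district: no targeting at all.
hits = Counter((random.randrange(GRID), random.randrange(GRID))
               for _ in range(STRIKES))

counts = [hits.get((r, c), 0) for r in range(GRID) for c in range(GRID)]
print("Expected hits per district:", STRIKES / GRID**2)  # 2.0
print("Most-hit district:", max(counts), "hits")
print("Districts with no hits:", counts.count(0))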
A reader wrote in about an earlier version of this entry:
When I finished reading your entry, my spidey sense perked up. Hexavalent chromium is a particularly dangerous chemical to be found in drinking water. When you wrote: "At best, the occurrence of a cluster in the data is the basis not for a causal conclusion, but for the formation of a causal hypothesis which needs to be tested." Life has already tested for you. The chemical is a known carcinogen; if it is in the water exposing people for a number of years, there is no doubt that it will cause cancer in at least some of those residents. The company was responsible for providing clean water to residents and for cleaning the contaminated water. Providing for the people suffering from cancer that may not be caused by hexavalent chromium, but as Occam notes, certainly could have been, both then and in the future, is not unreasonable. That chemical should have never been allowed to contaminate the water to begin with.
It does not matter at all that the epidemiology report found a lower cancer rate, especially if a high percentage of those "fewer" cancers turned out to be caused by hexavalent chromium. The sick were 100% sick and the company was at fault.
Now, go beg forgiveness from Erin Brockovich and the people she represented.
Reply: I didn't write that passage about hexavalent chromium―Chromium 6―and Erin Brockovich; it's a quote from an old Slate article1, so if you have a problem with it you should take it up with the article's author. The quoted sentence that I did write is a general remark about data clusters, not about the Brockovich case specifically, and I stand by it.
That said, I don't know what you mean by "life has already tested for you". Disease causation can't be determined from casual observation alone, which is why we have randomized studies with experimental and control groups. This fact cuts both ways: not only can you not determine that Chromium 6 causes cancer just from "life", neither can you determine that it does not just from the fact that the cancer rate was lower in that area. There could be many reasons why the rate was lower even if it does cause cancer and people in the area were exposed to it. However, the evidence of a lower cancer rate in that area certainly suggests, though it doesn't prove, that either it doesn't cause cancer, that it does so only at an undetectably low rate, or that the exposure level of those who lived there was extremely low. All such hypotheses need to be tested.
In any case, the point of this entry is not to adjudicate the effects of Chromium 6 or the justice of a particular lawsuit, but to explain the nature of the Texas sharpshooter fallacy. If you want an analysis of what was wrong with the Brockovich case, I refer you to a fairly recent article on the topic12.
In the meantime, you might want to have your "spidey sense" recalibrated.
- George Johnson, "Cancer Cluster or Chance?", Slate, 3/19/2013. Johnson attributes the name "Texas sharpshooter effect" to Seymour Grufferman, but Grufferman did not use this phrase in his paper; see note 5, below.
- Steven Milloy, Science Without Sense: The Risky Business of Public Health Research (1995), pp. 25-26.
- Leonard A. Sagan, Electric and Magnetic Fields: Invisible Risks? (1991), p. 134.
- Michael Blastland & Andrew Dilnot, The Numbers Game: The Commonsense Guide to Understanding Numbers in the News, in Politics, and in Life (2010), pp. 34-35.
- Seymour Grufferman, "Clustering and Aggregation of Exposures in Hodgkin's Disease", Cancer, 39:1829-1833, 1977. Paragraphing suppressed.
- I base this claim on a search of Google Books for the phrase "Texas sharpshooter" conducted on 10/4/2022.
- See: John Allen Paulos, Innumeracy: Mathematical Illiteracy and Its Consequences (1989), pp. 44-45.
- I used the following randomizer: "Coin Flipper", Random, accessed: 10/7/2022. Try it yourself! It's fun!
- See: Thomas C. Timmreck, An Introduction to Epidemiology (2nd edition, 1998), pp. 419-441.
- See: Gary Smith, Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics (2014), pp. 163-167.
- Added: 10/7/2022.
- Brian Dunning, "Hinkley: The Erin Brockovich Case", Skeptoid, 9/1/2020.