
# WEBLOG

##### December 25th, 2015 (Permalink)

### A Puzzle for Christmas

Klaus has a problem: an old college friend of his has invited him to visit on Christmas day, and Klaus has decided to bring presents for his friend's children. Unfortunately, he hasn't seen his friend since the childless days of college, and has never met the children before. Moreover, he has a notoriously faulty memory for other people's children. He does remember his friend announcing the birth of a new child twice; furthermore, he recalls the friend using the masculine pronoun in reference to one of the two, but not which one. However, that's all that he *can* remember for sure: his friend has two children and at least one is a boy. What kind of presents should he buy?

Obviously, he should buy one present for a boy, but what about the other one? Should he purchase a present that a girl might like or another one for a boy? Of course, he could always call his friend and ask, but that would be embarrassing: how could he have forgotten such vitally important information? He could buy three presents, two for boys and one for a girl, but how would he manage to explain away the third present? It would appear that he thought his friend had three children, which would be even more embarrassing! Another possibility was to buy a present that would be suitable for either sex, but Klaus simply couldn't think of a toy that would work. It was hard enough figuring out what a modern boy or girl would like!

Finally, Klaus decided that he simply must take his chances and buy a single present for a boy or a girl based on the odds. But was it an even chance that his friend had an additional boy or a girl? If so, then he might as well just flip a coin. What should Klaus do? Should he buy a present for a boy or for a girl? When you think that you know the answer, click on "Solution", below.

##### December 15th, 2015 (Permalink)

### At Risk of Pinocchios

The Washington Post's fact-checker Glenn Kessler put a list out yesterday of "the biggest Pinocchios of 2015", that is, the "biggest" errors that failed a fact-check in the past year. I gather that what made the list were those errors that received four "Pinocchios", which is the highest―or lowest, depending on how you look at it―rating in the fact-checker's rating system. At the end of the list is a "special award" for "bushels of bogus sex trafficking statistics", among which is one from May concerning the claim that 300,000 American children are "at risk" of sexual exploitation.

I missed the May article, which is unfortunate since it concerns a topic I had dealt with last year―see the Resource, below. Before looking at this specific issue, there's a general problem with the notion of "at risk of X", whatever X is. Even if X is a precisely defined concept, "at risk of X" will probably not be. So, why not just count cases of X rather than trying to count the less precise concept of "at risk of X"? One obvious reason is that the latter is likely to be a larger class, and the resulting number of cases greater. For this reason, activists who wish to use large numbers to galvanize support for anti-X activities will gravitate towards using the broader concept.

Furthermore, in order to count cases, we need a precise definition of what it is to be at risk of X, but that will be defined by those doing the counting. Also, the definition of a vague concept such as "at risk of X" is more open to manipulation than that of a more precise concept such as X itself. Assuming that the definition doesn't provide a large enough number to scare up anti-X support, it can always be broadened.

For these reasons, when you see the concept of "at risk for X", whatever X happens to be, you should be on your guard. Here are some critical questions that should be asked about the use of this concept:

- Why not count actual cases of X rather than those supposedly at risk of it?
- Who is it that defines "at risk for X"?
- Do the definers have a motive for exaggerating the riskiness of X?
- What, exactly, *is* the definition of "at risk of X"?
- Is it easy to find the definition, or is it concealed?
- Does the definition seem reasonable, or are there signs that it has been made broad in order to inflate the numbers?

If you can't answer these questions at all or with reassuring answers, then you probably shouldn't rely upon the number of "at risk for X". Now, let's turn to the specific issue of "at risk of sexual exploitation" and fact-check the fact-checker.

The claim that Kessler fact-checked is that 300,000 American children are at risk of commercial sexual exploitation. Here's what Kessler says about the provenance of this figure:

…[T]he 300,000 figure comes from a 2001 report written by Richard J. Estes and Neil Weiner of the University of Pennsylvania. … The report suggested that about 326,000 children were “at risk for commercial sexual exploitation”….

The second claim is correct, but the first is at least dubious, as I pointed out in my previous entry on this issue. If the Estes and Weiner (E&W) estimate is the source of the number, why would it be rounded down to 300,000? The last thing that politicians who are advocating legislation against commercial sexual exploitation would want to do is minimize the number of those at risk.

My own research indicated that the estimate came from a previous report from the mid-'90s, which means that it's almost twenty years old, and E&W specifically rejected all previous estimates as unreliable. However, the E&W report may be the proximate source for the 100,000-300,000 estimate, since it does mention it, if only to reject it. It's mentioned early (p. 4) in a long report of over 200 pages, so it's possible that a lazy researcher looking for numbers to cite came across it and didn't notice or care that it was not the estimate of the report itself, but a previous one rejected by the report.

I give Kessler one Pinocchio.

**Sources:**

- Glenn Kessler, "The biggest Pinocchios of 2015", The Washington Post, 12/14/2015
- Glenn Kessler, "The bogus claim that 300,000 U.S. children are ‘at risk’ of sexual exploitation", The Washington Post, 5/28/2015

**Resource:** One "myth" that's not quite dead yet, 9/5/2014

##### December 9th, 2015 (Permalink)

### Dueling Headlines, Dueling Polls

## Ted Cruz takes lead from Donald Trump in new Iowa poll

## Trump holds 13-point lead in CNN’s Iowa poll

Can both of these polls be right? Trump's 13 percentage point lead over Cruz in the CNN/ORC poll is highly significant, so you can't explain this away as simply statistical noise.

You might suspect that they were conducted at different times, which would certainly explain how two polls could differ so much. However, though they were not conducted during exactly the same time period, the polls did overlap: the first was conducted by Monmouth University from the 3rd to the 6th of this month, whereas the second, the CNN/ORC poll, ended on the same day but started five days earlier. Could the inclusion of some samples from November 28th to December 2nd account for such a large difference in results? If public opinion can change so much so fast, then it's unlikely that any poll this far from the caucuses will be of any use in predicting the results.

How about sampling bias, that is, were the samples different in some relevant respect? Both polls were aimed at sampling likely voters in the Iowa Republican caucuses, but according to The Hill's article:

Monmouth drew its sample from lists of registered voters who voted in at least one prior state primary, in a recent general election or registered to vote in the past year. CNN drew its sample by asking adults about their past participation patterns and intentions.

One problem with polls aimed at sampling likely voters is that every polling organization has its own definition of "likely voter". As a result, each is sampling a somewhat different population, which makes it difficult to compare polls from different pollsters. If the different results are explained by the different definitions, which definition comes closest to capturing the group of people who will actually take part in the caucuses? We may have to wait until after the caucuses are over to find out.

**Sources:**

- "Full results: CNN/ORC poll of Iowa Republicans", CNN, 12/7/2015
- "Iowa: Cruz Takes Caucus Lead", Monmouth University, 12/7/2015
- Josh Haskell, "Two New Polls Show Ted Cruz, Donald Trump Vie for Lead In Iowa", ABC News, 12/7/2015
- Jonathan D. Salant, "Trump trails in new Iowa GOP poll but another has him leading. Here's why", NJ.com, 12/8/2015
- Maxwell Tani, "Two new Iowa polls showed wildly different results for Donald Trump", Business Insider, 12/8/2015

**Resource:** How to Read a Poll

**Update (12/13/2015):** A newer poll, conducted for Bloomberg Politics and the Des Moines Register newspaper, puts Cruz ahead of Trump by ten percentage points, which is a significant lead. This suggests that the Monmouth poll showing Cruz in the lead was capturing a real surge in support for Cruz. Of course, this doesn't explain what happened with the CNN/ORC poll. Perhaps the CNN poll missed most of the surge, or maybe it's just that one-in-twenty poll that's wrong by more than its margin of error. In any case, if it's true that Cruz has risen so much in such a short time, then it appears that the race for support in Iowa is highly volatile and could change an equal amount in the month and a half left before the caucuses.
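To put a rough number on the "one-in-twenty" remark, here is a minimal sketch of the usual 95% margin-of-error calculation for a single candidate's share. It assumes simple random sampling, and the sample sizes are hypothetical round numbers chosen for illustration; they are not the actual sample sizes of the Monmouth, CNN/ORC, or Des Moines Register polls.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from a
    simple random sample of size n (z = 1.96 for 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes, NOT taken from the polls discussed above.
for n in (400, 1000):
    moe_points = 100 * margin_of_error(n)
    print(f"n = {n}: about +/- {moe_points:.1f} percentage points")
```

At the 95% level, roughly one poll in twenty would be expected to miss by more than this margin through sampling error alone. Differences in how "likely voter" is defined are not captured by this calculation at all.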

**Sources:**

- Jennifer Jacobs, "'Big shakeup' in Iowa Poll: Cruz soars to lead", The Des Moines Register, 12/13/2015
- John McCormick, "Cruz Soars to Front of the Pack in Iowa Poll; Trump Support Stays Flat", Bloomberg, 12/12/2015

##### December 4th, 2015 (Permalink)

### Headline

## 400-Year-Old Hearts Had Same Diseases As Hearts Of Today

If they're 400 years old they're doing pretty well.

**Solution to a Puzzle for Christmas:** (Added on 7/26/2019: The following solution, I now believe, is incorrect. The correct solution is the intuitive one that the probability is one-half, which means that this isn't a very interesting puzzle. However, what *is* interesting is the incorrect solution and why it is wrong. See the Correction appended, below, for details on the correct solution and why the following "solution" is wrong.)

As counter-intuitive as it may seem, Klaus should buy a present for a girl, though not because a girl was "due" after his friend had one boy. In fact, the probability that any particular child is a girl is almost exactly one-half, which might make you think that it doesn't matter which present Klaus buys, since the odds will be the same for a boy or a girl.

However, all that Klaus knows is that his friend has two children, one of whom is a boy. Thus, since both children being girls is ruled out, there are three equally likely possibilities:

| | 1 | 2 | 3 |
|---|---|---|---|
| Oldest: | Boy | Boy | Girl |
| Youngest: | Boy | Girl | Boy |

In only one of these possibilities―namely, the first―is the other child a boy. In the other two possibilities, the other child is a girl. Thus, if Klaus buys a present for a girl, the odds are 2 in 3 that he will have bought the right present.
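The counting in this argument can be reproduced by brute-force enumeration. The sketch below assumes that the four (oldest, youngest) combinations are equally likely and that the only information used is "at least one child is a boy"; the variable names are my own. It reproduces the counting only, and does not settle whether that is the right way to model what Klaus knows.

```python
from fractions import Fraction
from itertools import product

# All (oldest, youngest) combinations, assumed equally likely.
families = list(product(["Boy", "Girl"], repeat=2))

# Rule out the two-girl family: keep those with at least one boy.
at_least_one_boy = [f for f in families if "Boy" in f]

# Among the remaining families, those in which the other child is a girl.
other_is_girl = [f for f in at_least_one_boy if "Girl" in f]

p_girl = Fraction(len(other_is_girl), len(at_least_one_boy))
print(at_least_one_boy)
print(p_girl)
```

The three remaining families are the three columns of the table, and two of the three contain a girl.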

**Sources:**

- Martin Gardner, The 2nd Scientific American Book of Mathematical Puzzles & Diversions (1961), pp. 152-153, 159 & 226
- Jeremy Stangroom, Einstein's Riddle: Riddles, Paradoxes, and Conundrums to Stretch Your Mind (2009), pp. 18 & 92

**Resource:** "Ask Dr. Math: Boy or Girl?", Math Forum at Drexel University. If you're still puzzled or not convinced by the above solution, read this brief article.

**Correction (7/26/2019):** Since writing this entry, I've discovered an argument^{1} that the above solution is incorrect. Those familiar with traditional probability puzzles will recognize the above puzzle as a familiar one about the probability of two boys/girls in a family when all that we know is that there is at least one boy/girl. I've wrapped up the bare puzzle within a story about Klaus and his friend's kids, but its logical structure is unchanged.

Now, I'm not an expert on probability, though I play one on the web, but Martin Gardner^{2}, John Allen Paulos^{3}, and Dr. Math^{4} have all given the above answer. So, if that answer really is wrong, as I'm now inclined to think, then I was in good company. So, if you bought the argument I gave, above, you're also in good company. In contrast, if you thought it sounded screwy, good for you.

Most discussions of the puzzle include a follow-up puzzle^{5}, though I did not do so. The follow-up goes like this: What if, in addition to knowing that his friend has two children, one of whom is a boy, Klaus remembers that the eldest child is a boy? What, then, is the probability that the youngest child is also a boy? Note that it doesn't matter whether it's the other way around: Klaus could remember that the youngest child is a boy, and the question could be about the eldest. The same solution as that given below will work, *mutatis mutandis*^{6}.

The answer to this additional puzzle is that the probability is one-half! The reason is easily seen by looking at the table, above. This additional information eliminates the third possibility, leaving only the first two. In only one of these cases are both children boys, namely, the first one. So, the probability must be one out of two.
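The same kind of enumeration checks the follow-up puzzle. As before, this is only a sketch that assumes the four (oldest, youngest) combinations are equally likely; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All (oldest, youngest) combinations, assumed equally likely.
families = list(product(["Boy", "Girl"], repeat=2))

# Klaus now remembers that the eldest child is a boy.
eldest_is_boy = [f for f in families if f[0] == "Boy"]

# Of those families, the ones in which the youngest is also a boy.
both_boys = [f for f in eldest_is_boy if f[1] == "Boy"]

p_both_boys = Fraction(len(both_boys), len(eldest_is_boy))
print(eldest_is_boy)
print(p_both_boys)
```

Only two families survive the extra information, and one of them has two boys.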

Now, this solution is indeed correct, but it leads to the conclusion that the answer to the original puzzle must be wrong. Here's why:

Returning to the original puzzle, all that Klaus knows is that his friend has two children, one of whom is a boy. However, that boy must be either the eldest of the two children or the youngest. Suppose that he's the eldest. Then, by the solution to the second puzzle, the probability that both children are boys is one-half. On the other hand, suppose that he's the youngest. Then, a similar argument to that given above, with the necessary changes made, leads to the same conclusion, namely, that the probability is one-half. Since the boy is either the eldest or the youngest, and in both cases the probability is the same, we can conclude that the solution to the original puzzle is one-half^{7}.

The problem with the reasoning to the "solution" of the original puzzle is that it doesn't go far enough. The breakdown of the problem into three equally probable cases was correct. However, it is a logical truth that the boy mentioned in the puzzle is either the eldest or youngest of the two children, which allows us to break the problem down further to the two possible cases of eldest or youngest, and in both cases the probability is one-half. This helps explain why smart people such as Gardner and experts such as Paulos could make the mistake of giving the wrong solution to the original puzzle: they just stopped too soon.
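One way to make the one-half answer concrete is to model how Klaus came by his information. In the story, he recalls his friend using a masculine pronoun about one of the two children, but not which one. The sketch below assumes, and this is my assumption rather than anything stated in the sources, that the friend was equally likely to have been talking about either child. It then enumerates every combination of eldest, youngest, and which child was mentioned.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (eldest, youngest, index of the child mentioned),
# with all eight outcomes assumed equally likely.
outcomes = list(product(["Boy", "Girl"], ["Boy", "Girl"], [0, 1]))

# Klaus heard "he", so the mentioned child must be a boy.
heard_he = [o for o in outcomes if o[o[2]] == "Boy"]

# Of those outcomes, the ones in which both children are boys.
both_boys = [o for o in heard_he if o[0] == "Boy" and o[1] == "Boy"]

p_both_boys = Fraction(len(both_boys), len(heard_he))
print(len(heard_he), len(both_boys))
print(p_both_boys)
```

Under this reading of the story, four outcomes are consistent with what Klaus heard, and two of them are two-boy families, which agrees with the argument by cases given above.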

**Notes:**

1. Gary Smith, Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics (2014), pp. 93-96.
2. See the Sources, above.
3. John Allen Paulos, Innumeracy: Mathematical Illiteracy and Its Consequences (1989), p. 64.
4. See the Resource, above.
5. Of the Sources and Resources listed above, only Stangroom presents the original puzzle without the follow-up.
6. Translation: "Making the necessary changes", Latin. See: Eugene Ehrlich, Amo, Amas, Amat and More: How to Use Latin to Your Own Advantage and to the Astonishment of Others (1985).
7. The reasoning in this paragraph takes the form of an argument by cases, which is a standard form of reasoning in logic and mathematics.