As discussed in lesson 20, the technique of turning a polysyllogism into a chain of categorical syllogisms can show the validity of an argument that a single standard Venn diagram could not handle. However, the circles of Leonhard Euler, introduced in the previous lesson, can be used to show validity in a single diagram. To see how this method works, let's apply it to the polysyllogism used as an example in lesson 19:
There are three premisses, all of which are A-type statements, that need to be represented in our diagram. Recall from the previous lesson that Euler represented such statements by drawing a circle for the subject class inside a circle for the predicate class. In this case, it doesn't matter which premiss you start with, so let's begin at the beginning with the first premiss. "Sapsuckers" is the subject class and "woodpeckers" is the predicate class, so we represent the first premiss as shown above.
Turning now to the second premiss, remember that in diagramming arguments the premisses are all represented on a single diagram, whether Venn or Euler. So, we need to show on the same diagram that the class of woodpeckers is contained within the class of birds. Since we already have a circle for woodpeckers, all that we need is a new circle for birds, and the former should be inside the latter as shown above.
Notice that the second diagram shows that the class of sapsuckers is a subclass of the class of birds―in other words, all sapsuckers are birds―which was the intermediate conclusion in the chain argument given to show this polysyllogism valid in lesson 19. The final step is to diagram the third and last premiss, which means placing the "Birds" circle within a circle representing all animals, as shown above.
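If you'd like to reproduce the finished diagram yourself, here's a minimal sketch using Python's matplotlib (my own tooling choice, not part of the lesson): four nested circles, one for each class, with every subject circle drawn inside its predicate circle. The sizes and positions are arbitrary.

```python
# The finished Euler diagram: four nested circles, one per class.
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

classes = [          # (label, radius), from outermost to innermost class
    ("Animals", 4.0),
    ("Birds", 3.0),
    ("Woodpeckers", 2.0),
    ("Sapsuckers", 1.0),
]

fig, ax = plt.subplots(figsize=(6, 6))
for label, radius in classes:
    ax.add_patch(Circle((0, 0), radius, fill=False))
    ax.text(0, radius - 0.3, label, ha="center", va="top")

ax.set_xlim(-4.5, 4.5)
ax.set_ylim(-4.5, 4.5)
ax.set_aspect("equal")
ax.axis("off")
plt.show()
```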
The finished diagram clearly shows the logical relationships between the four classes, and you can see that the conclusion is true and, therefore, the argument is valid. In my opinion, this is far easier and more perspicuous than the chain argument of lesson 19. However, that's just one example, so let's look at another example from that lesson, this one including an E-type statement:
Since we already know how to diagram A statements, let's consider premisses 1 and 3 first. The result of diagramming both will look like the second diagram above but with "flickers" in place of "sapsuckers". Now, to diagram the second premiss, we must add a circle representing mammals that is disjoint from the circle for birds; the result looks as shown. Again, you can see from the diagram that the conclusion of the argument is true―that no flickers are mammals―and, thus, that the argument is valid, since it shows that the classes of flickers and mammals are disjoint.
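If you want to double-check the class relationships without drawing anything, here's a small Python sketch using sets. The member names are made up purely for illustration, and checking a single toy model like this merely illustrates the pattern of containment and disjointness; it isn't a substitute for the diagram's demonstration of validity.

```python
# Toy model of the classes in the argument, with invented members.
flickers    = {"northern flicker", "gilded flicker"}
woodpeckers = flickers | {"sapsucker", "pileated woodpecker"}
birds       = woodpeckers | {"robin", "wren"}
mammals     = {"bat", "whale", "mouse"}

assert flickers <= woodpeckers       # All flickers are woodpeckers.
assert woodpeckers <= birds          # All woodpeckers are birds.
assert birds.isdisjoint(mammals)     # No birds are mammals.
assert flickers.isdisjoint(mammals)  # Conclusion: no flickers are mammals.
```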
As I mentioned in the previous lesson, Euler's diagrams for particular statements―that is, I- and O-type statements―are what led to Venn's different approach to using circles to represent classes. In the next lesson, we'll see how to combine Venn's technique with Euler's to diagram polysyllogisms with particular premisses. In the meantime, here's a polysyllogism to practice diagramming:
Exercise: Use an Euler diagram to show the following polysyllogism valid:
* ↑ For previous lessons in this series, see the navigation panel to your right.
The combination of a lock is four digits long and each digit is unique, that is, each occurs only once in the combination. The following are some incorrect combinations.
Can you determine the correct combination from the above clues?
Just because a digit is not in a clue doesn't mean that it isn't in the solution.
Try reasoning by elimination; if you're not sure what that is or need a refresher, see: Solving a Problem by Elimination, 6/20/2023.
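For anyone who would rather let a computer do the eliminating, here's a rough Python sketch. Note that the clue format used below (an incorrect guess together with a count of correctly placed digits) is an assumption made for illustration, not necessarily the form the actual clues take.

```python
# Reasoning by elimination, brute-force style: enumerate every candidate
# combination and discard those inconsistent with the clues.
from itertools import permutations

def consistent(candidate, guess, placed):
    """True if `candidate` would have produced this clue for `guess`."""
    return sum(c == g for c, g in zip(candidate, guess)) == placed

hypothetical_clues = [
    (("1", "2", "3", "4"), 0),   # e.g. "no digit is correctly placed"
    (("5", "6", "7", "8"), 1),   # e.g. "exactly one digit is correctly placed"
]

candidates = [
    p for p in permutations("0123456789", 4)   # four digits, each unique
    if all(consistent(p, guess, placed) for guess, placed in hypothetical_clues)
]
print(len(candidates), "combinations remain after elimination")
```

Filtering out every candidate that contradicts a clue is the same elimination you would do by hand, just done exhaustively.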
3 7 2 0
* ↑ Previous "Crack the Combination" puzzles: I, II, III, IV, V, VI, VII, VIII, IX, X
The Fallacy Files Taxonomy of logical fallacies is new―that is, there's a brand new version of it: just click on "Taxonomy" to your upper right. In case you're interested, the old versions are still available from the following page, where you can also read about how you might make use of the taxonomy: The History of the Taxonomy. Check it out!
I'm recommending the following article largely because I've never seen one on bad charts in The Washington Post before. Almost all of the charts shown are bad in ways I've also never seen before, so I won't have anything to say about most of them. What's interesting is not so much the charts themselves as the fact that such atrocious charts were presented at all, especially by artificial intelligence companies. How did it happen? Don't those companies have any natural intelligences working for them?
The mockery about "chart crimes"…nearly overshadowed the technology upgrades announced by two artificial intelligence start-ups. During a demonstration Thursday of ChatGPT's newest version, GPT-5, the company showed a visual in which it appeared 52.8 percent was a larger number than 69.1 percent, which, in turn, was somehow equal to 30.8 percent.
Ironically, at this point the article is interrupted by a Post promotion that reads: "Get concise answers to your questions. Try Ask The Post AI."
… Several more times in the demonstration, ChatGPT parent company OpenAI showed confusing or dubious graphics, including others in which a smaller number appeared visually larger than an actually bigger number…. Conspiracy theories started that AI generated the botched data visuals. (An OpenAI employee apologized for the "unintentional chart crime," and CEO Sam Altman said on Reddit that staff messed up charts in rushing to get their work done. Asked for further comment, OpenAI referred to Altman's Reddit remarks.)
Like the so-called lab leak theory, they aren't "conspiracy theories" but reasonable hypotheses to explain what is otherwise hard to understand. How were such "horrible" charts not only made but shown to the public? The claim by Altman isn't plausible, since the kind of errors made in OpenAI's charts are not the kind made by human beings. Certain types of errors in chartmaking are common to inexperienced people, and some types are common to those with experience and intent to deceive, but these were not of either type. Thus, it seems plausible that they were created using AI, though that doesn't explain why some human being didn't sanity check them. Perhaps the people who work there have too much faith in their own product.
… Also last week, the start-up Anthropic showed two bars comparing the accuracy rates of current and previous generations of its AI chatbot, Claude. …
The y-axis of the bar chart in question[1] does not start at zero percent, a common type of graphical distortion often used to exaggerate a difference[2]. Moreover, there's no indication in the chart itself that it has been truncated, so you have to look at the y-axis scale to discover it. In the rare case when it's permissible to truncate a chart, it's obligatory to include a break in the scale to alert the reader to the truncation[3], and this particular chart is not such a rare case.
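To see how much difference the baseline makes, here's a minimal matplotlib sketch comparing a truncated y-axis with one that starts at zero. The two accuracy figures are hypothetical stand-ins for a roughly two-point gain, not Anthropic's actual numbers.

```python
# Plot the same two (hypothetical) accuracy figures two ways:
# with a truncated y-axis and with a zero baseline.
import matplotlib.pyplot as plt

labels = ["Old model", "New model"]
accuracy = [72.5, 74.5]          # hypothetical values: a two-point gain

fig, (ax_trunc, ax_full) = plt.subplots(1, 2, figsize=(8, 4))

ax_trunc.bar(labels, accuracy)
ax_trunc.set_ylim(72, 75)        # truncated axis: the gain looks enormous
ax_trunc.set_title("Truncated y-axis")

ax_full.bar(labels, accuracy)
ax_full.set_ylim(0, 100)         # zero baseline: the gain looks like what it is
ax_full.set_title("Y-axis starting at zero")

for ax in (ax_trunc, ax_full):
    ax.set_ylabel("Accuracy (%)")

plt.show()
```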
Anthropic has a motive to exaggerate the two percentage point gain in accuracy between Claude Opus 4 and Opus 4.1. This is an all-too-human "error", as opposed to the bizarre ones made by the OpenAI charts. If I find out that Anthropic's bar chart was generated by AI, I'll be more impressed by Claude's ability to imitate humanity than GPT-5's.
Jessica Dai, a PhD student at the University of California at Berkeley's AI research lab, said her big beef with the Anthropic chart was the "hypocrisy," not the off-base scale. The company has previously prodded researchers evaluating AI effectiveness to include what are called confidence intervals, or a range of expected values if a data study is repeated many times.
This is good advice.
Dai wasn't sure that's the right approach but also said that Anthropic didn't even follow its own recommendation. If Anthropic had, Dai said, it might have wiped out statistical evidence of an accuracy difference between old and new versions of Claude. …
Another all-too-human reason for the omission.
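Dai's point about confidence intervals is easy to illustrate with a rough Python sketch: a normal-approximation interval for the difference between two accuracy rates. The benchmark size and the two accuracies below are made-up stand-ins chosen only to represent a two-point gain.

```python
# 95% confidence interval for the difference between two accuracy rates,
# using the normal approximation for a difference of proportions.
from math import sqrt

n = 500                      # hypothetical number of benchmark problems
p_old, p_new = 0.72, 0.74    # hypothetical accuracies: a two-point gain

diff = p_new - p_old
se = sqrt(p_old * (1 - p_old) / n + p_new * (1 - p_new) / n)
low, high = diff - 1.96 * se, diff + 1.96 * se

print(f"difference: {diff:.3f}, 95% CI: ({low:.3f}, {high:.3f})")
# With these numbers the interval spans zero, so the two-point gain
# would not be statistically distinguishable from no gain at all.
```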
[T]o some data experts and AI specialists, the chart crimes are a symptom of an AI industry that regularly wields fuzzy numbers to stoke hype and score bragging points against rivals. …Big technology companies and start-ups love charts that appear to show impressive growth in sales or other business goals but that have no disclosed scale that reveal the numbers behind those graphics. … To the companies, these charts offer a glimpse of their success without overexposing their finances. …
This explanation works for Anthropic's chart but not for those put out by OpenAI. Moreover, it's true of every industry.
By the way, I agree whole-heartedly with the following comment by charting guru Alberto Cairo:
He wasn't irked only about the basic arithmetic abuses. Cairo also was dubious about OpenAI's and Anthropic's use of graphs for two or three numbers that people could understand without any charts. "Sometimes a chart doesn't really add anything," he said. …Cairo pointed to research that may help explain why companies gravitate to charts: They ooze authority and objectivity, and people may be more likely to trust the information.
Pointing to some uncited "research" as support also oozes "authority and objectivity, and people may be more likely to trust the information". Luckily, in this case, common sense and experience support Cairo's claim.
To [Dai] and some other AI specialists with whom I spoke, misguided charts may point to a tendency in the industry to use confidently expressed but unverified data to boast about the technology or bash competitors. The Post previously found that AI detection companies claiming to be up to 99 percent accurate had largely untested capabilities. Meta was mocked this spring for apparently gaming its AI to boost the company's standings in a technology scoreboard. … "Just because you put a number on it, that's supposed to be more rigorous and more real," Dai said. "It's all over this industry."
It's all over all industry.
In recent months, the AI industry has started moving toward so-called simulated reasoning models that use a "chain of thought" [CoT] process to work through tricky problems in multiple logical steps. At the same time, recent research has cast doubt on whether those models have even a basic understanding of general logical concepts or an accurate grasp of their own "thought process." Similar research shows that these "reasoning" models can often produce incoherent, logically unsound answers when questions include irrelevant clauses or deviate even slightly from common templates found in their training data.
My experience with testing the ability of the allegedly artificially intelligent chatbots to solve simple logic puzzles is similar[4].
In a recent pre-print paper, researchers from the University of Arizona summarize this existing work as "suggest[ing] that LLMs [Large Language Models] are not principled reasoners but rather sophisticated simulators of reasoning-like text." To pull on that thread, the researchers created a carefully controlled LLM environment in an attempt to measure just how well chain-of-thought reasoning works when presented with "out of domain" logical problems that don't match the specific logical patterns found in their training data.
In case you don't know, "pre-print" means that this paper has not yet been peer-reviewed or published, so take it with a grain of salt.
The results suggest that the seemingly large performance leaps made by chain-of-thought models are "largely a brittle mirage" that "become[s] fragile and prone to failure even under moderate distribution shifts," the researchers write. "Rather than demonstrating a true understanding of text, CoT reasoning under task transformations appears to reflect a replication of patterns learned during training." …As the researchers hypothesized, these basic models started to fail catastrophically when asked to generalize novel sets of transformations that were not directly demonstrated in the training data. While the models would often try to generalize new logical rules based on similar patterns in the training data, this would quite often lead to the model laying out "correct reasoning paths, yet incorrect answer[s]." In other cases, the LLM would sometimes stumble onto correct answers paired with "unfaithful reasoning paths" that didn't follow logically.
"Rather than demonstrating a true understanding of text, CoT reasoning under task transformations appears to reflect a replication of patterns learned during training," the researchers write. …
Rather than showing the capability for generalized logical inference, these chain-of-thought models are "a sophisticated form of structured pattern matching" that "degrades significantly" when pushed even slightly outside of its training distribution, the researchers write. Further, the ability of these models to generate "fluent nonsense" creates "a false aura of dependability" that does not stand up to a careful audit.
As such, the researchers warn heavily against "equating [chain-of-thought]-style output with human thinking" especially in "high-stakes domains like medicine, finance, or legal analysis." Current tests and benchmarks should prioritize tasks that fall outside of any training set to probe for these kinds of errors, while future models will need to move beyond "surface-level pattern recognition to exhibit deeper inferential competence," they write.
I'm far from an expert on this kind of AI, but my impression is that it imitates writing about reasoning rather than actually reasoning.
Notes:
Disclaimer: I don't necessarily agree with everything in the above articles, but I think they are worth reading. I have sometimes suppressed paragraphing or rearranged the paragraphs in the excerpts to make a point.
A couple of years ago, Governor Ron DeSantis claimed that crime in his state of Florida was at a fifty-year low while "major" crime in New York City had increased by 23% the previous year[1]. Now, this is not a fact check but a logic check, so I'm just going to assume that the statistics given by DeSantis and others quoted in this entry are factually correct. Instead of fact-checking these statistics, the question I'm addressing is: What, if anything, do they prove?
Some critics of DeSantis replied that the homicide rate in Jacksonville, Florida was actually three times greater than that in the Big Apple[2]: specifically, that the homicide rate per 100K in 2022 was 16.7 in Jacksonville but only 4.8 in New York City. Of course, both of these sets of statistics can be correct: it's quite possible that crime was decreasing in Florida and increasing in New York, as DeSantis claimed, but was worse in Florida than in New York, as his critics claimed. But even if the statistics are correct, the governor could rightfully be criticized for cherry-picking the ones that made his state look good.
A defender of DeSantis rebutted the critics by citing the number of murders per square mile in 2022: 0.19 in Jacksonville versus 1.38 in New York City[3]. This is a statistic of dubious value for comparing the amount of murder in two places, since it's affected by population density: the higher the density, the more murders per square mile. New York no doubt has a much greater population density than Jacksonville. Moreover, this particular comparison is affected by a piece of trivia appropriate for a Ripley's cartoon[4] or the Guinness Book of World Records.
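In fact, you can back the population densities out of the very figures quoted above, since murders per square mile are just the per-capita murder rate multiplied by residents per square mile. Here's a quick Python check using only those numbers (and treating "homicides" and "murders" as the same count):

```python
# Back out the implied population densities from the quoted figures:
# murders per sq. mile = (homicides per 100K / 100,000) * (residents per sq. mile)

figures = {
    # city: (homicides per 100K residents, murders per square mile), both for 2022
    "Jacksonville": (16.7, 0.19),
    "New York City": (4.8, 1.38),
}

for city, (per_100k, per_sq_mile) in figures.items():
    per_capita = per_100k / 100_000
    implied_density = per_sq_mile / per_capita  # residents per square mile
    print(f"{city}: ~{implied_density:,.0f} residents per square mile")

# Jacksonville: ~1,138 residents per square mile
# New York City: ~28,750 residents per square mile
```

The implied densities, roughly 1,100 versus 29,000 residents per square mile, match the familiar picture of a sprawling Jacksonville and a dense New York, which is exactly why a per-square-mile comparison flatters Jacksonville.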
What is the largest city in area in the contiguous United States, that is, the "lower 48"? This is a trivia question rather than a logic puzzle, so you either know the answer or you can look it up, but you can't figure it out. You might guess that it's Los Angeles, a notoriously spread-out city, but that's wrong. Do you give up? The answer is Jacksonville, Florida[5].
So, even if it made sense to compare cities on the basis of murders per mile², it wouldn't be fair to compare New York City to Jacksonville, given that the latter is the largest city in area in the lower forty-eight, but only the eleventh in population size[6].
Despite the title, I'm not ready to add an entry to the files for statistical fallacies that take advantage of Jacksonville's trivial status as the lower 48's biggest city in area. However, I've now come across two examples, and if I find one more, I may just do so.
Notes:
In previous lessons[1], we saw how Venn diagrams are used to represent logical relations between classes. However, as pointed out previously, Venn's diagrams are limited to representing the relations between three classes. There are extensions of Venn's diagrams, but they become increasingly awkward with increasing numbers of class terms. When faced with polysyllogisms―that is, categorical arguments involving four or more class terms―one way to work around this problem was explained in lesson 19, namely, breaking such arguments down into a chain of categorical syllogisms.
As I mentioned in the previous lesson, the technique of turning a complex argument into a chain of simpler ones can show that the argument is valid, but not that it's invalid. This is because the technique is a method of proof, and a given chain's failure to prove the conclusion doesn't show that no other proof would succeed. In contrast, a Venn diagram shows an argument to be either valid or invalid. For this reason, it would be nice to have such a diagrammatic technique for polysyllogisms.
Prior to John Venn, Leonhard Euler used circles to represent the logical relationships between classes[2]. In my opinion, Euler's diagrams for the universal statements of categorical logic are more intuitive than Venn's, but unfortunately those for the particular statements were neither intuitive nor useful. This problem led Venn to keep the circles but take a different approach to representing all types of categorical statement, which is a shame given the limitations of Venn's approach in both intuitiveness and the number of terms it can diagram.
In this lesson, I will simply introduce Euler's diagrams and show how they are used to represent the logical content of universal statements but, in a future lesson, we'll see how to evaluate categorical arguments.
Euler did not have anything corresponding to Venn's primary diagrams[3], which divide up all of the logical space of the diagram into every possible subclass of two or three classes. Instead of using shading to show that certain classes were empty, Euler used the spatial relationship between the circles themselves to indicate the logical relationships between classes. So, here's how Euler represented universal affirmative statements―that is, A statements:
Similarly, to represent universal negative statements―that is, E statements―Euler drew the circles so that they did not overlap. In my view, these diagrams are more intuitive representations of these categorical relationships than the corresponding Venn diagrams, since you can see that one class is contained within another or that the two classes are disjoint.
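If you'd like to draw these two diagrams for yourself, here's a minimal sketch using Python's matplotlib (my own tooling choice, not part of the lesson): the A diagram puts the subject circle inside the predicate circle, and the E diagram keeps the two circles apart.

```python
# Euler's two universal diagrams: "All S are P" (A) and "No S are P" (E).
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, (ax_a, ax_e) = plt.subplots(1, 2, figsize=(8, 4))

# A statement: the S circle sits entirely inside the P circle.
ax_a.add_patch(Circle((0, 0), 2.0, fill=False))
ax_a.add_patch(Circle((0, -0.5), 1.0, fill=False))
ax_a.text(0, 1.6, "P", ha="center")
ax_a.text(0, -0.5, "S", ha="center")
ax_a.set_title('A: "All S are P"')

# E statement: the S and P circles do not touch at all.
ax_e.add_patch(Circle((-1.5, 0), 1.2, fill=False))
ax_e.add_patch(Circle((1.5, 0), 1.2, fill=False))
ax_e.text(-1.5, 0, "S", ha="center")
ax_e.text(1.5, 0, "P", ha="center")
ax_e.set_title('E: "No S are P"')

for ax in (ax_a, ax_e):
    ax.set_xlim(-3, 3)
    ax.set_ylim(-3, 3)
    ax.set_aspect("equal")
    ax.axis("off")

plt.show()
```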
Since Euler's methods of representing particular statements were inadequate and not as intuitive as those for universal ones, we can adopt the convention of placing a mark inside a class or subclass to indicate that it is non-empty.
In the next lesson, we'll start looking at how to use this combination of Euler and Venn diagrams to evaluate categorical arguments.
Notes: