For the past year or so I’ve been spending most of my time researching the use of language and diffusion models in software businesses.
One of the issues in during this research—one that has perplexed me—has been that many people are convinced that language models, or specifically chat-based language models, are intelligent.
But there isn’t any mechanism inherent in large language models (LLMs) that would seem to enable this and, if real, it would be completely unexplained.
LLMs are not brains and do not meaningfully share any of the mechanisms that animals or people use to reason or think.
LLMs are a mathematical model of language tokens. You give a LLM text, and it will give you a mathematically plausible response to that text.
There is no reason to believe that it thinks or reasons—indeed, every AI researcher and vendor to date has repeatedly emphasised that these models don’t think.
There are two possible explanations for this effect:
- The tech industry has accidentally invented the initial stages a completely new kind of mind, based on completely unknown principles, using completely unknown processes that have no parallel in the biological world.
- The intelligence illusion is in the mind of the user and not in the LLM itself.
Many AI critics, including myself, are firmly in the second camp. It’s why I titled my book on the risks of generative “AI” The Intelligence Illusion.
For the past couple of months, I’ve been working on an idea that I think explains the mechanism of this intelligence illusion.
I now believe that there is even less intelligence and reasoning in these LLMs than I thought before.
Many of the proposed use cases now look like borderline fraudulent pseudoscience to me.
The rise of the mechanical psychic
The intelligence illusion seems to be based on the same mechanism as that of a psychic’s con, often called cold reading. It looks like an accidental automation of the same basic tactic.
By using validation statements, such as sentences that use the Forer effect, the chatbot and the psychic both give the impression of being able to make extremely specific answers, but those answers are in fact statistically generic.
The psychic uses these statements to give the impression of being able to read minds and hear the secrets of the dead.
The chatbot gives the impression of an intelligence that is specifically engaging with you and your work, but that impression is nothing more than a statistical trick.
This idea was first planted in my head when I was going over some of the statements people have been making about the reasoning of these “AI.”
I first thought that these were just classic cases of tech bubble enthusiasm, but no, “AI” has both taken a different crowd and the believers in the “AI” bubble sound very different from those of prior bubbles.
—“This is real. It’s a bit worrying, but it’s real.”
—“There really is something there. Not sure what to think of it, but I’ve experienced it myself.”
—“You need to keep your mind open to the possibilities. Once you do, you’ll see that there’s something to it.”
That’s when I remembered, triggered by a blog post by Terence Eden on the prevalence of Forer statements in chatbot replies. I have heard this before.
This specific blend of awe, disbelief, and dread all sound like the words of a victim of a mentalist scam artist—psychics.
The psychic’s con is a tried and true method for scamming people that has been honed through the ages.
What I describe below is one variation. There are many variations, but the core mechanism remains the same.
The Psychic’s Con
1. The Audience Selects Itself
Most people aren’t interested in psychics or the like, so the initial audience pool is already generally more open-minded and less critical than the population in general.
2. The Scene is Set
The initial audience is prepared. Lights are dimmed. The psychic is hyped up. Staff research the audience on social media or through conversation. The audience’s demographics are noted.
3. Narrowing Down the Demographic
The psychic gauges the information they have on the audience, gestures towards a row or cluster, and makes a statement that sounds specific but is in fact statistically likely for the demographic. Usually at least one person reacts. If not, the psychic will imply that the secret is too embarrassing for the “real” person to come forward, reminds people that they’re available for private readings, and tries again.
4. The Mark is Tested
The reaction indicates that the mark believes they were “read”. This leads to a burst of questions that, again, sound very specific but are actually statistically generic. If the mark doesn’t respond, the psychic declares the initial read a success and tries again.
5. The Subjective Validation Loop
The con begins in earnest. The psychic asks a series of questions that all sound very specific to the mark but are in reality just statistically probable guesses, based on their demographics and prior answers, phrased in a specific, highly confident way.
6. “Wow! That psychic is the real thing!”
The psychic ends the conversation and the mark is left with the sense that the psychic has uncanny powers. But the psychic isn’t the real thing. It’s all a con.
1. Audience selection
Seers, tarot card readers, psychics, mind readers aren’t all con artists. Sometimes the “psychic” is open about it all just being entertainment and aren’t pretending to be able to contact spirits or read minds. Some psychics do not have a profit motive at all, and without the grift it doesn’t seem fair to call somebody a con artist.
But many of them are con artists deliberately fooling people, and they all operate using the same basic mechanisms that begin well before the reading proper.
The audience is usually only composed of those already pre-disposed to believe in psychic phenomena and those they have managed to drag with them. Hardcore sceptics will almost always be in a very small minority of the audience, which both makes them easy to manage and provides social pressure on them to tone down their scepticism.
Those who attend are primed to believe and are already familiar with the mythology surrounding psychics. All of which helps them manage expectations and frame their performance.
2. Setting the scene
Usually the audience is reminded of the ground rules for how psychic readings “work” at the start of the performance. They are helped by the popularisation of these rules by media, cinema, and TV.
Everybody now “knows” that:
- Readings usually begin murky and unclear.
- They then become clearer as the “connection” to the “spirit world” gets stronger.
- Errors are expected. The “spirits” are often vague or hard to hear.
- Non-believers can weaken or even disrupt the connection.
Psychics also habitually research their audience, by mapping out their demographics, looking them up on social media, or even with informal interviews performed by staff mingling with attendees before the performance begins.
When the lights dim, the psychic should have a clear idea of which members of the audience will make for a good mark.
3. Narrowing down
The mark usually chooses themselves. The psychic makes a statement and points towards a row, quickly altering their gesture based on somebody responding visible to the statement. This makes it look like they pointed at the mark right from the beginning.
The mark is that way primed from the start to believe the psychic. They’re off-guard. Usually a bit surprised and totally unprepared for the quick burst of questions the psychic offers next. If those questions land and draw the mark in, they are followed by the actual reading. Otherwise, they move on and try again.
4. Testing the mark—Cold reading using subjective validation
The con—cold reading—hinges on a quirk of human psychology: if we personally relate to a statement, we will generally consider it to be accurate.
This unfortunate side effect of how our mind functions is called subjective validation.
Subjective validation, sometimes called personal validation effect, is a cognitive bias by which people will consider a statement or another piece of information to be correct if it has any personal meaning or significance to them. People whose opinion is affected by subjective validation will perceive two unrelated events (i.e., a coincidence) to be related because their personal beliefs demand that they be related.
As a consequence, many people will interpret even the most generic statement as being specifically about them if they can relate to what was said.
The more eager they are to find meaning in the statement, the stronger the effect.
The more they believe in the speaker’s ability to make accurate statements, the stronger the effect.
The basic mechanism of the psychic’s con is built on the mark being willing and able to relate what was said to themselves, even if it’s unintentional.
5. The subjective validation loop using validation statements
The psychic taps into this cognitive bias by making a series of statements that are tailored to be personally relatable—sound specific to you—while actually being statistically generic.
These statements come in many types. I use “validation statements” here as an umbrella term for all these various tactics.
Some common examples:
- Forer or Barnum statements are probably the most famous kind of statement that plays into the subjective validation effect. Many of these statements are inherently meaningless but are nonetheless felt to be accurate by listeners. Most people will consider “you tend to be hard on yourself” to be an accurate description of themselves, for example.
- Vanishing negative is where a question is rephrased to include a negative such as “not” or “don’t”. If the psychic asks “you don’t play the piano?” then they will be able to reframe the question as accurate after the fact, no matter what the answer is. If you answer negative: “didn’t think so”. Positive: “that’s what I thought.”
- Rainbow ruse where the psychic associates the mark with both a trait and its opposite. “You’re a very calm person, but if provoked you can get very angry.”
- Statistical guesses. Statements like “you have, or used to have, a scar on your left leg or knee” apply to almost everybody. With enough knowledge of common statistics, the psychic can make general statements that sound incredibly specific to the mark.
- Demographic guesses. Similar to statistical guesses, these are statements that are common to a demographic but will sound very specific to the mark that’s listening.
- Unverifiable predictions. Predictions like “somebody bears a strong ill will towards you but they are unlikely to act on it” are impossible to verify, but will sound true to many people.
- Shotgunning is one of the more common tactic where the psychic will fire off a series of statements. The mark will find one of the statements to be accurate and, due to how our minds work, will come away only remembering the correct statement.
An i
26 Comments
zahlman
Original title (too long for submission):
> The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic's con
JKCalhoun
> 1 The tech industry has accidentally invented the initial stages a completely new kind of mind, based on completely unknown principles…
> 2) The intelligence illusion is in the mind of the user and not in the LLM itself.
I've felt as though there is something in between. Maybe:
3) The tech industry invented the initial stages a kind of mind that, though misses the mark, is approaching something not too dissimilar to how an aspect of human intelligence works.
> By using validation statements, … the chatbot and the psychic both give the impression of being able to make extremely specific answers, but those answers are in fact statistically generic.
"Mr. Geller, can you write some Python code for me to convert a 1-bit .bmp file to a hexadecimal string?"
Sorry, even if you think the underlying mechanisms have some sort of analog there's real value in LLM's, not so psychics doing "cold readings".
prideout
I lost interest fairly quickly because the entire article seems to rely on a certain definition of "intelligent" that is not made clear in the beginning.
dist-epoch
Yeah, when I read about AI solving international math olympiad problems it's not intelligence, it's just me projecting my math skills upon the model.
> LLMs are a mathematical model of language tokens. You give a LLM text, and it will give you a mathematically plausible response to that text.
> The tech industry has accidentally invented the initial stages a completely new kind of mind, based on completely unknown principles, using completely unknown processes that have no parallel in the biological world.
Or maybe our mind is based on a bunch of mathematical tricks too.
swaraj
You should try the arc agi puzzles yourself, and then tell me you think these things aren't intelligent
https://arcprize.org/blog/openai-o1-results-arc-prize
I wouldn't say it's full agi or anything yet, but these things can definitely think in a very broad sense of the word
dosinga
This feels rather forced. The article seems to claim both that LLMs don't actually work, it is all an illusion and that of course the LLMs know everything, they stole all our work from the last 20 years by scraping the internet and underpaying people to produce content. If it was a con, it wouldn't have to do that. Or in other words, if you had a psychic who actually memorized all biographies of all people ever, they wouldn't need their cons
EagnaIonat
I was hoping it was talking about how it can resonate with users using those techniques. Or some experiments to prove the point. But it is not even that.
There is nothing of substance in this and it feels like the author has a grudge against LLMs.
pama
This is from 2023 and is clearly dated. It is mildly interesting to notice how quickly things changed since then. Nowadays models can solve original math puzzles much of the time and it is harder to argue they cannot reason when we have access to R1, o1, and o3-mini.
jbay808
I was interested in this question so I trained NanoGPT from scratch to sort lists of random numbers. It didn't take long to succeed with arbitrary reliability, even given only an infinitesimal fraction of the space of random and sorted lists as training data. Since I can evaluate the correctness of a sort arbitrarily, I could be certain that I wasn't projecting my own beliefs onto its response, and reading more into the output than was actually there.
That settled this question for me.
Terr_
There's another illusory effect here: Humans are being encouraged to confuse a fictional character with the real-world "author" system.
I can create a mad-libs program which dynamically reassembles stories involving a kind and compassionate Santa Claus, but that does not mean the program shares those qualities. I have not digitally reified the spirit of Christmas, not even if excited human kids contribute some of the words that shape its direction and clap with glee.
P.S.: This "LLM just makes document bigger" framing is also very useful understanding how prompt injection and hallucinations are constant core behaviors, which we just ignore except when they inconvenience us The assistant-bot in the story can be twisted or vanish so abruptly because it's just something in a digital daydream.
karmakaze
AlphaGo also doesn't reason. That doesn't mean it can't do things that humans do by reasoning. It doesn't make sense to make these comparisons. It's like saying that planes don't really fly because they aren't flapping their wings.
Edit: Don't conflate mechanisms with capabilities.
IshKebab
> But there isn’t any mechanism inherent in large language models (LLMs) that would seem to enable this
Stopped reading here. What is the mechanism in humans that enables intelligence? You don't know? Didn't think so. So how do you know LLMs don't have the required mechanism?
twobitshifter
Lost me here – “LLMs are not brains and do not meaningfully share any of the mechanisms that animals or people use to reason or think.“
“the initial stages a completely new kind of mind, based on completely unknown principles, using completely unknown processes that have no parallel in the biological world.”
We just call it a neural network because we wanted to confuse biology with math for the hell of it?
“There is no reason to believe that it thinks or reasons—indeed, every AI researcher and vendor to date has repeatedly emphasised that these models don’t think.”
I mean just look at the Nobel Prize winners for counter examples to all of this https://www.cnn.com/2024/10/08/science/nobel-prize-physics-h…
I don’t understand the denialism behind replicating minds and thoughts with technology – that had been the entire point from the start.
bloomingkales
Why not make a simpler conclusion? It speaks to humans like people speak to Trump, a narcissist.
Everything has to start with “you are great”.
All humans are huge narcissists and the AI is told so, and acts accordingly.
“Isn’t it weird how the beggar constantly kneels?”
How silly an observation.
olddustytrail
> One of the issues in during this research—one that has perplexed me—has been that many people are convinced that language models, or specifically chat-based language models, are intelligent.
Different people have different definitions of intelligence. Mine doesn't require thinking or any kind of sentience so I can consider LLMs to be intelligent simply because they provide intelligent seeming answers to questions.
If you have a different definition, then of course you will disagree.
It's not rocket science. Just agree on a definition beforehand.
habitue
This kind of "LLMs don't really do anything, it's all a trick" / "they're stochastic parrots" argument was kind of maybe defensible a year and a half ago. At this point, if you're making these arguments you're willfuly ignorant of what is happening.
LLMs write code, today, that works. They solve hard PhD level questions, today.
There is no trick. If anything, it's clear they haven't found a trick and are mostly brute forcing the intelligence they have. They're using unbelievable amounts of compute and are getting close to human level. Clearly humans still have some tricks that LLMs dont have yet, but that doesn't diminish what they can objectively do.
fleshmonad
Unfounded cope. And I know this will get me downvoted, as these arguments seem to be popular among the intellectuals on this glorious page.
The machanism of intelligence is not understood. There isn't even a rigorous definition of what intelligence is. "All it does is combine parts it has seen in its training set to give an answer", well then the magic lies in how it knows what parts to combine, if one wants to go with this argument. Also conveniently, the fact that we have millions of years of evolution behind us, plus exabytes of training data over the years in form of different stimuli since birth gets shoved under the rug. I don't want to say that the conclusion is necessarily wrong, but the argument is always bad. I know it is hard to come to terms with the thought that intelligence may be more fundamental in nature and not exclusively a capability of carbon based life forms.
viach
> 1 The tech industry has accidentally invented the initial stages a completely new kind of mind, based on completely unknown principles…
> 2) The intelligence illusion is in the mind of the user and not in the LLM itself.
3) The intelligence of the users is illusion either?
GuB-42
"Do LLMs think?" is a false problem outside of the field of philosophy.
The real question that gets billions invested is "Is it useful?".
If the "con artist" solves my problem, that's fine by me. It is like having a mentalist tell me "I see that you are having a leaky faucet and I see your future in a hardware store buying a 25mm gasket and teflon tape…". In the end, I will have my leak fixed and that's what I wanted, who care how it got to it?
ripped_britches
This article confuses conscious, felt experience with intelligence.
tomohelix
On a bit of a tangent and hypothetical, but what if we pool eough resources together to do a training that includes everything a human can experience? I am thinking of all the five senses and all the data that comes with it, e.g. books, movies, songs, recitals, landscape, the wind brushing against the "skin", the pain of getting burned, the smell of coffee in the morning, the itchiness of a mosquito's bite, etc.
It is not impossible I think, just require so much effort, talents, and funding that the last thing resembling such an endeavor was the Manhattan project. But if it succeeded, the impact could rival or even exceed what nuclear power had done.
Or am I deluded and there is some sort of fundamental limit or restriction on the transformer that would completely prevent this from the start?
tmnvdb
I'm amazed people are upvoting this piece which does not grapple with any of the real issues in a serious way. I guess some folks just really want AI to go away and are longing to hear that it is just all newfangled nonsense from the city slickers!
cratermoon
I've never gotten a good answer to my question regarding why Open AI chose a chat UI for their gpt, but this article comes closest to explaining it.
orbital-decay
>LLMs <snip> do not meaningfully share any of the mechanisms that animals or people use to reason or think.
This seems to be a hard assumption the entire post, and many other similar ones, rely upon. But how do you know how people think or reason? How do you know human intelligence is not an illusion? Decades of research were unable to answer this. Now when LLMs are everywhere, suddenly everybody is an expert in human thinking with extremely strong opinions. To my vague intuition (based on understanding of how LLMs work) it's absolutely obvious they do share at least some fundamental mechanisms, regardless of vast low-level architecture/training differences. The entire discussion on whether it's real intelligence or not is based on ill-defined terms like "intelligence", so we can keep going in circles with it.
By the way, OpenAI does nothing of this, see [1]:
>artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work
Neither do others. So the author describes "tech industry" unknown to me.
[1] https://openai.com/charter/
xg15
> But that isn’t how language models work. LLMs model the distribution of words and phrases in a language as tokens. Their responses are nothing more than a statistically likely continuation of the prompt.
Not saying the author is wrong in general, but this kind of argument always annoys me. It's effectively a Forer statement for the "sceptics" side: It appears like a full-on refutation, but really says very little. It also evokes certain associations which are plain incorrect.
LLMs are functions that return a probability distribution of the next word given the previous words; this distribution is derived from the training data. That much is true. But this does not tell anything about how the derivation and probability generation processes actually work or how simple or complex they are.
What it does however, is evoke two implicit assumptions without justifying them:
1) LLMs fundamentally cannot have humanlike intelligence, because humans are qualitatively different: An LLM is a mathematical model and a human is, well, a human.
Sounds reasonable until you have a look at the human brain and find that human consciousness and thought too could be represented as nothing more than interactions between neurons. At which point, it gets metaphysical…
2) It implies that because LLMs are "statistical models", they are essentially slightly improved Markov chains. So if an LLM predicts the next word, it would essentially just look up where the previous words appeared in its training data most often and then return the next word from there.
That's not how LLMs work at all. For starters, the most extensive Markov chains have a context length of 3 or 4 words, while LLMs have a context length of many thousand words. Your required amounts of training data would go to "number of atoms in the universe" territory if you wanted to create a Markov chain with comparable context length.
Secondly, as current LLMs are based on the mathematical abstraction of neural networks, the relationship between training data and the eventual model weights/parameters isn't even fully deterministic: The weights are set to initial values based on some process that is independent of the training data – e.g. they are set to random values – and then incrementally adjusted so that the model can increasingly replicate the training data. This means that the "meaning" of individual weights and their relationship to the training data remains very unclear, and there is plenty of space in the model where higher-level "semantic" representations might evolve.
None of that is proof that LLMs have "intelligence", but I think it does show that the question can't be simply dismissed by saying that LLMs are statistical models.
ramesh31
It feels like we are stuck in two different worlds with this stuff right now. One being the AI users that interface solely through app based things like ChatGPT, who have been burned over and over again by hallucinations or lack of context, to the point of disillusionment. The other world is the one where developers who are working with agentic systems built on frontier models right now are literally watching AGI materialize in front of us in real time. I think 2025 will be the year those worlds converge (for the better).