Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
(Related text posted to Twitter; this version is edited and has a more advanced final section.)
Imagine yourself in a box, trying to predict the next word – assign as much probability mass to the next token as possible – for all the text on the Internet.
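To pin down “assign as much probability mass to the next token as possible”: one standard way to write the objective being trained for is the expected log-loss over the training text (a sketch in my own notation; the exact tokenization and weighting are assumptions):

```latex
\min_\theta \; \mathbb{E}_{x \sim \text{Internet text}}
\left[ -\sum_t \log p_\theta\!\left(x_t \mid x_{<t}\right) \right]
```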
Koan: Is this a task whose difficulty caps out at human intelligence, or at the intelligence level of the smartest human who wrote any Internet text? What factors make that task easier, or harder? (If you don’t have an answer, maybe take a minute to generate one, or alternatively, try to predict what I’ll say next; if you do have an answer, take a moment to review it inside your mind, or maybe say the words out loud.)
Consider that somewhere on the internet is probably a list of thruples: <product of two primes, first prime, second prime>.
GPT obviously isn’t going to predict that successfully for significantly-sized primes, but it illustrates the basic point:
There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator’s next token.
Indeed, in general, you’ve got to be more intelligent to predict particular X, than to generate realistic X. GPTs are being trained to a much harder task than GANs.
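A minimal sketch of that asymmetry, using toy primes (the specific numbers and line format are my own illustrative assumptions): generating such a line only takes a multiplication, while assigning high probability to the tokens after the product requires factoring it.

```python
import random

# Generating a "<product> <prime1> <prime2>" line only requires multiplying.
SMALL_PRIMES = [101, 103, 107, 109, 113, 127, 131, 137, 139, 149]

def generate_line() -> str:
    p, q = random.sample(SMALL_PRIMES, 2)
    return f"{p * q} {p} {q}"

# Predicting the same line token by token is harder: once the product is in the
# context, putting probability mass on the correct next tokens means factoring
# the product, which the generator never had to do. Brute force works here only
# because the primes are tiny; for cryptographically sized primes it would not.
def predict_factors(product: int) -> tuple[int, int]:
    for p in SMALL_PRIMES:
        if product % p == 0:
            return p, product // p
    raise ValueError("no small factor found")

line = generate_line()
print(line, "->", predict_factors(int(line.split()[0])))
```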
Same spirit:
Consider that some of the text on the Internet isn’t humans casually chatting. It’s the results section of a science paper. It’s news stories that say what happened on a particular day, where maybe no human would be smart enough to predict the next thing that happened in the news story in advance of it happening.
As Ilya Sutskever compactly put it, to learn to predict text, is to learn to predict the causal processes of which the text is a shadow.
Lots of what’s shadowed on the Internet has a *complicated* causal process generating it.
Consider that sometimes human beings, in the course of talking, make errors.
GPTs are not being trained to imitate human error. They’re being trained to *predict* human error.
Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you’ll make.
If you then ask that predictor to become an actress and play the character of you, the actress will guess which errors you’ll make, and play those errors. If the actress guesses correctly, it doesn’t mean the actress is just as error-prone as you.
Consider that a lot of the text on the Internet isn’t extemporaneous speech. It’s text that people crafted over hours or days.
GPT-4 is being asked to predict it in 200 serial steps or however many layers it’s got, just as if a human were extemporizing their immediate thoughts.
A human can write a rap battle in an hour. A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.
Or maybe simplest:
Imagine somebody telling you to make up random words, and you say, “Morvelkainen bloombla ringa mongo.”
Imagine a mind of a level – where, to be clear, I’m not saying GPTs are at this level yet –
Imagine a Mind of a level where it can hear you say ‘morvelkainen bloombla ringa’, and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is ‘mongo’.
The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.
When you’re trying to be human-equivalent at writing text, you can just make up whatever output, and it’s now a human output because you’re human and you chose to output that.
GPT-4 is being asked to predict all that stuff you’re making up. It doesn’t get to make up whatever. It is being asked to model what you were thinking – the thoughts in your mind whose shadow is your text output – so as to assign as much probability as possible to your true next word.
Figuring out that your next utterance is ‘mongo’ is not mostly a question, I’d guess, of that mighty Mind being hammered into the shape of a thing that can simulate arbitrary humans, and then some less intelligent subprocess being responsible for adapting the shape of that Mind to be you exactly, after which it simulates you saying ‘mongo’. Figuring out exactly who’s talking, to that degree, is a hard inference problem which seems like noticeably harder mental work than the part where you just say ‘mongo’.
When you predict how to chip a flint handaxe, you are not mostly a causal process that behaves like a flint handaxe, plus some computationally weaker thing that figures out which flint handaxe to be. It’s not a problem that is best solved by “have the difficult ability to be like any particular flint handaxe, and then easily figure out which flint handaxe to be”.
GPT-4 is still not as smart as a human in many ways, but it’s naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.
And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising – even leaving aside all the ways that gradient descent differs from natural selection – if GPTs ended up thinking the way humans do, in order to solve that problem.
GPTs are not Imitators, nor Simulators, but Predictors.
I think an issue is that GPT is used to mean two things:
- A predictive model whose output is a probability distribution over token space given its prompt and context
- Any particular technique/strategy for sampling from the predictive model to generate responses/completions for a given prompt.
[See the Appendix]
The latter kind of GPT is what I think is rightly called a “Simulator”.
From @janus‘ Simulators (italicised by me):
I use the generic term “simulator” to refer to models trained with predictive loss on a self-supervised dataset, invariant to architecture or data type (natural language, code, pixels, game states, etc). The outer objective of self-supervised learning is Bayes-optimal conditional inference over the prior of the training distribution, which I call the simulation objective, because a conditional model can be used to simulate rollouts which probabilistically obey its learned distribution by iteratively sampling from its posterior (predictions) and updating the condition (prompt). Analogously, a predictive model of physics can be used to compute rollouts of phenomena in simulation. A goal-directed agent
… (read more)
Predictors are (with a sampling loop) simulators! That’s the secret of mind
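A minimal sketch of the distinction (the toy uniform “model” and the function names here are my own assumptions): the predictor is just a map from context to a distribution over the next token, and the “simulator” is what you get by wrapping that map in a sampling loop that feeds its own outputs back in.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def predict_next(context: list[str]) -> dict[str, float]:
    """Stand-in for a trained predictive model: a probability distribution
    over the next token given the prompt-plus-context. (Uniform here.)"""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def simulate(prompt: list[str], n_tokens: int) -> list[str]:
    """The 'simulator': iteratively sample from the predictor's distribution
    and append the sample to the condition, as in the quoted passage."""
    context = list(prompt)
    for _ in range(n_tokens):
        dist = predict_next(context)
        context.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return context

print(simulate(["the", "cat"], 4))
```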
Veedrac:
EY pointed at a tension, or at least a way in which viewing Simulators as a
semantic primitive, rather than as an approximate consequence of a predictive
model, is misleading. I’ll try to give it again from another angle.
To give the sort of claim worth objecting to (one which I think is an easy trap
to get caught in, even though I don’t think the original Simulators post was
confused), here is a quote from that post: “GPT doesn’t seem to care which agent
it simulates, nor if the scene ends and the agent is effectively destroyed.”
Namely, the idea is that a GPT rollout is a stochastic sample of a text
generating source, or possibly a set of them in superposition.
Consider again the task of predicting first a cryptographic hash and then the
text which hashes to it; or rather, the general class of tasks for which the
forward pass (hashing) is tractable for the network and the backward pass
(breaking the hash) is not, of which predicting cryptographic hashes is a
limiting case.
If a model rollout were primarily trying to be a superposition of one or more
coherent simulations, there would be a computationally tractable approach to
this task: internally sample a set of phrases, compute their hashes, narrow
down the candidate set as the hash is sampled token by token, and then output
the text it had sampled for that hash.
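As a hedged sketch of that hypothetical policy (the candidate phrases and the choice of SHA-256 are my assumptions): hash the candidates up front, emit a hash one character at a time while narrowing down which candidate it belongs to, then emit that candidate’s text, so the hash and the following text stay consistent.

```python
import hashlib
import random

# Hypothetical phrases the "coherent simulator" samples internally (my assumption).
CANDIDATES = ["hello world", "an example phrase", "morvelkainen bloombla ringa mongo"]

def coherent_rollout() -> str:
    # Forward direction is cheap: hash every candidate up front.
    hashes = {p: hashlib.sha256(p.encode()).hexdigest() for p in CANDIDATES}
    emitted = ""
    remaining = dict(hashes)
    # Emit the hash character by character, narrowing the candidate set so that
    # the emitted hash always belongs to at least one sampled phrase.
    for position in range(64):                       # SHA-256 hex digest length
        choices = sorted({h[position] for h in remaining.values()})
        emitted += random.choice(choices)
        remaining = {p: h for p, h in remaining.items() if h.startswith(emitted)}
    phrase = next(iter(remaining))                   # the surviving candidate
    return emitted + "\n" + phrase                   # hash, then the text that hashes to it

print(coherent_rollout())
```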
Instead, a GPT model will produce a random semantically-meaningless hash and
then sample unrelated text. Even if seeded from the algorithm above, backprop
will select away from the superposition and towards the distributional,
predictive model. This holds even in the case where the GPT has an entropy
source that would allow it to be distributionally perfect when rolled out from
the start! Backprop will still say no, your goal is prediction, not simulation.
As EY says, this is not a GAN.
Again, I don’t think the original Simulators post was necessarily confused about
any of this, but I also agree with this post that the terminology is imprecise
and the differences can be important.
David Johnston:
I can see why your algorithm is hard for GPT — unless it predicts the follow-up
string perfectly, there’s no benefit to hashing correctly — but I don’t see why
it’s impossible. What if it perfectly predicts the follow-up?
Veedrac:
This is by construction: I am choosing a task for which one direction is
tractable and the other is not. The existence of such tasks follows from
standard cryptographic arguments, the specifics of the limiting case are less
relevant.
If you want to extrapolate to models strong enough to beat SHA256, you have
already conceded EY’s point, since that is a superhuman task at least relative
to the generators of the training data; but even then, there will still exist
similar tasks of equal or slightly longer length for which the argument holds
again, by basic cryptographic arguments, possibly using a different hashing
scheme.
Note that this argument requires the text to have sufficiently high entropy for
the hash not to be predictable a priori.
David Johnston:
It’s the final claim I’m disputing – that the hashed text cannot itself be
predicted. There’s still a benefit to going from e.g. 10^-20 to 10^-10
probability of a correct hash. It may not be a meaningful difference in
practice, but there’s still a benefit in principle, and in practice it could
also just generalise a strategy it learned for cases with low entropy text.
Veedrac:
The mathematical counterpoint is that this again only holds for sufficiently low
entropy completions, which need not be the case, and if you want to make this
argument against computronium suns you run into issues earlier than a reasonably
defined problem statement does.
The practical counterpoint is that from the perspective of a simulator graded by
simulation success, such an improvement might be marginally selected for,
because epsilon is bigger than zero, but from the perspective of the actual
predictive training dynamics, a policy with a success rate that low is
ruthlessly selected against, and the actual policy of selecting the per-token
base rate for the hash dominates, because epsilon is smaller than 1/64.
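To make the “epsilon is smaller than 1/64” comparison concrete, here is a hedged back-of-the-envelope (the 64-symbol alphabet matches the comment; the value of epsilon is an arbitrary illustrative assumption):

```python
ALPHABET = 64    # hash characters assumed roughly uniform over 64 symbols
EPSILON = 1e-10  # assumed chance that a "commit to one consistent hash" policy guessed right

base_rate = 1 / ALPHABET   # probability the base-rate policy puts on the true character
committed = EPSILON        # expected probability the committed policy puts on it
                           # (near 1 if its guess was right, near 0 otherwise)

# Comparing the probability mass each policy puts on the true hash character,
# as in the comment: prediction training prefers whichever is larger, and
# 1/64 >> epsilon, so the base-rate policy dominates.
print(f"base-rate policy assigns ~{base_rate:.4f}")
print(f"committed policy assigns ~{committed:.1e} in expectation")
```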
David Johnston:
Are hash characters non-uniform? If so, I’d agree my point doesn’t stand.
Veedrac:
They typically are uniform, but I think this feels like not the most useful
place to be arguing minutiae, unless you have a cruxy point underneath I’m not
spotting. “The training process for LLMs can optimize for distributional
correctness at the expense of sample plausibility, and is functionally
different from processes like GANs in this regard” is a clarification with
empirically relevant stakes, but I don’t know what the stakes are for this
digression.
David Johnston:
I was just trying to clarify the limits of autoregressive vs other learning
methods. Autoregressive learning is at an apparent disadvantage if
P(X_t | X_{t-1}) is hard to compute and the reverse is easy and low entropy. It
can “make up for this” somewhat if it can do a good job of predicting X_t from
X_{t-2}, but it’s still at a disadvantage if, for example, that’s relatively
high entropy compared to predicting X_{t-1} from X_t. That’s it, I’m satisfied.
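In my notation (an assumption, following the comment’s variables): both orderings factor the same joint distribution, but the autoregressive model must represent the left-to-right conditional even when the right-to-left one is the easy, low-entropy direction, as in the hash example where the hash precedes the text.

```latex
P(X_{t-1}, X_t) \;=\; P(X_{t-1})\, P(X_t \mid X_{t-1}) \;=\; P(X_t)\, P(X_{t-1} \mid X_t)
```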
Jan_Kulveit:
While the claim – the task ‘predict next token on the internet’ absolutely does not imply learning it caps out at human-level intelligence – is true, some parts of the post and reasoning leading to the claims at the end of the post are confused or wrong.
Let’s start from the end and try to figure out what goes wrong.
GPT-4 is still not as smart as a human in many ways, but it’s naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.
And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising – even leaving aside all the ways that gradient descent differs from natural selection – if GPTs ended up thinking the way humans do, in order to solve that problem.
From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is to minimise prediction error with regard to sensory inputs. The unbounded version of the task is basically of the same generality and difficulty as what GPT is doing, and is roughly equivalent to understanding everything that is understandable in the observable universe. For example: a friend of mine worked at … (read more)
Eliezer Yudkowsky:
I didn’t say that GPT’s task is harder than any possible perspective on a form
of work you could regard a human brain as trying to do; I said that GPT’s task
is harder than being an actual human; in other words, being an actual human is
not enough to solve GPT’s task.
Jan_Kulveit:
I don’t see how the comparison of hardness of ‘GPT task’ and ‘being an actual human’ should technically work – to me it mostly seems like a type error.
– The task ‘predict the activation of photoreceptors in the human retina’ clearly has the same difficulty as ‘predict next word on the internet’ in the limit. (cf Why Simulator AIs want to be Active Inference AIs)
– Maybe you mean something like task + performance threshold. Here ‘predict the activation of photoreceptors in the human retina well enough to be able to function as a typical human’ is clearly less difficult than the task + performance threshold ‘predict next word on the internet, almost perfectly’. But this comparison does not seem to be particularly informative.
– Going in this direction we can make comparisons between thresholds closer to reality e.g. ‘predict the activation of photoreceptors in human retina, and do other similar computation well enough to be able to function as a typical human’ vs. ‘predict next word on the internet, at the level of GPT4’ . This seems hard to order – humans are usually able to do the human task and would fail at the GPT4 task at GPT4 level; GPT4 is able to do the GPT4 task and would fail at… (read more)
viluon:
I’d really like to see Eliezer engage with this comment, because to me it looks
like the following sentence’s well-foundedness is rightly being questioned.
While I generally agree that powerful optimizers are dangerous, the fact that
the GPT task and the “being an actual human” task are somewhat different has
nothing to do with it.
Max H:
Yes, human brains can be regarded as trying to solve the problem of minimizing
prediction error given their own sensory inputs, but no one is trying to push up
the capabilities of an individual human brain as fast as possible to make it
better at actually doing so. Lots of people are definitely trying this for GPTs,
measuring their progress on harder and harder tasks as they do so, some of which
humans already cannot do on their own.
Or, another way of putting it: during training, a GPT is asked to solve a
concrete problem no human is capable of or expected to solve. When GPT fails to
make an accurate prediction, it gets modified into something that might do
better next time. No one performs brain surgery on a human any time they make a
prediction error.
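For concreteness, a minimal sketch of what “gets modified into something that might do better next time” looks like mechanically, using a toy stand-in model of my own invention rather than an actual GPT:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 100, 32
# Toy stand-in for a GPT: an embedding plus a linear head over next tokens.
model = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Linear(DIM, VOCAB))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

tokens = torch.randint(0, VOCAB, (1, 16))   # pretend this is a snippet of training text
logits = model(tokens[:, :-1])              # a predicted distribution over each next token
loss = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))

loss.backward()     # measure how each weight contributed to the prediction error
optimizer.step()    # nudge the weights: the "modification" applied after every error
```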
Jan_Kulveit:
This seems the same confusion again.
Upon opening your eyes, your visual cortex is asked to solve a concrete problem
no brain is capable of solving or expected to solve perfectly: predict sensory
inputs. When the patterns of firing don’t predict the photoreceptor activations,
your brain gets modified into something else, which may do better next time.
Every time your brain fails to predict its visual field, there is a bit of
modification, based on computing what’s locally a good update.
There is no fundamental difference in the nature of the task.
Where the actual difference lies is in the computational and architectural
bounds of the systems.
The smartness of neither humans nor GPTs is bottlenecked by the difficulty of
the task, and you cannot say how smart the systems are by looking at the
problems. To illustrate that fallacy with a very concrete example:
Please do this task: prove P ≠ NP
[https://en.wikipedia.org/wiki/P_versus_NP_problem] in the next 5 minutes. You
will get $1M if you do.
Done?
Do you think you have become a much smarter mind because of that? I doubt so –
but you were given a very hard task, and a high reward.
The actual strategic difference, and what’s scary, isn’t the difficulty of the
task, but the fact that human brains don’t multiply their size every few months.
(edited for clarity)
Max H:
No, but I was able to predict my own sensory input pretty well, for those 5
minutes. (I was sitting in a quiet room, mostly pondering how I would respond to
this comment, rather than the actual problem you posed. When I closed my eyes,
the sensory prediction problem got even easier.)
You could probably also train a GPT on sensory inputs (suitably encoded) instead
of text, and get pretty good predictions about future sensory inputs.
Stepping back, the fact that you can draw a high-level analogy between
neuroplasticity in human brains <=> SGD in transformer networks, and sensory
input prediction <=> next token prediction doesn’t mean you can declare there is
“no fundamental difference” in the nature of these things, even if you are
careful to avoid the type error in your last example.
In the limit (maybe) a sufficiently good predictor could perfectly predict both
sensory input and tokens, but the point is that the analogy breaks down in the
ordinary, limited case, on the kinds of concrete tasks that GPTs and humans are
being asked to solve today. There are plenty of text manipulation and
summarization problems that GPT-4 is already superhuman at, and SGD can already
re-weight a transformer network much more than neuroplasticity can reshape a
human brain.
I will try to explain Yann Lecun’s argument against auto-regressive LLMs, which I agree with. The main crux of it is that being extremely superhuman at predicting the next token from the distribution of internet text does not imply the ability to generate sequences of arbitrary length from that distribution.
GPT4’s ability to impressively predict the next token depends very crucially on the tokens in its context window actually belonging to the distribution of internet text written by humans. When you run GPT in sampling mode, every token you sample from it takes it ever so slightly outside the distribution it was trained on. At each new generated token it still assumes that the past 999 tokens were written by humans, but since its actual input was generated partly by itself, as the length of the sequence you wish to predict increases, you take GPT further and further outside of the distribution it knows.
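A hedged sketch of the arithmetic usually attached to this argument (the per-token drift probability is an arbitrary illustrative number, and treating the drift as independent per token is the standard simplification): the chance that a rollout has stayed on the training distribution decays roughly geometrically with its length.

```python
# Illustrative only: assumes each self-generated token independently has a fixed
# chance of pushing the rollout off the distribution the model was trained on.
per_token_drift = 0.01

for length in (10, 100, 1000):
    p_on_distribution = (1 - per_token_drift) ** length
    print(f"{length:>5} generated tokens: {p_on_distribution:.3f} chance of staying on-distribution")
```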
The most salient example of this is when you try to make ChatGPT play chess and write chess analysis. At some point, it will make a mistake and write something like “the queen was captured” when in fact the queen was not captured.