
A Guide to Cutting Through AI Hype
Last Thursday’s Princeton Public Lecture on AI hype began with brief talks based on our respective books:
- AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference. Arvind Narayanan & Sayash Kapoor. Princeton University Press, 2024.
- Artificial Intelligence: A Guide for Thinking Humans. Melanie Mitchell. Farrar, Straus and Giroux, 2019.
The meat of the event was a discussion between the two of us and with the audience. A lightly edited transcript follows.

Photo credit: Floriaan Tasche
AN: You gave the example of ChatGPT being unable to comply with the instruction to draw a room without an elephant in it. It even reassured you that there was no elephant in the image. For anyone wondering, that’s because the chatbot is separate from the image generator. So it literally can’t “see” what it produced. That’s how it’s architected.
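[Note: a minimal sketch of the kind of pipeline AN is describing, with hypothetical function names. The point is architectural: the chat model only emits a text prompt, and the generated image is never fed back to it, so it cannot check its own reassurances.]

```python
# Hypothetical sketch: a chat model delegating to a separate image generator.
# The chat model only ever sees and produces text; the pixels never come back to it.

def chat_model_reply(user_message: str) -> dict:
    """Stand-in for a text-only LLM call (hypothetical)."""
    # The model decides to call an image tool and writes a text prompt for it.
    return {"tool": "image_generator",
            "prompt": "a cozy room, no elephant anywhere"}

def image_generator(prompt: str) -> bytes:
    """Stand-in for a diffusion-style image model (hypothetical)."""
    return b"...png bytes..."  # may well contain an elephant anyway

def handle_turn(user_message: str) -> tuple[bytes, str]:
    plan = chat_model_reply(user_message)
    image = image_generator(plan["prompt"])
    # The chat model never receives `image`, so any reassurance it gives
    # ("there is no elephant") is based on its prompt, not on the picture.
    reassurance = "Done! There is definitely no elephant in the image."
    return image, reassurance
```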
I’m curious how easy those problems are going to be to solve. It’s something I think a lot about, especially when I debate with AI boosters who say it will revolutionize the economy. One of the biggest sticking points for me is that these systems can’t see outside the box they’re in. That limits their common sense.
[Note: we mention multimodal image generation a bit later; it is a new technique that doesn’t suffer from the elephant problem. But this discussion is about the lack of context, which is a broader issue.]
Here’s another example. Every time a new chatbot comes out, I try playing rock-paper-scissors with it. I’ll say, “Let’s see what happens if you go first.” If it picks “rock”, I’ll say “paper,” and it replies, “Oh, you win!” Then I say, “Let’s go again,” and whatever it picks, I’ll pick the option that beats it.
After this happens a few times, I ask it, “How do you think I won each time?” And it’ll say something like, “Oh, you must be really good at reading robot minds.” Almost no chatbot so far has understood that turn-taking in an online interface is different from simultaneous rock-paper-scissors in real life. It doesn’t recognize the basic elements of its own user interface.
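[Note: the "trick" AN describes is mechanical once you notice the turn-taking: if the bot reveals its move first, the human simply plays the counter. A minimal sketch:]

```python
# The move that beats each move in rock-paper-scissors.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def human_reply(bot_move: str) -> str:
    """If the bot moves first and reveals its choice, winning is guaranteed."""
    return COUNTER[bot_move]

# Example round: the bot "goes first" in the chat, so it always loses.
bot_move = "rock"             # whatever the chatbot picked
print(human_reply(bot_move))  # -> "paper"
```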
Is that a big issue? Can it be easily solved? What are the implications?
MM: I think it’s a big issue. Someone once said that current AI systems lack a frontal cortex—the part of the brain responsible for what we call metacognition. That’s a term from psychology. It refers to awareness of one’s own cognition—whether what you’re saying is true or false, how confident you are in it, that kind of thing.
In addition to metacognition, these systems also lack episodic memory—memory of their own experiences. So each conversation is essentially a clean slate unless the system has some way to access a persistent record.
That interaction between memory and metacognition is essential to human intelligence and to avoiding these kinds of problems. It’s something current AI systems don’t have.
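[Note: the clean-slate point is also visible at the API level. Chat endpoints are typically stateless, so whatever "memory" a conversation has is just the prior messages the client chooses to resend. A rough sketch, with a hypothetical send_to_model function:]

```python
# Rough sketch of why each conversation starts as a clean slate:
# the model itself stores nothing between calls; the client resends the history.

history: list[dict] = []  # the only "episodic memory" is this client-side list

def send_to_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for a stateless chat-completion call."""
    return "..."  # the model sees only what is in `messages`, nothing else

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = send_to_model(history)  # the whole record is passed on every call
    history.append({"role": "assistant", "content": reply})
    return reply

# Starting a new conversation means starting a new, empty history:
# everything above is forgotten unless it is explicitly stored and resent.
```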
AN: That’s really interesting. One thing I’ve noticed in your work—and a big reason I’ve learned so much from you—is that you study the internals of AI and connect them to the internals of human cognition. You look for similarities and differences. It’s deeply satisfying to explore that from a curiosity-driven perspective.
I wonder, though: to what extent do we need to understand the internals to make practical decisions about AI? I’m not an expert on the internals; I study the behavior of AI—its societal and economic impacts. Are these two perspectives complementary? What’s the role of understanding internals when it comes to decision-making around AI?
MM: That’s a really important question. One strange thing about AI is that we built it—we trained it—but we don’t understand how it works. It’s so complex. Even the engineers at OpenAI who made ChatGPT don’t fully understand why it behaves the way it does.
It’s not unlike how we don’t fully understand ourselves. I can’t open up someone’s brain and figure out how they think—it’s just too complex.
When we study human intelligence, we use both psychology—controlled experiments that analyze behavior—and neuroscience, where we stick probes in the brain and try to understand what neurons or groups of neurons are doing.
I think the analogy applies to AI too: some people evaluate AI by looking at behavior, while others “stick probes” into neural networks to try to understand what’s going on internally. These are complementary approaches.
But there are problems with both. With the behavioral approach, we see that these systems pass things like the bar exam or the medical licensing exam—but what does that really tell us?
Unfortunately, passing those exams doesn’t mean the systems can do the other things we’d expect from a human who passed them. So just looking at behavior on tests or benchmarks isn’t always informative. That’s something people in the field have referred to as a crisis of evaluation.
I was actually going to ask you—what do we do about this crisis?
AN: I want to start by reinforcing the point you made. We can’t read much into the fact that a model passed the bar exam or the medical licensing exam. A lawyer’s job isn’t just answering bar exam questions all day.
And when lawyers use these tools for real, non-trivial tasks—as opposed to something like translating documents, which older AI has been able to do for a decade—they run into problems.
Many lawyers have actually been sanctioned by courts for submitting briefs filled with fake, non-existent citations that were generated by chatbots. They didn’t realize these aren’t search engines and don’t have access to a reliable repository of truth.
There used to be a steady stream of media articles about this, but I think the stories have slowed down—not because it stopped happening, but because it’s so commonplace that it’s no longer news.
So, what do we do about the crisis of evaluation? I have both a technical answer and a structural one.
Let me start with the structural point. Right now, the state of evaluation in AI is like the auto industry before independent safety testing. It’s as if car makers were the only ones evaluating their own products—for crash safety, environmental impact, and so on.
It’s like we have no EPA doing independent emissions tests, no crash-test ratings, no Consumer Reports reviews.
MM: We used to have that situation with cars, right? Didn’t end well.
AN: Exactly. I think we need a robust, independent third-party evaluation system. We—and many others—have been trying to build that. So that’s one structural change that would help: changing how evaluations are done.
The second point is more technical. Historically, AI evaluation has been like a one-dimensional hill-climbing contest. You define a task—say, image classification—and try to keep improving accuracy. A great example is the ImageNet dataset, which came out of Princeton and Stanford. It’s often credited as the root of the modern deep learning revolution.
That approach worked really well for a long time. But it’s very clear that it’s not working anymore, because today’s AI systems aren’t narrowly focused on a single task. They do a broad range of things and need to be accountable to real-world users. So we need to go beyond accuracy and incorporate many other factors.
We have a project called The Science of AI Evaluation—and many others are working in this area too. I think we need to completely reboot our understanding of what it means to evaluate AI systems. I’m cautiously optimistic that we’re getting there.
MM: It’s absolutely essential to figure out, as Turing asked: what is the nature of their intelligence? How much can we trust them?
I wanted to ask you: a month or two ago, Bill Gates was on The Tonight Show and said that within a decade, AI will replace doctors and teachers, and that we won’t need humans for most things. What do you think about that?
AN: In your talk you gave historical examples of overconfidence from AI experts, which I loved. One point you made, which I thought was spot on, is: are these really the right people to be making predictions?
I’ve felt strongly about that. Often, when AI researchers predict that AI will take over some job, the basis for that prediction is an incredibly narrow and shallow understanding of what the job actually involves. A famous example is Geoff Hinton predicting back in 2016 that radiologists would soon be obsolete.
What I’ve found is that people in various professions have a much better understanding of the limits of AI in their domains than AI researchers do. The overconfidence often stems from this process: the researcher defines a one-dimensional benchmark that captures a tiny aspect of the job, sees that AI performance improves rapidly over time on that benchmark, projects it forward, and concludes that AI will surpass humans and take over the job in three years.
But they haven’t thought about the hundred other things involved in the job that are hard for AI but trivial for humans—because those things require context.
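[Note: the flawed projection AN describes above can be made concrete. A toy sketch with made-up benchmark scores, fitting a straight line and extrapolating, which is roughly the reasoning being criticized:]

```python
# Toy illustration of the "one-dimensional benchmark + extrapolation" reasoning
# criticized above. The scores below are invented, for illustration only.

years  = [2019, 2020, 2021, 2022]
scores = [55.0, 68.0, 79.0, 88.0]   # accuracy on some narrow benchmark (%)

# Fit a straight line by ordinary least squares and project forward.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(scores) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, scores))
         / sum((x - mean_x) ** 2 for x in years))
intercept = mean_y - slope * mean_x

projected_2025 = slope * 2025 + intercept
print(f"Projected 2025 score: {projected_2025:.1f}%")  # > 100%, "superhuman"

# The fallacy: the benchmark measures one narrow slice of the job, and the
# projection says nothing about the hundred other tasks the job requires.
```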