The philosopher Harry Frankfurt defined bullshit as speech that is intended to persuade without regard for the truth. By this measure, OpenAI’s new chatbot ChatGPT is the greatest bullshitter ever. Large Language Models (LLMs) are trained to produce plausible text, not true statements. ChatGPT is shockingly good at sounding convincing on any conceivable topic. But OpenAI is clear that there is no source of truth during training. That means that using ChatGPT in its current form would be a bad idea for applications like education or answering health questions. Even though the bot often gives excellent answers, sometimes it fails badly. And it’s always convincing, so it’s hard to tell the difference.
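To make "plausible text, not true statements" concrete: the training objective is next-token prediction, i.e. maximizing the likelihood of the next token given the preceding ones. Here is a minimal, illustrative sketch of that objective (not the actual training code; real LLMs use transformer architectures and web-scale corpora, and the tiny "model" below is made up):

```python
import torch
import torch.nn.functional as F

vocab_size, embed_dim = 50_000, 512

# A toy stand-in for a language model: embeddings plus a linear head.
embed = torch.nn.Embedding(vocab_size, embed_dim)
head = torch.nn.Linear(embed_dim, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 128))   # a batch of token ids
logits = head(embed(tokens[:, :-1]))              # predict each next token

# The loss rewards assigning high probability to whatever token actually
# came next in the training text -- nothing in it rewards factual accuracy.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
```

Truthfulness simply isn't part of what the loss measures; fluency and plausibility are.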
Yet, there are three kinds of tasks for which ChatGPT and other LLMs can be extremely useful, despite their inability to discern truth in general:
- Tasks where it's easy for the user to check whether the bot's answer is correct, such as debugging help (see the sketch after this list).
- Tasks where truth is irrelevant, such as writing fiction.
- Tasks for which there does in fact exist a subset of the training data that acts as a source of truth, such as language translation.
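Here's a hypothetical illustration of the first category (our own example, not taken from any chatbot transcript): if a bot suggests a fix for a buggy function, a few assertions tell you within seconds whether to trust it.

```python
def median(values):
    """Buggy version: wrong for even-length lists."""
    s = sorted(values)
    return s[len(s) // 2]

def median_fixed(values):
    """Fix as a chatbot might suggest it: average the two middle elements."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

# Verification is cheap: run the cases you care about.
assert median_fixed([1, 3, 2]) == 2
assert median_fixed([1, 2, 3, 4]) == 2.5
```

When checking an answer is this cheap, the bot's inability to know whether it's right matters much less.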
Let’s dive in. First the bad news, then the good.
ChatGPT is the best chatbot released so far. It has delighted users over the last week by generating fantastically weird text, such as an explanation of how to remove a peanut butter sandwich from a VCR… in biblical verse.
But people are also excited about more serious applications, such as using it as a learning tool. Some are even predicting that it will make Google redundant. Yes, ChatGPT is often extremely good at answering questions. But the danger is that you can't tell when it's wrong unless you already know the answer. We tried some basic information security questions. In most cases, the answers sounded plausible but were, in fact, bullshit. The same was true when we tried more complex questions.
Another trope about ChatGPT and education: universities are doomed because ChatGPT can write essays. That’s silly. Yes, LLMs can write plausible essays. But the death of homework essays is a good thing for learning! We wrote about this a month ago, and nothing has really changed.
What about search? Google’s knowledge panels are already notorious for presenting misinformation authoritatively. Replacing them with an LLM could make things much worse. A paper by Chirag Shah and Emily Bender explores how things could go wrong if we replace search engines with LLMs.
The fact that these models can’t discern the truth is why Meta’s Galactica, an LLM for science, was an ill-conceived idea. In science, accuracy is the whole point. The backlash was swift, and the public demo was pulled down after three days. Similarly, correctness and reliability are everything if you want to use an LLM for answering health-related queries.
Of course, LLMs are getting better at getting facts right. But their ability to sound convincing is improving just as quickly, so we suspect it's becoming harder even for experts to spot mistakes.
In fact, models such as Galactica and ChatGPT are great at generating authoritative-sounding text in any requested style: legalese, bureaucratese, wiki pages, academic papers, lecture notes, and even answers for Q&A forums. One side effect is that we