When the new version of the artificial intelligence tool ChatGPT arrived this week, I watched it do something impressive: solve logic puzzles.
One after the other, I fed the AI called GPT-4 questions from the logical reasoning portion of the LSAT used for law school admissions. Those always leave me with a headache, yet the software aced them like a competent law student.
But as cool as that is, it doesn’t mean AI is suddenly as smart as a lawyer.
The arrival of GPT-4, an upgrade from OpenAI to the chatbot software that captured the world’s imagination, is one of the year’s most-hyped tech launches. Some feared its uncanny ability to imitate humans could be devastating for workers, be used as a chaotic “deepfake” machine or usher in an age of sentient computers.
That is not how I see GPT-4 after using it for a few days. While it has gone from a D student to a B student at answering logic questions, AI hasn’t crossed a threshold into human intelligence. For one, when I asked GPT-4 to flex its improved “creative” writing capability by crafting the opening paragraph to this column in the style of me (Geoffrey A. Fowler), it couldn’t land on one that didn’t make me cringe.
But GPT-4 does add to the challenge of unraveling how AI’s new strengths — and weaknesses — might change work, education and even human relationships. I’m less concerned that AI is getting too smart than I am with the ways AI can be dumb or biased in ways we don’t know how to explain and control, even as we rush to integrate it into our lives.
These aren’t just theoretical questions: OpenAI is so confident in GPT-4 that it introduced it alongside commercial products that are already using it to teach languages in Duolingo and tutor kids in Khan Academy.
Anyone can use GPT-4, but for now it requires a $20 monthly subscription to OpenAI’s ChatGPT Plus. It turns out millions of people have already been using a version of GPT-4: Microsoft acknowledged this week it powers the Bing chatbot that the software giant added to its search engine in February. The companies just didn’t reveal that until now.
So what’s new? OpenAI claims that by optimizing its “deep learning,” GPT-4’s biggest leaps have been in logical reasoning and creative collaboration. GPT-4 was trained on data from the internet that goes up through September 2021, which means it’s a little more current than its predecessor GPT-3.5. And while GPT-4 still has a problem with randomly making up information, OpenAI says it is 40 percent more likely to provide factual responses.
GPT-4 also gained an eyebrow-raising ability to interpret the content of images — but OpenAI is locking that down while it undergoes a safety review.
What do these developments look like in use? Early adopters are putting GPT-4 up to all sorts of colorful tests, from asking it how to make money to asking it to code a browser plug-in that makes websites speak Pirate. (What are you doing with it? Email me.)
Let me share two of my tests that help show what this thing can — and can’t — do now.
We’ll start with the test that most impressed me: watching GPT-4 nearly ace the LSAT.
I tried 10 sample logical reasoning questions written by the Law School Admission Council on both the old and new ChatGPT. These aren’t factual or rote memorization questions — each is a kind of multiple-choice brain teaser that presents you with a whole bunch of different facts and then asks you to sort them out.
When I ran them through GPT-3.5, it got only 6 out of 10 correct.
GPT-4 got 9 out of 10.
What’s going on? In puzzles that GPT-4 alone got right, its responses show it stays focused on the link between the presented facts and the conclusion it needs to support. GPT-3.5 gets distracted by facts that aren’t relevant.
OpenAI says a number of studies show GPT-4 “exhibits human-level performance” on other professional and academic benchmarks. GPT-4 scored in the 90th percentile on the Uniform Bar Exam — up from the 10th percentile for the previous version. It scored in the 93rd percentile on the SAT reading and writing test, and