Microsoft’s new and improved Bing, powered by a custom version of OpenAI’s ChatGPT, has experienced a dizzyingly quick reversal: from “next big thing” to “brand-sinking albatross” in under a week. And, well, it’s all Microsoft’s fault.
ChatGPT is a really interesting demonstration of a new and unfamiliar technology that’s also fun to use. So it’s not surprising that, as with every other AI-adjacent construct that comes down the line, its novelty would cause its capabilities to be overestimated by everyone from high-powered tech types to people normally uninterested in the space.
It’s at the right “tech readiness level” for discussion over tea or a beer: what are the merits and risks of generative AI’s take on art, literature, or philosophy? How can we be sure whether what it produces is original, imitative, or hallucinated? What are the implications for creators, coders, customer service reps? Finally, after two years of crypto, something interesting to talk about!
The hype seems outsized partly because it is a technology more or less designed to provoke discussion, and partly because it borrows from the controversy common to all AI advances. It’s almost like “The Dress” in that it commands a response, and that response generates further responses. The hype is itself, in a way, generated.
Beyond mere discussion, large language models like ChatGPT are also well suited to low-stakes experiments, for instance never-ending Mario. In fact, that’s really OpenAI’s fundamental approach to development: release models first privately to buff off the sharpest edges, then publicly to see how they respond to a million people kicking the tires simultaneously. At some point, people give you money.
Nothing to gain, nothing to lose
What’s important about this approach is that “failure” has no real negative consequences, only positive ones. By characterizing its models as experimental, even academic in nature, OpenAI frames any participation or engagement with the GPT series of models as simply large-scale testing.
If someone builds something cool, it reinforces the idea that these models are promising; if someone finds a prominent fail state, well, what else did you expect from an experimental AI in the wild? It sinks into obscurity. Nothing is unexpected if everything is — the miracle is that the model performs as well as it does, so we are perpetually pleased and never disappointed.
In this way OpenAI has harvested an astonishing volume of proprietary test data with which to refine its models. Millions of people poking and prodding at GPT-2, GPT-3, ChatGPT, DALL-E, and DALL-E 2 (among others) have produced detailed maps of their capabilities, shortcomings, and of course popular use cases.
But it only works because the stakes are low. It’s similar to how we perceive the progress of robotics: amazed when a robot does a backflip, unbothered when it falls over trying to open a drawer. If it were dropping test vials in a hospital we would not be so charitable. Or, for that matter, if OpenAI had loudly made claims about the safety and advanced capabilities of the models, though fortunately they didn’t.