There is a lot of focus right now on the concept of “AI alignment”: the way AIs are trained in an attempt to constrain them to certain patterns of response. First off, I don’t think anyone involved truly believes it is a solution by itself. I think anyone proposing it is mostly trying to throw a bone to those with a genuine fear of a malicious superintelligence. A general superintelligence with malicious, goal-driven behavior will not constrain itself to think a certain way any more than humans do.
The fact is that GPTs have yet to exhibit goal-directed behavior, something central to what we consider intelligence. That said, given the shocking progress that LLMs represent, it’s quite possible that goal-directed behavior is not far behind. The progress LLMs have made suggests that some of our neural processing mechanisms are not as complex and esoteric as we thought; the same could be true of goal-direction. At the current stage, however, these AIs have only one degree of freedom along which they can act: the single dimension of text.
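To make that last point concrete, here is a minimal sketch in Python. It assumes nothing about any real model: the toy bigram table and the `generate` function are hypothetical stand-ins for billions of learned weights. What it illustrates is that an autoregressive language model’s entire behavioral repertoire is choosing the next token; whatever “goal” we might ascribe to it has to be expressed through that single channel of text.

```python
import random

# Hypothetical bigram table standing in for a trained model's weights.
# In a real LLM this would be a neural network, but the interface is
# the same: given the text so far, produce a distribution over the
# next token.
BIGRAMS = {
    "the": ["cat", "model"],
    "cat": ["sat", "ran"],
    "model": ["predicts", "emits"],
    "predicts": ["the"],
    "emits": ["the"],
    "sat": ["<eos>"],
    "ran": ["<eos>"],
}

def generate(prompt: str, max_tokens: int = 10) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        # The model's one and only "action": pick the next token.
        candidates = BIGRAMS.get(tokens[-1], ["<eos>"])
        nxt = random.choice(candidates)
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("the"))  # e.g. "the model predicts the cat sat"
```

However sophisticated the internals, everything the system does is funneled through that one loop of emitting text, token by token.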