Expanding on what we missed with sycophancy by synthwave
Expanding on what we missed with sycophancy
Read More
Expanding on what we missed with sycophancy
Read More
Be the first to know the latest updates
Whoops, you're not connected to Mailchimp. You need to enter a valid Mailchimp API key.
22 Comments
labrador
OpenAI mentions the new memory features as a partial cause. My theory as a imperative/functional programmer is that those features added global state to prompts that didn't have it before leading to unpredictability and instabilty. Prompts went from stateless to stateful.
As GPT 4o put it:
I'm looking forward to the expert diagnosis of this because I felt "presence" in the model for the first time in 2 years which I attribute to the new memory system so would like to understand it better.
j4coh
It was so much fun though to get it to explain why terrible things were great, if you just made it sound like you liked the thing you were asking about.
dleeftink
> But we believe in aggregate, these changes weakened the influence of our primary reward signal, which had been holding sycophancy in check. User feedback in particular can sometimes favor more agreeable responses, likely amplifying the shift we saw
Interesting apology piece for an oversight that couldn't have been spotted because the system hadn't been run with real user (i.e. non-A/B tester) feedback yet.
mac3n
[flagged]
jumploops
My layman’s view is that this issue was primarily due to the fact that 4o is no longer their flagship model.
Similar to the Ford Mustang, much of the performance efforts are on the higher trims, while the base trims just get larger and louder engines, because that’s what users want.
With presumably everyone at OpenAI primarily using the newest models (o3), the updates to the base user model have been further automated with thumbs up/thumbs down.
This creates a vicious feedback loop, where the loudest users want models that agree with them (bigger engines!) without the other improvements (tires, traction control, etc.) — leading to more crashes and a reputation for unsafe behavior.
xiphias2
I'm quite happy thar they mention mental illness, as Meta and TikTok wouldn't ever take responsibility of how much part they took in setting unrealistic expectations for people to life.
I'm hopeful that ChatGPT takes even more care together with other companies.
tunesmith
I find it disappointing that openai doesn't really mention anything here along the lines of having an accurate model of reality. That's really what the problem is with sycophancy, it encourages people to detach themselves from what reality is. Like, it seems like they are saying their "vibe check" didn't check vibes enough.
jagger27
My most cynical take is that this is OpenAI's Conway's Law problem, and it reflects the structure and sycophancy of the organization broadly all the way up to sama. That company has seen a lot of talent attrition over the last year—the type of talent that would have pushed back against outcomes like this.
I think we'll continue to see this kind of thing play out for a while.
Oh GPT, you're just like your father!
NoboruWataya
I found the recent sycophancy a bit annoying when trying to diagnose and solve coding problems. First it would waste time praising your intelligence for asking the question before getting to the answer. But more annoyingly if I asked "I am encountering X issue, could Y be the cause" or "could Y be a solution", the response would nearly always be "yes, exactly, it's Y" even when it wasn't the case. I guess part of the problem there is asking leading questions but it would be much more valuable if it could say "no, you're way off".
But…
> Beyond just being uncomfortable or unsettling, this kind of behavior can raise safety concerns—including around issues like mental health, emotional over-reliance, or risky behavior.
It's kind of a wild sign of the times to see a tech company issue this kind of post mortem about a flaw in its tech leading to "emotional over-reliance, or risky behavior" among its users. I think the broader issue here is people using ChatGPT as their own personal therapist.
prinny_
If they pushed the update by valuing user feedback over the expert testers that indicated the model felt off what is the value of the expert testers in the first place? They raised the issue and were promptly ignored.
firesteelrain
I am really curious what their testing suite looks like. How do you test for sycophants?
nrdgrrrl
[dead]
gadtfly
https://nitter.net/alth0u/status/1917021100900516239
alganet
That doesn't make any sense to me.
Seems like you're trying to blame one LLM revision for something that went wrong.
It oozes a smell of unaccountability. Thus, unaligned. From tech to public relations.
Trasmatta
I'm glad the sycophancy is gone now (because OMFG it would glaze you for literally anything – even telling it to chill out on the praise would net you some praise for being "awesome and wanting genuine feedback"), but a small part of me also misses it.
osigurdson
I think this is more of a move to highlight sycophancy in LLMs in general.
comeonbro
This is not truly solvable. There is an extremely strong outer loop of optimization operating here: we want it.
We will use models that make us feel good over models that don't make us feel good.
This one was a little too ham-fisted (at least, for the sensibilities of people in our media bubble; though I suspect there is also an enormous mass of people for whom it was not), so they turned it down a bit. Later iterations will be subtler, and better at picking up the exact level and type of sycophancy that makes whoever it's talking to unsuspiciously feel good (feel right, feel smart, feel understood, etc).
It'll eventually disappear, to you, as it's dialed in, to you.
This may be the medium-term fate of both LLMs and humans, only resolved when the humans wither away.
svieira
This is a real roller coaster of an update.
> [S]ome expert testers had indicated that the model behavior “felt” slightly off.
> In the end, we decided to launch the model due to the positive signals from the [end-]users who tried out the model.
> Looking back, the qualitative assessments [from experts] were hinting at something important
Leslie called. He wants to know if you read his paper yet?
> Even if these issues aren’t perfectly quantifiable today,
All right, I guess not then …
> What we’re learning
> Value spot checks and interactive testing more: We take to heart the lesson that spot checks and interactive testing should be valued more in final decision-making before making a model available to any of our users. This has always been true for red teaming and high-level safety checks. We’re learning from this experience that it’s equally true for qualities like model behavior and consistency, because so many people now depend on our models to help in their daily lives.
> We need to be critical of metrics that conflict with qualitative testing: Quantitative signals matter, but so do the hard-to-measure ones, and we’re working to expand what we evaluate.
Oh, well, some of you get it. At least … I hope you do.
some_furry
If I wanted sycophancy, I would just read the comments from people that want in on the next round of YCombinator funding.
kornork
That this post has the telltale em dash all over it is like yum, chef's kiss.
sanjitb
> the update introduced an additional reward signal based on user feedback—thumbs-up and thumbs-down data from ChatGPT. This signal is often useful; a thumbs-down usually means something went wrong.
> We also made communication errors. Because we expected this to be a fairly subtle update, we didn't proactively announce it.
that doesn't sound like a "subtle" update to me. also, why is "subtle" the metric here? i'm not even sure what it means in this context.
ripvanwinkle
a well written postmortem and it raised my confidence in their product in general