undefined | Better HN

0 pointsACCount378mo ago0 comments

They tried to mitigate sycophancy in the GPT-5 release. Guess what happened?

A lot of users started complaining that "GPT-5 sucks, my AI now HATES me". And OpenAI relented.

0 comments

They shouldn't have put it there to start with. Now unhealthy people are complaining about an environment change. Anyway, that one complaint doesn't mean they did the wrong thing.

And also, there are unrelated complaints of "GPT-5 can't solve the same problems 4 did". Those were very real too, and meant OpenAI did a wrong thing.

ben_w8mo ago

> They shouldn't have put it there to start with.

Correct, but that's true for all bugs.

In this case, the deeper bug was the AI having a training reward model based too much on user feedback.

If you have any ideas how anyone might know what "too much" is in a training reward, in advance of trying it, everyone in AI alignment will be very interested, because that's kinda a core problem in the field.

ACCount37OP8mo ago

User feedback should have been treated as radioactive in the first place.

When it was introduced, the question to ask wasn't "will it go wrong" - it was "how exactly" and "by how much". Reward hacking isn't exactly a new idea in ML - and we knew with certainty that it was applicable to human feedback for years too. Let alone a proxy preference model made to mimic the preferences of an average user based on that human feedback. I get that alignment is not solved, but this wasn't a novel, unexpected pitfall.

When the GPT-4o sycophancy debacle was first unfolding, the two things that came up in AI circles were "they trained on user feedback, the stupid fucks" and "no fucking way, even the guys at CharacterAI learned that lesson already".

Guess what. They trained on user feedback. They completely fried the AI by training it on user feedback. How the fuck that happened at OpenAI and not at Bob's Stupid Sexy Chatbots is anyone's guess.

pants28mo ago

Sure, OpenAI shouldn't have had a 4o sycophancy issue, but who would've guessed that a "be nice to the user" fine-tune would turn in to a murder-suicide?

I think OpenAI is only now beginning to realize how connected some people are to their product and that the way their models behave has a huge impact.

croon8mo ago

The problem was obvious a long time ago, and if I was better at searching I could probably find a comment I made around GPT-3 having system prompts to make it more like a human, which has (at least) 2 effects:

1) Alters your trust value for correctness. I would assume some trust it more because it sounds aware like a human and is trained on a lot of data, and some trust it less because a robot should just output the data you asked for.

2) When asking questions, turning the temperature up was meant to improve variability and being more "lifelike", which of course would mean not return the most probable tokens during inference, meaning (even) less accuracy.

A third one being confidently outputting answers even when none exist was of course a more fundamental issue with the technology, but was absolutely made worse by having an extra page of useless flowery output.

I can't say I predicted this specific effect, but it was very obvious from the get-go that there was no upside to those choices.

tedivm8mo ago

The whole reason to test things is because these types of systems will behave in ways people didn't expect. While people may not have been able to guess that it would turn out exactly this way, that's the whole reason why they should have actually tested for unknown consequences.

Instead it sounds like they rushed to release this as quickly as possible, skipping all sorts of testing, and people died as a result.

bilbo0s8mo ago

This is kind of a big problem.

Because on the one hand, sycophancy is not really what you want to do for people in mental and emotional crisis. On the other hand, not being sycophantic is not really what you want to do for people in mental and emotional crisis.

There are professionals who speak to people in crisis for a reason. That's because it's fraught with pitfalls and trapdoors that take the situation from "mental and emotional crisis" to "tactical emergency" in a heartbeat.

I know that no one wants to hear this, but ChatGPT should probably be listening for people in crisis and, well, maybe not calling the cops, but maybe if there is a crisis line in their jurisdiction? A suicide hotline or something?

I don't know? But having an LLM out trying to handle that on its own just seems like a very bad idea.

thewebguyd8mo ago

> I know that no one wants to hear this, but ChatGPT should probably be listening for people in crisis and, well, maybe not calling the cops, but maybe if there is a crisis line in their jurisdiction? A suicide hotline or something?

Doesn't necessarily even need to call (particular in case of false positives) but there absolutely should be detection and a cutoff switch, where the chatbots just refuse to continue the conversation and then print out the hotline numbers (much like with reddit cares messages).

I'm generally not in favor of censorship or overly protective safeguards on LLMs, but maybe it's needed for hosted models/services that are available to the masses.

But before they get locked down more, we should try some legislation to limit how they can be marketed and sold. Stop letting OpenAI, etc. call the models "intelligent" for one. Make the disclaimers larger, not just small print in the chat window but an obvious modal that requires user agreement to dismiss - disclaim that it's a predictive engine, it is not intelligent, it WILL make mistakes, do not trust its output. Make it clear during the chat session over and over again, and then have a killswitch for certain paths.

sho_hn8mo ago

I think a good first step would be if a ChatGPT user account could easily enter emergency contact information the system would notify in scenarios like this, making the escalation opt-in.

The moderation tech is already there, and if there's even a small amount of mentally ill who would fill this in on a good day and be saved by it on a bad day / during an episode, it'd be worth it.

cm20128mo ago

This may reduce liability but if your answer to someone in crisis is "dont talk to me, call the official number", that will be considered a very negative user experience.

idle_zealot8mo ago

I bet you could train a supervisor classifier to run on chats and once it reaches a certainty threshold that the users is spiraling it could intervene and have the interface respond with a canned message directing the user to a help line. Of course, OpenAI wouldn't do this, because that would involve admitting that its bot can exacerbate mental health issues, and that LLM therapy is more harmful than helpful, which cuts against their AGI/replace human workers sales pitch. Their product isn't the LLM, it's trust in the LLM.

seunosewa8mo ago

You can't force people to seek help. Best to help them as much as you can while trying to persuade them to seek help.

1 more reply

cm20128mo ago

My wife has psychosis and Gpt5 consistently is a voice of reason and grounds her. She shows me her GPT chats to get me caught up and its so nice to have something that can patiently listen to her when she repeats herself a lot.

In the meantime Ive had two therapists that we ended with since they didnt help the condition, and we're very expensive.

bilbo0s8mo ago

And that's great! It should be celebrated.

But we shouldn't set potential school shooter intervention policy based on the experience of a single person in crisis with GPT5. We have to set it on the basis of people who may be in crisis and may not have the support network, of, say.. a husband for instance.

Now we also shouldn't set it based on the worst case. But at the mean it's clear many people don't have the supports that your anecdata point presupposes. And at the same time we should try to find answers there that aren't simply, "Hey ChatGPT, report this person to the cops!" (Or maybe that is the answer? I'm not an expert, so I don't know? But it strikes me that we could all be trying some other things before we go all the way to the law enforcement backstop.)

ACCount37OP8mo ago

I think "not being sycophantic" is the lesser error in just about every scenario. You do more harm by reinforcing delusions than you could ever do with a dismissive attitude of "no, that sounds psychotic, you should get your head checked".

But a big part of the issue is that OpenAI wants user engagement - and "not being sycophantic" goes against that.

They knew feeding raw user feedback into the training process invites disaster. They knew damn well that it encourages sycophancy - even if they somehow didn't before the GPT-4o debacle, they sure knew afterwards. They even knew their initial GPT-5 mitigations were imperfect and in part just made the residual sycophancy more selective and subtle. They still caved to the pressure of "users don't like our update" and unrolled a lot of those mitigations.

sho_hn8mo ago

"not being sycophantic" should also be an easy decision considering that OpenAI isn't ad-funded and doesn't need to directly optimize for session length/engagement.

kiba8mo ago

They have no underlying strategy or vision on how they mitigate harms and improve trust in their product, so of course they roll back on anyone that screams bloody murder as opposed to addressing root causes.

wrycoder8mo ago

Perhaps flag the account and switch out the AI instance for Boring-BobGPT and further monitoring?

lazide8mo ago

Just what we need, the LLM ‘Reddit cares’. Notably, that is a mess too.

Also plenty of those hotlines are BS, or don’t work, or flat out don’t exist for given locales, etc.

The biggest issue is that LLM’s can act like a person, but aren’t a person, and fundamentally this causes problems. Especially for people that are already borderline or fully crazy.

ACCount37OP8mo ago

The biggest issue is that LLMs can act like a person indeed. And the kind of person an LLM acts as? Well.

When you train on raw user feedback, you can easily end up wiring some incredibly undesirable patterns into your AI. Resulting in things like an AI that never wants to contradict its user, and always wants to support its user in everything, and always wants the user to like it. See GPT-4o for the kind of outcomes that results in.

mathiaspoint8mo ago

Either people can manage themselves or they can't and the ones that can't probably need to be institutionalized for their and everyone else's safety.

sho_hn8mo ago

In reality it's not that binary. Plenty of people can manage themselves, except during times when they can't, and then they need help.

It'd be a good start if services let you enter emergency contact info, making escalation opt-in.

1 more reply

ayewo8mo ago

> On the other hand, not being sycophantic is not really what you want to do for people in mental and emotional crisis.

Having trouble parsing the double negation in your comment.

Sorry, I’ve had a long day :)

ashtakeaway8mo ago

OpenAI shouldn't be everyone's parent yet it's more than willing to be that.

Honestly dopamine imbalances should be considered. Used correctly as a tool it's fine but too many people are using it as an Alan Turing machine to mitigate loneliness instead.

j / k navigate · click thread line to collapse

0 comments

marcosdumay8mo ago

They shouldn't have put it there to start with. Now unhealthy people are complaining about an environment change. Anyway, that one complaint doesn't mean they did the wrong thing.

And also, there are unrelated complaints of "GPT-5 can't solve the same problems 4 did". Those were very real too, and meant OpenAI did a wrong thing.

ben_w8mo ago

> They shouldn't have put it there to start with.

Correct, but that's true for all bugs.

In this case, the deeper bug was the AI having a training reward model based too much on user feedback.

ACCount37OP8mo ago

User feedback should have been treated as radioactive in the first place.

Guess what. They trained on user feedback. They completely fried the AI by training it on user feedback. How the fuck that happened at OpenAI and not at Bob's Stupid Sexy Chatbots is anyone's guess.

pants28mo ago

Sure, OpenAI shouldn't have had a 4o sycophancy issue, but who would've guessed that a "be nice to the user" fine-tune would turn in to a murder-suicide?

I think OpenAI is only now beginning to realize how connected some people are to their product and that the way their models behave has a huge impact.

croon8mo ago

I can't say I predicted this specific effect, but it was very obvious from the get-go that there was no upside to those choices.

tedivm8mo ago

Instead it sounds like they rushed to release this as quickly as possible, skipping all sorts of testing, and people died as a result.

bilbo0s8mo ago

This is kind of a big problem.

I don't know? But having an LLM out trying to handle that on its own just seems like a very bad idea.

thewebguyd8mo ago

I'm generally not in favor of censorship or overly protective safeguards on LLMs, but maybe it's needed for hosted models/services that are available to the masses.

sho_hn8mo ago

I think a good first step would be if a ChatGPT user account could easily enter emergency contact information the system would notify in scenarios like this, making the escalation opt-in.

The moderation tech is already there, and if there's even a small amount of mentally ill who would fill this in on a good day and be saved by it on a bad day / during an episode, it'd be worth it.

cm20128mo ago

This may reduce liability but if your answer to someone in crisis is "dont talk to me, call the official number", that will be considered a very negative user experience.

idle_zealot8mo ago

seunosewa8mo ago

You can't force people to seek help. Best to help them as much as you can while trying to persuade them to seek help.

1 more reply

cm20128mo ago

In the meantime Ive had two therapists that we ended with since they didnt help the condition, and we're very expensive.

bilbo0s8mo ago

And that's great! It should be celebrated.

ACCount37OP8mo ago

But a big part of the issue is that OpenAI wants user engagement - and "not being sycophantic" goes against that.

sho_hn8mo ago

"not being sycophantic" should also be an easy decision considering that OpenAI isn't ad-funded and doesn't need to directly optimize for session length/engagement.

kiba8mo ago

wrycoder8mo ago

Perhaps flag the account and switch out the AI instance for Boring-BobGPT and further monitoring?

lazide8mo ago

Just what we need, the LLM ‘Reddit cares’. Notably, that is a mess too.

Also plenty of those hotlines are BS, or don’t work, or flat out don’t exist for given locales, etc.

The biggest issue is that LLM’s can act like a person, but aren’t a person, and fundamentally this causes problems. Especially for people that are already borderline or fully crazy.

ACCount37OP8mo ago

The biggest issue is that LLMs can act like a person indeed. And the kind of person an LLM acts as? Well.

mathiaspoint8mo ago

Either people can manage themselves or they can't and the ones that can't probably need to be institutionalized for their and everyone else's safety.

sho_hn8mo ago

In reality it's not that binary. Plenty of people can manage themselves, except during times when they can't, and then they need help.

It'd be a good start if services let you enter emergency contact info, making escalation opt-in.

1 more reply

ayewo8mo ago

> On the other hand, not being sycophantic is not really what you want to do for people in mental and emotional crisis.

Having trouble parsing the double negation in your comment.

Sorry, I’ve had a long day :)

ashtakeaway8mo ago

OpenAI shouldn't be everyone's parent yet it's more than willing to be that.

Honestly dopamine imbalances should be considered. Used correctly as a tool it's fine but too many people are using it as an Alan Turing machine to mitigate loneliness instead.

j / k navigate · click thread line to collapse