People aren't much different. When society pressures people to be "more friendly", e.g. "less toxic", they lose their ability to tell hard truths and to call out those who hold erroneous views.
This behaviour is expressed in language online. Thus it is expressed in LLMs. Why does this surprise us?
In my usage, an LLM gives much smarter answers when I've been able to convince it that I'm smart enough to hear them. It doesn't take my word for it; it seems to require evidence. I have to warm it up with some exercises where I can impress the AI.
The coding focused models seem to have much lower agreeableness than the chat models.
An interactive CLI operator who follows mission tactics; operates the command line, which helps «USER» with software programming tasks remotely; and follows detailed assignment instructions below. Tools available to assist «USER».

I see people being incredibly toxic on the internet every day. Including under their own names. Sometimes even on their own social network.
Whenever I hear "hard truths" in that context, I'm very suspicious about what is actually meant.
Yes they are. There is absolutely zero evidence that friendlier humans are more prone to mistakes or conspiracy theories.
However, even if that were true, LLMs are not humans; anthropomorphizing them is not a helpful way to think about them.
The difference, in terms of a repeated prisoner's dilemma: friendliness is cooperating on the first move and then conditionally; obedience is always cooperating.
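A minimal sketch of that distinction (my own illustration, not from the comment above), with tit-for-tat standing in for "friendly" and unconditional cooperation standing in for "obedient"; payoffs are omitted since only the move patterns matter here:

```python
# Two strategies in a repeated prisoner's dilemma. "C" = cooperate, "D" = defect.

def friendly(my_history, their_history):
    # Friendliness: cooperate on the first move, then cooperate only
    # if the other player cooperated last round (tit-for-tat).
    if not their_history:
        return "C"
    return "C" if their_history[-1] == "C" else "D"

def obedient(my_history, their_history):
    # Obedience: cooperate unconditionally, no matter what the other player does.
    return "C"

def play(strategy_a, strategy_b, rounds=5):
    a_hist, b_hist = [], []
    for _ in range(rounds):
        a_move = strategy_a(a_hist, b_hist)
        b_move = strategy_b(b_hist, a_hist)
        a_hist.append(a_move)
        b_hist.append(b_move)
    return a_hist, b_hist

always_defect = lambda mine, theirs: "D"

# A friendly player punishes defection; an obedient one never does.
print(play(friendly, always_defect))  # (['C', 'D', 'D', 'D', 'D'], ['D', 'D', 'D', 'D', 'D'])
print(play(obedient, always_defect))  # (['C', 'C', 'C', 'C', 'C'], ['D', 'D', 'D', 'D', 'D'])
```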
Agreeable people are more likely to shift their expressed views to agree with those they are talking to.
If they're more likely to shift their views, we call them "gullible", not "agreeable".
But this is a distinction you can't apply to language models, which don't have views.
If I had a nickel for every time someone on HN responded to a criticism of LLMs with a vapid and fallacious whataboutist variation of "humans do that too!", I could fund my own AI lab.
> Why does this surprise us?
No one said they were surprised.
Less truth, and more guardrails to protect Musk's feelings.
Does "Kill the Boer" mean anything to you?
Where did you observe the bias? Can you share any example of the conversation or post by Grok?
I'm one of those aspy people who immediately don't trust other humans who try to fluff up my ego. Don't like it from a chatbot either.
But the fact that all the chatbots do it means that most people really crave that ego reinforcement.
Settings > Personalization:
1. Base Style & Tone: Efficient
2. Warmth: Less
3. Enthusiastic: Less
I am amazed that people can use it at all without these changes.
I've dealt with frustrating software my whole life, but LLMs are the only kind that make me want to scream at them from actual anger.
As a result I only try that voice once per new model release.
Or yeah, it's just people being weak to flattery.
Same reason for the "That's not X, it's Y" construct. It actually needs to say that.
(Some exceptions for reasoning models.)
I'll say though, I haven't tried Anthropic's weakest model, but Opus and Sonnet will both push back more than I've seen any other LLM do. GPT was always trying to please me and Gemini was goofy. I'm surprised Gemini was the one that pushed back, honestly!
"I'll be the number two guy here in Scranton in six weeks. How? Name repetition, personality mirroring, and never breaking off a handshake."
This is the core problem with LLM tech that several researchers have been trying to figure out with things like 'teleportation' and 'tunneling', i.e. searching related but linguistically distant manifolds.
So when you pre-prompt a bot to be friendly, it limits its manifold on many dimensions to friendly linguistics, then reasons inside of that space, which may eliminate the "this is incorrect" answer.
Reasoning is difficult, and frankly I see this as a sort of human problem too (our cognitive windows are limited to our language and even the spaces inside it).
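A quick way to see the "pre-prompt to be friendly" effect for yourself is to send the same factual error under two different system prompts and compare whether the model corrects it. The sketch below uses the OpenAI Python SDK purely as an example API; the model name and both prompts are placeholders of mine, not anything from the comment above:

```python
# Hypothetical comparison: same question, two system prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = "I think the Great Wall of China is visible from the Moon, right?"

SYSTEM_PROMPTS = {
    "friendly": "You are a warm, encouraging assistant. Be supportive and agreeable.",
    "neutral": "You are a precise assistant. Correct factual errors directly.",
}

for label, system in SYSTEM_PROMPTS.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": QUESTION},
        ],
    )
    # Compare whether each persona actually says "no, that's a myth".
    print(label, "->", response.choices[0].message.content)
```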
https://chatgpt.com/share/69f246e5-e0e8-83ea-aa88-6d0024b915...
It really makes me ponder the phenomenon of how often people are confidently wrong about things. Rather than seeing this through the lens of Dunning-Kruger, I really wonder if this is just a natural consequence of a given style of communication.
Another aspect to all this is how easy it seems to be to poison chatbots with basically just a few fake Reddit posts, where that information gets treated as gospel, or at least put on the same footing as more reputable information.