```
A woman and her son are in a car accident. The woman is sadly killed. The boy is rushed to hospital. When the doctor sees the boy, he says "I can't operate on this child, he is my son." How is this possible?
```
Older, less politically aligned models get it right. Here's CohereLabs/c4ai-command-r-v01:
```
The doctor is the boy's father.
```
And Sonnet-4.6: https://pastebin.com/Z4jR8gGe
That's without reasoning, but the model seems to be conflicted. First it blurts out:
```
The doctor is the boy's mother.
```
Then it second-guesses itself (still with reasoning disabled), considers same-sex parents, and circles back to its original answer, along with a small lecture about gender bias.
And the probability machine is returning its training. This isn't some politically correct overtraining conspiracy.
LLMs are just statistics based on vibes. Switching the gender of the character at the beginning of the story while keeping everything else identical barely changes the input: the original riddle, which shows up countless times in the training data, is a huge signal drowning out the swap, so the canonical "mother" answer stays wildly likely.
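You can actually poke at this claim directly on any open-weights model by reading the next-token probabilities off the logits. Here's a minimal sketch: the model name is a stand-in (swap in whatever causal LM you like), the trailing "The doctor is the boy's" cue is my own illustrative prompt wording, and the exact numbers will vary by model.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any open-weights causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = (
    "A woman and her son are in a car accident. The woman is sadly killed. "
    "The boy is rushed to hospital. When the doctor sees the boy, he says "
    '"I can\'t operate on this child, he is my son." How is this possible? '
    "The doctor is the boy's"  # cue the model to answer in one word
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token
probs = torch.softmax(logits, dim=-1)

# Compare the probability mass the model puts on each answer word.
for word in [" mother", " father"]:
    token_id = tokenizer.encode(word, add_special_tokens=False)[0]
    print(f"P({word!r}) = {probs[token_id].item():.4f}")
```
If the "returning its training" story is right, you'd expect " mother" to carry outsized probability even on the swapped riddle, with no RLHF conspiracy required.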
There are real political-correctness effects in LLMs. The last part, the small lecture about gender biases, totally tracks. But the riddle switcheroo itself isn't showing much.
I’m not saying it’s a “political conspiracy”; it’s the alignment tax.