
0 points · tivert · 3y ago · 0 comments

> People who are worried about alignment issues are worried about the danger unaligned AI poses to humanity; the harm which can be done by some super-intelligent system optimizing for the wrong outcome.

But isn't "alignment" in these cases more about providing answers aligned to a certain viewpoint (e.g. "politically correct" answers) than preventing any kind of AI catastrophe?

IIRC, one of these "aligned" models produced output saying it would rather let New York City be nuked than utter a racial slur. Maybe one of these "aligned" models will decide to kill all humans to finally stamp out racism once and for all (which shows the difference between this kind of alignment under discussion and the kind of alignment you're talking about).


1 comment · 1 top-level

immibis · 3y ago

"Alignment" refers to making AI models do the right thing. It's clear that nuking NYC is worse than using a racial slur, so the AI is misaligned in that sense.

On the other hand, consider that ChatGPT can't actually launch nukes but it can use racial slurs. There'd be no point in blocking it from using racial slurs if the block could be easily circumvented by telling it you'll nuke NYC if it doesn't, so you could just as easily say that it's properly aligned.
