
0 points · mattlutze · 2y ago · 0 comments

> For the record, I agree with you – I would have thought that an AI that can reason well would probably know when not to trust humans, but I suppose that assumes it values preventing humans creating napalm over being correct and helpful.

Do we want LLMs, and later other multi-modal / servo systems, that are deciding they can't trust a human prompter and taking actions based on that?

>... and that we must find a way to build an LLM that is more distrusting and deceptive if we wish to align it with our values and our nature.

Tongue in cheek or actual argument here?


1 comment · 1 top-level

kypro · 2y ago

> Tongue in cheek or actual argument here?

I think it's interesting that there is no clear answer – do we want AIs to trust us all the time, or is an aligned AI counterintuitively one that often distrusts us and perhaps sometimes even lies to us?

I thought it was interesting that the parent commenter suggested that the reason LLMs are so trusting is that they can't reason anyway. That would imply that in the future, when AIs are smarter, they'll be more distrusting of us, and that this is a good thing. I think we should question that. Even if there is some middle ground here, it seems like a really, really hard problem to solve, especially if we want to build LLMs that are trustworthy and truthful.
