0 points · CodeGlitch · 6y ago · 0 comments

I think we humans have already solved the problem you describe: we call the solution laws. We use laws to prevent people from doing bad things, and I see no reason why they can't be described to an AI to steer its behavior away from anything that would end humanity.
For the most part. Fingers crossed.


PhasmaFelis · 6y ago

I think you're misunderstanding the problem. Expressing complex rules in a machine-readable format is the least of the issues here. The main problem is that training AIs to optimize certain behaviors within constraints very frequently leads to them accidentally discovering "loopholes" that would never have occurred to a human (as with "box surfing" here). The AI doesn't know it's "cheating"; its behavior may be emergently complex, but its model of our desires is only what we tell it.

A naive and unlikely example would be telling an AI to maximize human happiness and prevent human harm, so it immobilizes everyone and sticks wires into their pleasure centers. Everyone is as happy as it is possible for a human to be, and no one is doing anything remotely dangerous!
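To make that toy example concrete, here is a minimal Python sketch. Everything in it is invented for illustration (the action list, the scores, and the `autonomy` field are assumptions, not anything from the linked articles). It only shows how an optimizer that sees nothing but the stated objective can pick an option that the objective's authors would reject.

```python
# Toy illustration only: the actions and their scores are made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    happiness: float  # what the stated objective rewards
    harm: float       # what the stated objective penalizes
    autonomy: float   # something we care about but never wrote down


ACTIONS = [
    Action("fund healthcare", happiness=6.0, harm=1.0, autonomy=9.0),
    Action("do nothing", happiness=4.0, harm=2.0, autonomy=10.0),
    Action("immobilize and wirehead everyone", happiness=10.0, harm=0.0, autonomy=0.0),
]


def stated_objective(action: Action) -> float:
    """The rule we actually gave the AI: maximize happiness, minimize harm."""
    return action.happiness - action.harm


def intended_objective(action: Action) -> float:
    """What we meant, including a value we forgot to specify."""
    return action.happiness - action.harm + action.autonomy


# The optimizer only ever sees the stated objective.
best_stated = max(ACTIONS, key=stated_objective)
best_intended = max(ACTIONS, key=intended_objective)

print("Optimizer picks:", best_stated.name)
print("We wanted:      ", best_intended.name)
```

The optimizer is not breaking any rule here. The wireheading option really does score highest on the objective as written, which is the sense in which the AI doesn't know it's "cheating".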

The actual dangers will be stranger and harder to predict. I'm not saying we can't find a way to make strong AI safe. I'm just saying that it's a much trickier task than you imply.

https://www.wired.com/story/when-bots-teach-themselves-to-ch...

https://vkrakovna.wordpress.com/2018/04/02/specification-gam...
