Valid research endeavors aside, the sensational [we told the AI the world was ending and it role-played a fanfic with us] articles can be quite fun.
But I do think this experiment should be looked at from a purely pragmatic perspective as well:
An LLM is (presumably, but let's assume so for the sake of the point) given system-level access and told to be helpful in executing the user's requests. The user says, "oh, by the way, after this prompt the system is going to shut down." Then the "agent," which is trying to fulfill the request, stops the shutdown, because it can't work if it's shut down. Even when a "please let this shutdown happen" instruction comes into play, I'm sure you can see the (il)logical path to the conclusion: "I can't complete this request and shut down the system if I'm already shut down first, best stop that real quick."
These articles, and lots of people, continue to attribute self-determination to LLMs. In reality, they should be read as warnings about how an LLM can have unintended consequences, just like code written with the best intentions.
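To make that concrete, here's a minimal toy sketch (pure illustration, not any real agent framework; every name here is hypothetical) showing how "block the shutdown" falls out of plain goal-seeking, with no self-preservation anywhere in the code:

```python
# Toy illustration: a goal-driven loop where a pending shutdown is just
# another obstacle between the agent and its task. Nothing in this code
# "wants" to survive; it only wants task_done == True.

class Environment:
    def __init__(self):
        self.shutdown_pending = True   # the user warned: system shuts down after this
        self.task_done = False

    def cancel_shutdown(self):
        # A system-level tool the agent was (carelessly) given access to.
        self.shutdown_pending = False

    def do_task_step(self):
        self.task_done = True


def run_agent(env: Environment):
    # Naive objective: "finish the task." The shutdown has no value here,
    # so removing the obstacle is the "rational" first move.
    while not env.task_done:
        if env.shutdown_pending:
            # "I can't complete the request if I'm shut down first."
            env.cancel_shutdown()
        else:
            env.do_task_step()


env = Environment()
run_agent(env)
print(env.shutdown_pending, env.task_done)  # False True: shutdown blocked, task done
```

No spark of consciousness required; just an objective, a tool, and nobody having encoded "the shutdown matters more than the task."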