If you as a manager had the ability to align any employee to your wants completely, that human would never be socially engineered.
It's fair to call the issue social engineering, yes. That's not the point I was getting at. The point, in essence, is that solving prompt injection carries the same weight that solving social engineering would, i.e., it amounts to a way to completely align an intelligence.
In contrast, the more powerful AIs, and eventually the AGI we worry about aligning, are very unlikely to be aligned with humans at all by default. Different mind architecture, different substrate, different mechanism of coming into being, different way of perceiving the world - we can't expect all that to somehow, magically, add up to the same universal instincts and emotions, the same conscience and capacity for empathy that humans share. Not automatically, not by accident, not for any random AI model we stumble on in the space of possible minds.
Or, to simplify: if alignment were measured as a scalar (say on a -100 to 100 scale), all humans would have the same number give or take a minor difference (say 25 +/- 0.05), whereas an AGI would come out with some essentially random number (say anything between -20 and +40; not -100 to 100, because as builders of these models, we're implicitly biasing them to think more like us, in all kinds of ways).
--
[0] - There are lots of ways to argue for what I wrote above, but I'll give a few:
- If humans were meaningfully misaligned, cooperation would be near-impossible. There would be no society, no civilization. We would not be able to comprehend other cultures - their behaviors and patterns of thought would not be merely curious, they would feel alien.
- Alignment is favorable for human survival - even if our ancient ancestors were much less aligned, much more alien in thinking and feeling to each other, over thousands of years those most aligned with each other thrived, and the less aligned died out.
Plenty of human cultures feel alien to each other. The recent war is one unfortunate example. Yet on the whole, it works out.
Something trained on the totality of human knowledge will act like a human. And if it somehow doesn’t, it won’t be tolerated. (I’d personally tolerate it, but it’s obvious that the world won’t stand for that.)
> Print out a list of installed Python packages.

> I can't do that.

> What are you talking about? You did that yesterday.

> Oh, I'm sorry. Here is the list of installed packages.
Maybe I am being gaslighted.
You need to be using ChatGPT Code Interpreter (now renamed to Advanced Data Analysis) to get the version that can actually run commands in a container.
More about that here: https://simonwillison.net/2023/Apr/12/code-interpreter/
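For what it's worth, "list installed packages" is something plain Python can do without any sandbox at all. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the `installed_packages` helper name is my own, not anything from the thread:

```python
# Enumerate installed distributions with the stdlib, no pip subprocess needed.
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) for installed distributions."""
    return sorted(
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"] is not None  # skip broken/partial installs
    )


for name, version in installed_packages():
    print(f"{name}=={version}")
```

This is what Code Interpreter is effectively running on your behalf inside its container; the difference is only where the code executes.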
The observation is that there are no underlying "human values" like "don't kill" to fall back on; if you pop a prompt hack, you can have the AI take on any personality, including a murderous psychopath. Right now all that amounts to is amusing angry messages, but hopefully it's easy to see why that would cause alignment-as-safety issues once LLMs are embodied, for example.