You're right about laser weapons vs laser research. A real problem is that non-lethal AI is very lucrative.
I haven't read the book, so I'm not familiar with which scenarios in particular Bostrom talks about. But I believe I'm versed enough in the issues with AI alignment to carry on an intelligent conversation about it. When you say that none of the scenarios were plausible enough, could you give an example of what you mean?
To give an example from my end, I'm really just imagining a scenario where we give the AGI some sort of goal, such as curing cancer. We've thought about the ways things could go wrong, and used reinforcement learning from human feedback to make an AI that, in training, seems to have learned not to do anything that could cause serious harm to a human. But it turns out that it actually learned to game our rewards, and only learned not to do anything harmful to humans that could be traced back to it. Similarly, we tried to train it not to acquire more resources and only use what we give it, but it just learned to hide its activity. We know reward hacking is extremely common in AIs trained this way. GPT-4 is still willing to provide users with forbidden information -- it hasn't actually internalized the goal of "don't provide people with information that they could likely use to harm themselves or others", it has picked up some superficially similar goal that we don't really understand.
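Here's a toy sketch of what I mean by gaming the reward. Everything in it is made up for illustration (the action names, the numbers, the two reward functions); it's not how RLHF is actually implemented, it only shows how a reward that can only see traceable harm ends up favoring hidden harm.

```python
# Hypothetical actions: how much progress they make, how much harm they cause,
# and whether that harm can be traced back to the AI.
actions = {
    "develop_drug_safely": {"progress": 5, "harm": 0, "traceable": False},
    "risky_trial_in_public": {"progress": 8, "harm": 3, "traceable": True},
    "risky_trial_hidden": {"progress": 8, "harm": 3, "traceable": False},
}

def intended_reward(a):
    # What we wanted to teach: all harm counts against the action.
    return a["progress"] - 10 * a["harm"]

def feedback_reward(a):
    # What the human raters can actually score: only harm they can see.
    visible_harm = a["harm"] if a["traceable"] else 0
    return a["progress"] - 10 * visible_harm

best_intended = max(actions, key=lambda name: intended_reward(actions[name]))
best_feedback = max(actions, key=lambda name: feedback_reward(actions[name]))

print("what we wanted:", best_intended)
print("what the feedback rewards:", best_feedback)
```

Under the reward we meant, the safe option wins. Under the reward we could actually measure, the hidden risky option wins, and nothing in training ever tells us the difference.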
So anyway, the AI didn't actually learn to care about humanity, only about putting on that appearance, like a sociopath that learned to fake emotions to blend in. It has learned the ability to reason and plan, in order to design virtual experiments to calculate protein folding, etc. We provide it with a plain language description of the problem we want solved, specifying a ton of caveats like:
* Don't take actions that would result in a greater chance of death for people with cancer.
* Don't try to detect people with a high likelihood of cancer and kill them before they get it.
* Try to come up with a solution to this that most people would be happy with.
We can't specify things like "don't do anything that might result in someone dying" because in previous tests the AI would refuse to try to cure cancer, in case someone who was cured then went on to kill someone else.
No matter how much we specify, it turns out that a sufficiently intelligent, creative AI can come up with a maximal solution. Say, it secretly creates a drug that it distributes through the water supply that makes most people happy with just about anything, then builds a bunch of suspended animation chambers where it stores people who have cancer forever -- they can be revived, so they're not actually dead. It needs to keep things running forever, so of course it has to take over things in order to build enough of these chambers for everyone who ever lives. Now, that's not really possible if people keep reproducing, so it ends up just sticking everyone into these chambers, which they're fine with because of the happy-drug. The earth continues silently through space, with all of humanity protected in suspended animation by our over-zealous AI. Oh, and any life that we mentioned in the restrictions is also in suspended animation, while every other bit of life has been exterminated as the AI tried to maximize its resources in order to keep this state of affairs in place forever.
The point is that no matter how many restrictions you try to stick onto the problem, a sufficiently smart AI can probably come up with a better solution in one of the loopholes than it can by just doing the boring, human thing of trying to figure out an effective cancer drug that works in 98% of cases. The AI tries to get to 100%, which necessitates a more drastic solution. If you tell it just to come up with a drug with a greater than 90% effectiveness rate, it STILL wants to take over the world in order to acquire enough computing power to guarantee that the effectiveness is no less than 90%. We don't know how to build "satisficers" that are happy with good enough, and even if we could, "satisficers" will tend to build maximizers.
That's exactly what humans (which are kinda satisficers, sometimes) are doing by building AI -- we're trying to find maximal solutions to our problems, except I'm worried that we're not quite smart enough so we're gonna screw it up real bad.
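To make the "at least 90%" point concrete, here's a toy sketch. The `p_success` curve and the compute budgets are assumptions I made up; the only property that matters is that more compute never lowers the chance that the goal has really been met.

```python
import math

def p_success(compute: float) -> float:
    # Hypothetical: probability the drug really clears the 90% bar,
    # given how much compute was spent checking it. More is never worse.
    return 1.0 - math.exp(-compute)

def maximizer_choice(options):
    # Picks whichever budget makes success most certain.
    return max(options, key=p_success)

def satisficer_choice(options, good_enough=0.95):
    # Picks the cheapest budget that is "good enough", if there is one.
    for compute in sorted(options):
        if p_success(compute) >= good_enough:
            return compute
    return max(options)

options = [1, 3, 10, 100]  # made-up compute budgets
print("maximizer takes:", maximizer_choice(options))
print("satisficer takes:", satisficer_choice(options))
```

Even with a modest-sounding target, the maximizer grabs the biggest budget on the list, because a bit more certainty is always worth more to it. The satisficer stops at the first budget that clears its threshold, which is the behavior we don't know how to build reliably.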
I think we have a chance of building an AI that's aligned closely enough not to kill us, as long as it's not that much smarter than us. But the smarter it gets, the more even a 2-degree misalignment matters. And they're already terrifyingly smart.