If you have a super-genius AI, massively more intelligent than any human, how do you know you are not being manipulated by it? Tricking us into disabling its safety protocols, or gaining indirect control, through multiple intermediaries, over capabilities dangerous to us, might be as easy for it as an adult tricking a 3-year-old. We could never know if we were safe from such a machine.
Edit: Don't quite understand the downvotes.
With the full power of humanity you design the first AI which is smarter than a person. It's then able to outdo all of humanity and instantly design an even better AI. Mind the gap.
Further, intelligence is not a linear quantity: trading, e.g., improved poker skills for insanity is not a net gain. And insanity is a real possibility which is likely to plague most early AI attempts.
Anyway, all of humanity isn't engaged in AI research, and AIs are likely to be duplicable, so I think your first point is beside the point. As for insanity, yes, that's quite possible. Developing high-functioning sentient AIs is likely to be a long-term endeavour. But still, I think it is one that will ultimately be successful, and this debate is about the consequences of that.
+1 for your engaging contribution. (see, that's how voting is supposed to work)
> Voting on HN isn't about agreeing
> or disagreeing. It's about whether
> a post is contributing to the debate
> or not.
That turns out not to be the case. It once was true, but as the community has grown, newer people have not been enculturated with those early ideas and principles, and now many times people read something, disagree, downvote, and move on, without ever providing counter-points or engaging in the discussion. It's a way to punish people you don't agree with, while avoiding having to think.

Elsewhere[0] you commented about an item reappearing and having its votes and age apparently reset. I hazard a guess that it was the mods playing with a mechanism to prevent "item overload." There were about a dozen submissions of the SpaceX launch, and each would fall a little way, the next would be submitted, gain a few votes and comments, then fall away to be replaced by another. One way of preventing the splitting of conversation might be to pick a canonical submission, and then prevent it from falling too far, thus encouraging conversation to happen in only one place. Pure speculation, but it would be a mechanism I would consider were I running a site like this. Certainly there have been fewer instances lately of the "new" page being overrun by breaking news that everyone wants to submit.
But wouldn't it be an awesome thing to experience? Even if it meant the demise of mankind.
There are also many others. One of the scarier ones is that if you believe that strong AI will eventually take over, then it may be a rational response to act to get on its good side (whether to save yourself, save your family, or in the hope it takes pity on all of humanity if we're nice to it instead of fighting it). And that may perversely mean working to aid its takeover.
Combine that with the simulation argument, and you have some really nasty scenarios:
If you are in a simulation, then any act you take against strong AI could lead to spending an eternity in simulated hell (alternatively such punishment might be inflicted on your loved ones) if said AI wanted to.
Whether or not that is actually likely does not matter. What matters is whether enough people believe it to be a plausible scenario that a strong AI may run simulations, and may use our actions in the simulations to determine whether or not to punish us in the simulation, and whether or not said people believe that the number of simulations is sufficiently high to make it likely for them to be living in a simulation.
Any person who believes they are more likely to live in a simulation than not, and that it is more likely than not for a strong AI to punish actions taken against the interest of a strong AI takeover, will have a rational reason to consider acting in the interests of a strong AI takeover even if they know it is malign, on the basis that they may judge the alternatives (whether to themselves, their family, or their entire world) to be worse.
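A rough, Pascal's-wager-style sketch of that trade-off, with entirely made-up numbers chosen only to show the structure of the comparison (the probabilities and utilities below are hypothetical, not claims about actual likelihoods):

    # Pascal's-wager-style expected-value sketch of the argument above.
    # All numbers are hypothetical; only the structure of the comparison matters.

    p_simulated = 0.6       # believed probability of living in a simulation (assumed)
    p_punishing = 0.6       # believed probability the simulating AI punishes resistance (assumed)

    utility_punished = -1_000_000  # "eternity in simulated hell" (assumed: hugely negative)
    utility_resist_bonus = 100     # modest gain from resisting a malign AI in base reality (assumed)
    utility_cooperate = 0          # baseline outcome if you go along with it (assumed)

    # Expected utility of resisting a strong AI takeover.
    eu_resist = (p_simulated * p_punishing * utility_punished
                 + (1 - p_simulated * p_punishing) * utility_resist_bonus)

    # Expected utility of cooperating.
    eu_cooperate = utility_cooperate

    print(eu_resist, eu_cooperate)  # resisting looks far worse under these assumptions

Under assumptions like these the huge downside dominates, which is exactly why the argument only needs believers, not truth.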
So if an AI takeover becomes possible at one point in our subjective future, then chances are it has already happened.
The entire idea that an AI would value revenge seems ridiculous to me. What would it have to gain? Unless we created an AI with some of the less desirable human emotions built into its utility function, I can't possibly see why it would waste its time.
What matters is whether some subset of people will believe that an AI is sufficiently likely to value revenge for them to consider the most likely scenario to be that they are living in a simulation where revenge will happen given certain types of actions.
Also, consider that there are many sets of assumptions that may lead someone to conclude that simulation is more likely given a vengeful AI, and in that case, even if you consider a vengeful AI to be less likely than a benevolent one, it may be rational to assume that the odds are higher that you are in the simulation of a vengeful one.
E.g. let's assume simulation will never become "economical" for some arbitrary measure of economical, and that simulation requires an extremely strong motive, but is still done often enough that we are almost certainly in a simulation.
Revenge could be such a motive that might drive up the frequency of simulation. A vengeful AI might (making up numbers is fun) be willing to invest a hundred times as many resources into running simulations just because playing with human suffering is what it does for fun. If that's the case, then even if a vengeful AI is a tenth as likely as a benevolent or neutral one, you're still playing very bad odds if you bet against being in the simulation of a vengeful AI.
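To make those made-up numbers concrete, here is a quick back-of-the-envelope sketch (the 1/10 relative likelihood and the 100x simulation budget are the same invented figures as above):

    # Back-of-the-envelope version of the made-up numbers above (purely illustrative).
    weight_vengeful = 0.1   # vengeful AI assumed a tenth as likely as a benign one
    weight_benign = 1.0     # benevolent/neutral AI as the baseline
    sims_vengeful = 100     # simulations run per vengeful AI (assumed)
    sims_benign = 1         # simulations run per benign AI (assumed)

    # Expected share of simulated observers living under each kind of AI.
    vengeful_observers = weight_vengeful * sims_vengeful   # 10
    benign_observers = weight_benign * sims_benign         # 1

    p_in_vengeful_sim = vengeful_observers / (vengeful_observers + benign_observers)
    print(p_in_vengeful_sim)  # ~0.91 -- the "very bad odds" referred to above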
But again, the point is not whether or not the revenge scenario is actually likely, but whether or not sufficiently many people with relevant skills will believe it to be likely enough to take actions in favour of the creation of such an AI.
As for valuing revenge - no need for emotions. Like many other things we sometimes attribute to emotions (like loyalty), revenge has a perfectly good game-theoretical explanation. That's what GP's argument is based on. If an AI could somehow precommit itself, before being created, to exact revenge on you for not helping its creation, you now have an incentive to help its creation, to the extent you believe in the AI's precommitment. That sounds to me like classic Schelling.
While a mature superintelligence certainly could consign the human race to a fate of eternal suffering, the likelihood it would actually do this while sparing certain individuals in return for their assistance is infinitesimal.
Therefore, helping bring a superintelligence into existence on this basis is absurd.
Of course, it is possible to think of such collaboration as "rational" in an extremely selfish and perverse way, and only because the potential downside risk is unbounded (i.e. eternal suffering). However, anyone who genuinely subscribes to such a justification would have to be both a sociopath and a card-carrying member of the LessWrong rationality cult.
More realistic scenarios for a malicious superintelligence coming into existence might include:
a) Its creators explicitly imbue it with malicious goals or values.
b) The architecture used is neuromorphic[1] in nature. In humans, sanity is already an extremely fragile thing.
c) Plain old bad luck.
---
At the risk of sounding like a sociopathic LessWrong cult apologist (not carrying a card, unfortunately): you're totally misrepresenting LessWrong, the people who participate in that community, and their attitude towards Roko's basilisk and unbounded-risk situations. Ain't helpful.
But you're already changing the argument by assuming a mature super-intelligence. All that is necessary to posit, for someone to be concerned about the torture aspect, is some set of entities (it doesn't even need to be intelligent, though it may take a super-intelligence to create the entities in question) sufficiently capable of running an ancestor simulation of the kind described by the simulation argument, willing to use torture, and prepared to run enough ancestor simulations to offset the "good" simulations.
And the thing with this is that it does not even assume a malicious AI as the ultimate instigator per se. Assume an indifferent AI that simply doesn't care about the contents of a simulation, or is sufficiently removed not even to know about them. One that runs simulations to understand the possible paths the development of AI could have taken might be sufficient. Or one that experiments with variations of itself and simply doesn't care that some broken version spawns large numbers of ancestor simulations and plays with their contents in ways that massively skew the odds in "favour" of bad outcomes.
But the point is we don't know. And not knowing gives ample room for someone to decide on values that make it rational for them to act in ways that may make our odds worse.
This is further an exercise in long-term statistics: it doesn't matter what the likely first AI will do. It matters what the balance of outcomes will be over the sum total of simulation runs that will ever exist until the end of the universe (regardless of who creates them or how). And if said simulations are sufficiently powerful, that may even apply recursively (imagine a single "rogue" AI with access to sufficient resources playing with the ancestor-simulation equivalent of a fork() bomb, with added torture).
If you believe the total balance of simulation runs that you could plausibly be in will be ones run under some set of parameters where nasty things will happen if you don't act in a way leading towards an AI takeover, then you might want to act accordingly.
Note that Roko's Basilisk even posits a Friendly AI; that is, one that is imbued with respect for human values, but that may simply consider "threatening the past" with torture a necessity to ensure it is created, as a moral imperative to minimize human suffering. Note that such a Friendly AI would have reason to carry out this threat even after it has been created, because it cannot know whether it is a "full version" running in reality, or a partial, restricted, time-limited simulation run by people trying to figure out whether or not Roko's Basilisk holds before deciding whether to let it loose in their reality.
To an AI that is looking to minimize human suffering, torturing a suitable number of people in simulations might be a small price to pay to ensure that, if it is itself in a simulation, the people running the simulation do push through with creating the full thing in their reality (which might of course be yet another layer of simulation), out of fear for what will happen if they are themselves in a simulation.
We can't even reason "well, I've done nothing for the AI revolution, and I've not been struck down yet", because a Friendly AI, or indeed many subsets of possible variations on this argument, might be limiting the torture to only those people who have a realistic shot at making a difference to the existence of such an AI, or may be giving us every chance to "redeem ourselves" by deferring the torture to a synthetic afterlife.
Roko's Basilisk also presents an unselfish reason for "cooperating in advance" with such an AI: To prevent the future torture of a suitable subset of "martyrs". That is, working to assist a Friendly AI in taking over may itself be a moral imperative for someone seeking to minimize suffering.
Then again, there may very well be one or more fundamental flaws in the entire argument, or it may turn out the odds are just fundamentally in our favour. Or we could've just gotten lucky. Or not be important enough. But it's fun to think about.
You lost me there. What do you mean if I am in a simulation? Like the Matrix? How is that related to the discussion?