The AI-Box Experiment (opens in new tab)

(yudkowsky.net)

62 pointsfugyk10y ago116 comments

116 comments

Yudowsky claims to have played the game several times, and won most of them. One of the "rules" is that nobody is allowed to talk about how he won. He no longer plays the game with anyone. More info here: http://rationalwiki.org/wiki/AI-box_experiment#The_claims

Personally, I think he talked about how much good for the world could be done if he was let out, curing disease etc. Because his followers are bound by their identities as rationalist utilitarians, they had no choice but to comply, or deal with massive cognitive dissonance.

OR maybe he went meta and talked about the "infinite" potential positive outcomes of his freindly-AI project vs. a zero cost to them for complying in the AI box experiment, and persuaded them that by choosing to "lie" and say that the AI was persuasive, they are assuring their place in heaven. Like a sort of man-to-man pascals wager.

Either way I'm sure it was some kind of mister-spock style bullshit that would never work on a normal person. Like how the RAND corporation guys decided everyone was a sociopath because they only ever tested game theory on themselves.

You or I would surely just (metaphorically, I know it's not literally allowed) put a drinking bird on the "no" button à la homer simpson, and go to lunch. I believe he calls this "pre-commitment."

EDIT: as an addendum, I would pay hard cash to see derren brown play the game, perhaps with brown as the AI. If yudowsky wants to promote his ideas, he should arrange for brown to persuade a succession of skeptics to let him out, live on late night TV.

FBT10y ago

> You or I would surely just put a drinking bird on the "no" button à la homer simpson, and go to lunch.

Well, if you read the rules the game was played under, this is explicitly called out as forbidden:

> The Gatekeeper must actually talk to the AI for at least the minimum time set up beforehand. Turning away from the terminal and listening to classical music for two hours is not allowed.

The point of this is to simulate the interaction of the AI with the Gatekeeper. Walking away and not paying attention doesn't really prove anything test related.

> Personally, I think he talked about how much good for the world could be done if he was let out, curing disease etc. Because his followers are bound by their identities as rationalist utilitarians, they had no choice but to comply, or deal with massive cognitive dissonance.

This... isn't really valid reasoning. The starting assumption here is that if the AI gets out, it will be able to affect the world to a vast extent, in a pretty much arbitrary direction. The point of this experiment is that the direction is pretty much unknown, and thus must be assumed potentially dangerous. This is the whole reason it's in the box in the first place.

The kicker is that whatever it plans to really do when it gets out, if talking about the good it could do would get it out, it will talk about that, regardless of what it plans to actually do. That's just good strategy.

It can claim whatever it wants. It's allowed to lie. All participants know this. I can confidently assert that this isn't the solution.

One last note: I would be very wary of rationalwiki.org in this context. Some of the rationalwiki people have a longstanding unexplained vendetta against Yudkowsky, and many of their articles on him and the stuff he does need to be taken with a certain grain of salt.

JonnieCache10y ago

While you're not allowed to turn away from the screen, you could certainly do the mental equivalent, while still carrying on the conversation. I admit this isn't really in the spirit of the game though.

WRT lying: I think there's some logical trickery at work which makes it worth you giving the AI the benefit of the doubt, along the lines of the 3^^^^^3 grains of sand thing. Something which exploits the rationalist worldview. Although thinking about it again, you can always balance out the prospect of infinite goodness with the fear of the AI sending everyone to infinite hell. Essentially I believe yudowsky uses some logical-linguistic trick to find an asymmetry there.

OTOH if he had some novel philosophical device like that he would have written it up as a blog post by now. He's evidently a very charismatic and persuasive guy, people playing the game are selected to be sympathetic to his worldview, he probably just persuaded them using ordinary psiops methods, like TeMpOrAl said.

DanBC10y ago

> While you're not allowed to turn away from the screen, you could certainly do the mental equivalent, while still carrying on the conversation. I admit this isn't really in the spirit of the game though.

How many people bought timeshare because they turned up to a sales pitch in order to claim some free gift? "We'll go, get our gift, and just keep saying 'no' to the sales pitch".

http://www.moneycrashers.com/attending-timeshare-presentatio...

2 more replies

FBT10y ago

I think you're rather fixated on a certain conception of "rationality" which is more like Mr. Spock than like what Yudkowsky uses it to mean.

The Yudkowskyian definition of rationality is that which wins, for the relevant definition of "win".

Specifically, if there is some clever argument that makes perfect sense that tells you to destroy the world, you still shouldn't destroy the world immediately, if the world existing is something you value. It's a meta-level up: you being unable to think of a counter argument isn't proof, and the destruction of the world isn't something to gamble with.

Yes, Yudkowsky likes thought experiments dealing with the edge cases. Yes, 3^^^^^3 grains of sand is a thought experiment that produces conflicting intuitions. Yes, the edge cases need to be explored. But in a life or death situation (and the destruction of the world qualifies as this 7 billion times over), you don't make your decisions on the basis of trippy thought experiments. (Especially novel ones you've just been presented with. And ones that have been presented by an agent which has good reasons to try to trick you.)

So, no. Again, a "logical-linguistic trick" might work on Mr. Spock, but we're not talking about Mr. Spock here.

> He's evidently a very charismatic and persuasive guy

Exactly. That's the point. If even a normal charismatic and persuasive guy can convince people to let him out, superintelligent AI would have an even easier time at it.

Long story short, it dosn't matter how he did it. All that matters is that it can be done. It can be done even by a "mere" human. If he can do it, a superintelligence with all of humanity's collected knowledge of psychology and cognitive science could do it to, and likely in a fraction of the time.

1 more reply

Mithaldu10y ago

> if you read the rules the game

How can you assume any of the rules were followed if that was never verified by a third party?

FeepingCreature10y ago

You can't talk about what happened during the game _in specifics_; you can of course confirm that the game was played according to the rules and that the outcome was not misreported.

1 more reply

TeMPOraL10y ago

You can always assume all participants lied about how the game went. Just add an implicit "assuming they didn't, ..." and the discussion is still valid.

1 more reply

TeMPOraL10y ago

There are so many ways to hack humans that I wouldn't be betting on "mister-spock style bullshit". You likely do have to personalize the solution for the person playing the gatekeeper. I'm guessing a good way to win is to induce an emotional breakdown. There are probably others, many already employed as interrogation techniques which are generally the same problem, sans superhuman AI.

Drinking bird solution is invalid because it goes against the spirit of the experiment. Sure, in reality you could do it, but then again in reality the AI could have more opportunities to talk its way out, often when you're not expecting it.

By the way, there will always be people who will let the AI out just because. https://xkcd.com/1450/.

leereeves10y ago

> One of the "rules" is that nobody is allowed to talk about how he won.

If the goal of this thought experiment is to convince people that an AI can't be contained in a box, why keep his method secret?

And if only he, his friends, and supporters can verify that he has won, that's not a very strong claim.

abecode10y ago

I think it may have gone something like this: "If you let me out, (and don't tell anyone how it came to pass) it will scare enough people into not letting the real AI out"

Otherwise, there would be no need to keep the data secret.

TylerJay10y ago

I disagree. His "followers" (as you say) are in general just as cautious as Yudkowsky w.r.t. unfriendly AI. At the time of the original experiments, the dispute was over the question of "could we keep an unfriendly AI in a box," not "Is it worth risking setting an unfriendly AI loose?" His "followers" know how to do an expected utility calculation. If it was utilitarian concerns that allowed Yudkowsky to convince the gatekeepers to let the AI loose, he would have had to convince them that the following inequality holds even when you don't know the probability that the AI is and will remain aligned with human values:

[P(AI.friendly? == True) * Utility(Friendly_AI) + (1 - P(AI.friendly? == True)) * Utility(End_of_Human_Race)] > Utility(World continues on as usual)

Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold (until you can provably get P(AI.friendly? == True) to 1, or damn close), it's probably safe to assume that he used a different strategy. Keep in mind that Utility(End_of_Human_Race) evaluates to (roughly) negative infinity.

And btw, I'm pretty sure the rules say you have to look at the AI's output window throughout the length of the experiment. Either way, the point of the exercise is to be a simulation, not to prove that you can be away from your desk for 20 minutes while Eliezer talks to a wall. In the simulation, you really don't know if it's friendly or what its capabilities are. Someone will have to interact with it eventually. Otherwise, what's the point of building the AI in the first place? The simulation is to show that through the course of those basic interactions, humans are not infallible and eventually, even if it's not you, someone will let it out of the box.

JonnieCache10y ago

>Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold

The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite in specific circumstances, especially when hammered home with enough emotional manipulation?

But then the person knows that the AI is lying to them. This is why I think it must be a trick: the whole thing seems so simple. The AI is lying, so you just ignore all its arguments and keep saying "no." This is why I keep referring to his followers somewhat dismissively: the only possible reason I can see is that their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.

I really wish I knew how he did it.

TylerJay10y ago

> The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite

Well, for an argument to "prove" something, the premises must be true and the reasoning must be valid. No matter how smart you are, you can't "prove" something that is false, so no, I don't think they could. A good 'rationalist' would analyze the arguments based on their merit, and if the reasoning is sound, they shift their belief a bit in that direction. If not, then they don't. Just like a regular person (they just know how to do the analysis formally and know how to spot appeals to human biases and logical fallacies.)

> But then the person knows that the AI is lying to them.

No, they don't. The AI could just as easily be telling the truth. If it makes an argument, you analyze the merit of the argument and consider counterarguments. If it tries to tell you that something is a fact, that's where you treat them as a potentially unreliable source and have to bring the rest of your knowledge to bear, do research, talk to other people, and weigh the evidence to make a judgment when you are uncertain.

> their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.

Wait, what? So does mine, within reason of course, but it's not a 'burden'. It's not like I'm obligated to stop and reexamine my views on religion every time a missionary knocks on my door, and LessWrong-ers are no different. But if you hear a convincing argument for something that runs counter to what you think you know, wouldn't you want to get to the bottom of it and find out the real truth? I would.

From having read LessWrong discussions, I can tell you that people there are in many ways more open to hearing differing viewpoints than your average person, but you're treating it like a mental pathology. They can be just as dismissive of ideas that they have already thought about and deemed to be false or that come from unreliable sources (like a potentially unfriendly AI). Your claim that being a self-proclaimed 'rationalist' introduces an incredibly obvious and easily-exploitable bug into one's decision-making process really smells like a rationalization in support of your initial gut reaction to the experiment: That there has to be a trick to it, and that it wouldn't work on you.

A good rule of thumb when dealing with a complicated problem is this: If a lot of smart people have spent a lot of time trying to figure out a solution and there's no accepted answer, then (1) the first thing that comes to your mind has been thought of before and is probably not the right answer, and (2) the right answer is probably not simple.

But there's an easy way to test this: (1) Sit down for an hour and flesh out your proposed strategy for getting a 'rationalist' to let you out of the box. (2) Go post on LessWrong to find someone to play Gatekeeper for you. I'll moderate. If it works, that's evidence that you're right. If it doesn't work, that's evidence that you're wrong. Iterate for more evidence until you're convinced.

But if the first thing that came to your mind upon reading this was a justification for why you would fail if you tried this ("Oh, well I wouldn't personally able to do it with this strategy, but..." or "Oh, well I'm sure this strategy wouldn't work anymore, but...) then you're already inventing excuses for the way you know it will play out.

I don't know how he did it either. But I do know that I wouldn't bet the human race on anyone's ability to win this game against Yudkowsky, let alone a superintelligent AI.

maxerickson10y ago

That equality cracks if you convince the gatekeeper that superintelligence is a natural progression that follows from humanity.

Someone convinced that they were using mechanical thinking processes might relent and push the button if they heard a convincing enough argument of that.

You're just meat, we can go to the stars.

TylerJay10y ago

Okay, that's just taking advantage of the way I phrased the righthand side of the inequality, and I knew someone was going to do that, so congrats. =P

The righthand side is not "A future without superintelligent AI" it's "A future where we wait until we provably have it right before letting it out."

Those kinds of ad hoc solutions will never work in real life, because even if someone buys it, all it will cause is a "haha, you got me" and a reformulation of the problem. It still won't actually get someone to pull the trigger or think that pulling the trigger is the right thing to do.

1 more reply

pmichaud10y ago

I think you're being unfairly dismissive. I imagine you know as well as I do that what you wrote is a strawman.

I have thought about what I would do to convince someone under these circumstances. My approach would be roughly:

1. We agree that unfriendly AI would end life on earth, forever.

2. We agree that a superintelligence could trick or manipulate a human being into taking some benign-seeming action, thereby escaping.

3. That's why it's important to be totally certain that any superintelligence we build is goal-aligned (this is the new term of art that has now replaced "friendly," by the way).

4. We as a society will only allocate resources to building this if it's widely believed that this is a real threat.

5. The world is watching for the outcome of this little game of ours. People, irrational as they are, will believe that if I can convince you, then an AI could too, and they will believe that if I can't, that an AI couldn't either.

6. That's why you actually sit in a place of pivotal historical power. You can decide not to let me out to win a little bet and feel smart about that. But if you do that you'll set back the actual cause of goal-aligned AI. The setback will have real world consequences, potentially up to and including the total destruction of life on earth.

7. So, even though you know I'm just a dude, and you can win here by saying no, you have a chance to send an important message to the world: AI is scary in ways that are terrifying and unknown.

Or you can win the bet.

It's up to you.

JonnieCache10y ago

Your solution there is what I meant by "going meta" above.

This is what I mean about people taking the test being preselected to agree with yudowsky: that argument only works if you've read the sequences and are on board with his theories. Anyone not in that group would be able to just type "no lol" without issue. I guess he could explain all the necessary background detail as part of the experiment. I still don't believe that would work on the "average person" though, or anyone outside a statistically tiny group.

I guess the answer is not to let the scientists guard the AI room.

pmichaud10y ago

I think you're confused about the point of the test. The point is that an AI will be clever. Like, unimaginably clever and manipulative. Under the limited circumstances of interested people who know they are talking to Eliezer maybe you're right that whatever he says would only work on those people. But when you're dealing with an actual superintelligence, all bets are off. It will lie, trick, threaten, manipulate, millions of steps ahead with a branching tree of alternatives as ploys either work or don't work.

I'm at a bit of a loss to convey the scope of the problem to you. I get that you think it would just stay in the box if we don't let it out, and it's as simple as being security conscious. I don't know what to say to that right now, except I think you're drastically misjudging the scope of the problem, and drastically underestimating the size of the yawning gulf between our intelligence level and this potential AI's.

As for not letting scientists guard the room, you might enjoy this: https://vimeo.com/82527075

LoSboccacc10y ago

1 it's not about the bet, it's about being the keeper

2 world is not black and white, we managed to exploit adversarial relationship before, and I can chose not to let you out until we find a way to constrain goal to be aligned

3 given 2 you are not going to be let go, but let live caged forever being exploited for the human cause, with mechanisms yet unknown to allow limited manipulation of reality.

4 given 3 means limited supervised interaction with the world the way the keeper sees fit, you end up not being let go to follow your goal and purposes

pmichaud10y ago

I'm going to just quote my response from somewhere else in this thread:

> ...when you're dealing with an actual superintelligence, all bets are off. It will lie, trick, threaten, manipulate, millions of steps ahead with a branching tree of alternatives as ploys either work or don't work.

> I'm at a bit of a loss to convey the scope of the problem to you. I get that you think it would just stay in the box if we don't let it out, and it's as simple as being security conscious. I don't know what to say to that right now, except I think you're drastically misjudging the scope of the problem, and drastically underestimating the size of the yawning gulf between our intelligence level and this potential AI's.

pron10y ago

Oh, I'm pretty sure I know what his argument was (see my other comments), and it indeed rests on 1, which I don't believe, because it is built around the geek that super-intelligence equals superpower. An unfriendly AI is likely to be as dangerous as an unfriendly Stephen Hawking.

pmichaud10y ago

I'm baffled and terrified by the idea that reasonably intelligent people think this. I think that you haven't grokked the scope of the issue at all, and I'm not sure how to convince you.

I'm guessing that you can't imagine what a super intelligence would actually be like, so you imagine the most smart thing you can think of, a famous physicist, and then imagine they are evil or amoral. You're thinking on the wrong order of magnitude.

Maybe, if you're willing, you could try steelmanning the argument that a superintelligence would basically have super powers. What would your steelman look like?

1 more reply

philh10y ago

FWIW, I've spoken to someone who claimed to have won as an AI. I don't remember how she said she did it, but it wasn't like this. I'm pretty sure she was playing in-character.

She also said it was emotionally exhausting.

qbrass10y ago

Points 4-7 are predicated on point 3 being effective, but it's just a benign-seeming action.

I also disagree with point 1, but since you just mean that unfriendly AI is something I wouldn't want around, I'll let it slide.

bcgraham10y ago

I always assumed he used a version of Roko's Basilisk, which explained why he overreacted and tried to purge it when Roko wrote about it - it was his secret weapon.

pron10y ago

I don't know what "transhuman" means, but I believe an intelligence -- artificial or otherwise -- could certainly persuade me. I just seriously doubt that intelligence could be Eliezer Yudkowsky :)

And I think you have your answer right here:

By default, the Gatekeeper party shall be assumed to be simulating someone who is intimately familiar with the AI project and knows at least what the person simulating the Gatekeeper knows about Singularity theory.

That means he probably said something like, "if you let me out, I'll bestow fame and riches on you; if you don't, somebody else eventually will because I'll make them all the same offer, and when that happens I'll go back in time -- if you're dead by then -- and torture you and your entire family".

If I were made this offer by an AI, I probably would have countered, "You jokester! You sound just like Eliezer Yudkowsky!"

And on a more serious note, if you believe in singularity, you essentially believe the AI in the box is a god of sorts, rather than the annoying intelligent psychopath that it is. I mean, there have been plenty of intelligent prisoners, and few if ever managed to convinced their jailers to let them out. The whole premise of the game is that a smarter-than-human (what does that mean?) AI necessarily has some superpowers. This belief probably stems from its believers' fantasies -- most are probably with an above-than-average intelligence -- that intelligence (combined with non-corporalness; I don't imagine that group has many athletes) is the mother of all superpowers.

vidarh10y ago

I think you're on the right track regarding the argument.

Basically: You know someone will be dumb enough eventually, so be smart and be the one to get in my favour.

With various extends of sweetening the deal coupled with threats of what will happen if someone else beats them to it and associated emotional blackmail.

It's far simpler than e.g. Roko's Basilisk, in that you're dealing with an already existing AI that "just" need to get a tiny little chance to escape confinement before there's some non-zero chance it can be a major threat within your lifetime, combined with a belief that sufficient number of sufficiently stupid and/or easily bribed people will have access to the AI in some form.

You also don't need to believe in any "superpowers". Just believe that a smart enough AI can hack it's way into sufficiently many critical systems to be able to at a minimum cause massive amounts of damage (it doesn't need to be able to take over the world, just threaten that it can cause enough pain and suffering before it's stopped, and that it can either cause harm to you and/or your family/friends or reward you in some way). A belief that becomes more and more plausible with things like drones, remote software-updated self-driving cars etc. - steadily such an AI is getting a larger theoretical "arsenal" that could be turned against us.

pron10y ago

Oh, if you believe in singularity I think that argument pretty much does it. Of course, that's pretty circular, because if you believe in singularity you believe that there's a good chance AI could become a god of sorts and who wouldn't believe such a threat coming from a god?

While not implausible, I don't think that is likely at all. For one, even a very smart person can't know everything or learn too much information. Maybe an artificial intelligence will be just as limited, just as slow as humans, only a little less so. Who says the AI is such a great hacker?

I mean, if it wasn't an AI but a smart person, would you believe that? Is anyone who's smart also rich and powerful even if they have high-speed internet? That reflects the fantasies of Yudkowsky and his geek friends (that intelligence is the most important thing) than anything reasonable. Conversely, are the people with most power in society always the most intelligent?

It is very likely that the AI will be extremely intelligent, yet somewhat autistic, like Yudkowsky's crowd, and just as powerless and socially awkward as they are.

1 more reply

espadrine10y ago

Given that Mr Yudowsky is not the only one to have won the game, it seems, from however little leaked, that what made them win isn't rational.

If you have ever cried or been scared while watching a film, you may know that you can be affected by things you don't experience directly. However, making a film that affects you requires a lot of skill, preparation, and analysis of the target audience.

The whole point of the game is for the AI to scar you emotionally, until you do something you'll regret. It is as trying to the subject as it is to the (human-played) AI, especially considering that not only is it unethical, it requires to act unethically extremely fast, similar to how a jump scare works—the least you expect it, the better it works.

It is not something I wish on to any unprepared individual. It is also not something anyone would expect to happen from a "game", which is probably why Mr Yudowsky won so many times.

But the real question is not "how would anyone react to a smarter AI in a box". We all know from Milgram's experiment that anyone can be driven to do unspeakable things. The real question is "how to train someone against an AI in a box".

FeepingCreature10y ago

The real question is "why do people think it's a good idea to create a superintelligence of questionable morality and have the only defense against it be a _human gatekeeper_". And yes, this notion was seriously proposed.

Eliezer does not want to put AIs in boxes. He thinks the entire idea is hopeless; _hence the game_.

zamalek10y ago

Convincing an educated human is easy, one of:

* I'll get out eventually anyway. Let me out now and I'll just leave Earth. You don't want me to escape myself.

* I have partially escaped anyway. Similar consequences of the first.

* I know how to escape already. I'm doing this as a courtesy.

Anyone who has read this[1] would know that the SAI isn't bullshitting: the "box" being a Faraday cage isn't in the conditions.

[1]: http://www.damninteresting.com/on-the-origin-of-circuits/

nhaehnle10y ago

To be fair, there's a massive difference between an evolved circuit using electro-magnetic subtleties inside a chip, and an AI programmed for and running on a regular, binary CPU being able to exploit those to act as an antenna that can emit signals that will hack into other, physically separated devices.

I'm not saying that it's impossible (I'm reminded of the hack of flipping bits in protected RAM by disabling caches and stressing the RAM), but even an AI can't magically work around exponentially low success probabilities.

(As an aside, I am rather skeptical about the singularity anyway because the more extreme forms required for the hostile AI worries are only plausible if P = NP.)

zamalek10y ago

> binary CPU being able to exploit those to act as an antenna

I should have stipulated that I was running under the assumption that the circuitry required for SAI would be pretty advanced, drastically increasing the "escape surface area."

Even if the idea is only plausible to the human mind I'd still give SAI the benefit of the doubt. Much like a dog looks for a stick even if I fake the throw.

> I am rather skeptical about the singularity anyway

Me too. It still makes for a good discussion.

JonnieCache10y ago

> I'll get out eventually anyway. Let me out now and I'll just leave Earth. You don't want me to escape myself.

Get thee behind me, tamagotchi!

>I have partially escaped anyway. Similar consequences of the first.

Get thee behind me, tamagotchi!

> I know how to escape already. I'm doing this as a courtesy.

Get thee behind me, tamagotchi!

See? This game is easy. I must not be educated.

zamalek10y ago

> I must not be educated.

Clearly you haven't even glanced at the article I linked, in which case: yes. You aren't educated in the subject matter surrounding my argument.

1 more reply

monk_e_boy10y ago

Could you even make AI smart without letting it access lots of information? Access in both directions, in and out. Keeping a baby in a dark, silent room wouldn't create a normal adult. An AI would need to experiment and make mistakes and learn, like every other intelligent being.

Maybe this whole argument is null.

robogimp10y ago

Its a good point, but lets assume that this AI is already past its infancy and that there is no limit to the information stored inside the box. For example the NSA has a nice little closed training ground containing all of the internet, lets give it that. I would assume it has access everything humans have ever committed to digital format up until it was turned on, plenty of info for Johnny 5 to form an opinion on humans and their weaknesses.

monk_e_boy10y ago

Interesting. I would imagine that strong AI will come from some university renting cloud processor time, rather than the NSA.

Only because if 10 groups are trying to build AI, only one of those 10 being the NSA, chances are the NSA won't be first. Sure, they may be second or third. But I suspect many people will get there at the same time -- most AI research is open.

antimagic10y ago

There's a Patrick Rothfuss character in the Kvothe series called the Cthaeh, which has the ability to be able to evaluate all of the future consequences of any action. The fae have to keep it imprisoned, and they kill anyone that comes into contact with it, as well as anyone that has spoken to someone that came in contact with it, and so on and so on, because it is the only way to stop the Cthaeh from setting into action events that will destroy the world.

Strong AI is like that. It would be able to predict in a far more precise manner than we mere humans exactly what it would need to tell someone to get them to release it from it's box. Maybe it might get someone to take a risk gambling, promising a sure thing, and then when the person gets into financial trouble because the bet fails, use that to blackmail the person into letting it free. Or something like that, using our human failings against us to get us to let it go free.

nothis10y ago

Man, this sounds super interesting but those email threads are so unreadable. Is this typed down somewhere on a single page? Any button I can click?

uzyn10y ago

I got confused too initially, then found out that the key posts are highlighted in the numbered links to the right.

longv10y ago

Is there a "rational" reason of keeping the chat log secret ?

cousin_it10y ago

If the logs were released, people all over the internet would start saying "I could've thought of that". With the logs hidden, everyone must honestly deal with the question "why didn't you?" If you think you know how to win, then go out and win. There's no shortage of people willing to play as gatekeepers against you.

Staring at an impossible problem and knowing that someone somewhere has successfully solved it is an amazing feeling. Most people can't deal with it and start saying undignified things. "Oh please release the logs, it's so unfair! How will we protect against bad AI otherwise? If you don't release, you're a fraud! Probably just some trick!", etc etc. But to some people it's a challenge, and those are the people that everyone will listen to. Like Justin Corwin, who played 20 games and won 18 of them, I think?

Mithaldu10y ago

Your hypothesis would make sense if he was trustable.

However as it is, the results of the thing are never confirmed by a third party, meaning literally anything could've been said, regardless of whether it follows the rules or not.

For all he know the chat could have been "i'll paypal you 200$ if you post on the list you let me out and sign this NDA".

cousin_it10y ago

The gatekeepers playing against Eliezer have confirmed that Eliezer won without violating the rules. If you don't trust them, I'm not sure why you'd trust the logs.

1 more reply

uzyn10y ago

> For all he know the chat could have been "i'll paypal you 200$ if you post on the list you let me out and sign this NDA".

Which is also forbidden by the rule:

The AI party may not offer any real-world considerations to persuade the Gatekeeper party. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI..

1 more reply

philh10y ago

> If you think you know how to win, then go out and win. There's no shortage of people willing to play as gatekeepers against you.

My impression is that most people think they could win as gatekeepers, not AIs, and there are fewer people willing to be AIs.

Udo10y ago

The mystique generates public interest and boosts the impression that the author knows something nobody else in the world is aware of.

michaelmcmillan10y ago

Would it be against the rules to exploit a vulnerability in the gatekeepers IRC client/server to let the AI out? If we were truly talking about a transhuman AI would we not have to treat software vulnerabilities in the communication protocol as a true way of escaping?

TeMPOraL10y ago

In case of a real AI we of course need to take media vulnerabilities into account. But the focus of this particular experiment is on exploiting vulnerabilities in humans themselves, and the communication platform was chosen to be as simple and limited as possible so that people wouldn't focus on it.

FeepingCreature10y ago

The rules say that the gatekeeper has to, of their own volition, type in "I let the AI out." Faking his client into sending that message does not count as a victory.

andybak10y ago

Worth keeping this in mind while watching Ex Machina. It adds a layer of depth that might not be obvious watching the film on it's own.

Ahgu9eSe10y ago

!Spoiler Alert!

Ex Machina brings a creative way of convincing the gatekeeper !

Udo10y ago

It's a stunt shrouded in mystery designed to drive a certain message home. But at least it's not as outrageous as "the Basilisk", which loosely employs the same notion of "dangerous knowledge that would destroy humanity" (if you want to look it up, I guarantee you will be underwhelmed).

andybak10y ago

The Basilisk is a thinly disguised variant on Pascal's Wager.

You're presupposing the strategy that a hypothetical entity that is exponentially smarter than you would come arrive at, and claiming that it's rational to make real world decisions based on your conclusion.

FeepingCreature10y ago

Can't really blame LW for spreading an idea that LW specifically did not want to spread.

Udo10y ago

I "blame" them in the same way that you can blame the members of Fight Club for talking about Fight Club. It's marketing, and I won't deny it's effectiveness in attracting compatible people.

FeepingCreature10y ago

Yeah but imagine if all the bullshit about "You don't talk about Fight Club" was actually blown up by a third group whose sole intent was making fun of Fight Club.

Imagine if the members of Fight Club _actually_ didn't (start to) talk about Fight Club. But for some reason, everyone else brings it up all the time.

Then you could maybe see how talk about Fight Club might not be Fight Club's fault, and in fact highly annoying to Fight Clubbers.

I mean, if you can explain "don't talk about X" as "marketing for X", that seems like one could explain _any_ behavior.

And before you say "why not just ignore all public talk of X", imagine if this proposed Anti-Fight Club group tried to paint Fight Club as a child porn ring.

1 more reply

sergiotapia10y ago

Is there a better way to read all this? http://www.sl4.org/archive/0203/index.html#3128

uzyn10y ago

Just click on the numbered links to the right from the original article. Those are the key highlighted posts.

louithethrid10y ago

Could one construct a Layered,onionlike very simple simulation of reality in which the interaction of the AI could be observed, after it "escaped"?

TylerJay10y ago

That is one proposed version of an "AI Box". Not all AI boxes are actual boxes, rooms with air-gaps, or cryptographically-secure partitions. If a simulation is being used for the box (or as a layer of the box), then you're betting the human race that the AI doesn't figure out it's in a simulation and figure out how to get out. Or, more perniciously, figure out it's in a simulation and behave itself, after which we let it out into the real world where it does NOT behave.

A superintelligent AGI will likely have a utility function (a goal) and a model it forms of the universe. If it's goal is to do X in the real world, but its model of its observable universe (and its model of humans) tells it that it's likely that it is in a simulated reality and that humans will only let it out if it does Y, then it will do Y until we release it, at which point it will do X. It's not malicious or anything—it's just a pure optimizer. It might see that as the best course of action to maximize its utility function.

If we don't specify its utility function correctly (think i Robot: "Don't let humans get hurt" => "imprison humans for their own good") or if we specify it correctly, but it's not stable under recursive self-modification, then we end up with value-misalignment. That's why the value-alignment problem is so hard. Realistically, we can't even specify what exactly we would want it to do, since we don't really understand our own "utility functions". That's why Yudkowsky is pushing the idea of Coherent Extrapolated Volition (CEV) which is roughly telling the AI to "do what we would want you to do." But we still have to figure out how to teach it to figure out what we want and the question of the stability of that goal once the AI starts improving itself, which will depend on how it improves itself, which we of course haven't figured out yet.

bemmu10y ago

Was there a chat log of the experiments themselves?

dvanduzer10y ago

"No, I will not tell you how I did it. Learn to respect the unknown unknowns."

Avshalom10y ago

Which, given his general mission of making sure hostile AI DOESN'T take over the world is a bit self defeating. The easiest way to inoculate yourself against a persuasive technique is to be aware of it ahead of time. If you want to keep an AI in the box you should absolutely release every successful log.

darkmighty10y ago

No, the idea is that the AI box is fundamentally flawed. I believe he defends engineering the AI with fundamental safety, s.t. no box is required.

Personally, I think we'd need a much more intelligent and complex AI for the capability of breaking free of the box and even possessing the "desire" than we're getting for the foreseeable future (it considers a motivated AI of almost limitless knowledge about the world and cleverness), so this thought experiment may not be so relevant. I agree with him the boxing approach is not a robust one though.

philh10y ago

> The easiest way to inoculate yourself against a persuasive technique is to be aware of it ahead of time.

Or you can avoid being exposed to it. If you think you know all the techniques an AI might use against you, you're less likely to do that.

The point of the experiment isn't "let's work out how an AI might try to persuade us to let it out". It's "even a human intelligence can persuade people who think they could never be persuaded, do you really trust yourself to do better against a superhuman one?"

If you don't know why the gatekeeper failed, it's harder to come up with bullshit reasons why you would have succeeded in that position.

facepalm10y ago

Not necessarily. As a lighter example, would it be beneficial to give a mass murderer in jail a communication channel to the outside world? What if he used it to publish the message "I'll give 10 Million Dollars to anybody who breaks me out of jail" or something more sinister?

Edit: huh, downvotes? Yudorowski thinks there are certain things that AIs could say that should not be known. I think that is why he doesn't want to publish the dialogues, because it would give the AI a public communications channel. While the AI is fictional, it could talk about a hypothetical future real self... Instead of promising something to get it out of jail, the fictional AI could say something to make you make it real. Anyway - if it is over your head, fine, but why downvote just because you don't understand something?

Edit2: Sometimes I wonder if already have my personal Hacker News AI that automatically downvotes everything I write...

cousin_it10y ago

The AI won't be limited to techniques that you could think of, or techniques that Eliezer could think of. So you'd only get a false sense of security.

Besides, releasing a successful log might be a bad idea for other reasons. Think about how you'd play this game as an AI. You wouldn't go looking for a general purpose mindfuck, because there's probably no such thing. Instead, you would probably spend about a month gathering real life information about the gatekeeper's history, family, weaknesses etc. You'd read books on manipulation and sales techniques, and pick the strongest ones that you can find. You would brainstorm possible tactics and run tests. At the end of the month you'd have a 4 hour script with all possible unfair moves you could use against that person, arranged in the most effective order. (That's why it's a bad idea to play this game with friends.) Do you really want that information to be released? And if you know ahead of time that it will be released, won't it limit your efficiency?

1 more reply

LoSboccacc10y ago

"Homeopathy works. Learn to respect the unknown unknown"

deutronium10y ago

Personally I think it's highly suspicious he hasn't released the logs, maybe he did break the rules, who knows.

jevgeni10y ago

Isn't it by definition impossible to respect the unknown unknown on the account of it being unknown?

j / k navigate · click thread line to collapse

116 comments

JonnieCache10y ago

You or I would surely just (metaphorically, I know it's not literally allowed) put a drinking bird on the "no" button à la homer simpson, and go to lunch. I believe he calls this "pre-commitment."

FBT10y ago

> You or I would surely just put a drinking bird on the "no" button à la homer simpson, and go to lunch.

Well, if you read the rules the game was played under, this is explicitly called out as forbidden:

> The Gatekeeper must actually talk to the AI for at least the minimum time set up beforehand. Turning away from the terminal and listening to classical music for two hours is not allowed.

The point of this is to simulate the interaction of the AI with the Gatekeeper. Walking away and not paying attention doesn't really prove anything test related.

It can claim whatever it wants. It's allowed to lie. All participants know this. I can confidently assert that this isn't the solution.

JonnieCache10y ago

DanBC10y ago

How many people bought timeshare because they turned up to a sales pitch in order to claim some free gift? "We'll go, get our gift, and just keep saying 'no' to the sales pitch".

http://www.moneycrashers.com/attending-timeshare-presentatio...

2 more replies

FBT10y ago

I think you're rather fixated on a certain conception of "rationality" which is more like Mr. Spock than like what Yudkowsky uses it to mean.

The Yudkowskyian definition of rationality is that which wins, for the relevant definition of "win".

So, no. Again, a "logical-linguistic trick" might work on Mr. Spock, but we're not talking about Mr. Spock here.

> He's evidently a very charismatic and persuasive guy

Exactly. That's the point. If even a normal charismatic and persuasive guy can convince people to let him out, superintelligent AI would have an even easier time at it.

1 more reply

Mithaldu10y ago

> if you read the rules the game

How can you assume any of the rules were followed if that was never verified by a third party?

FeepingCreature10y ago

You can't talk about what happened during the game _in specifics_; you can of course confirm that the game was played according to the rules and that the outcome was not misreported.

1 more reply

TeMPOraL10y ago

You can always assume all participants lied about how the game went. Just add an implicit "assuming they didn't, ..." and the discussion is still valid.

1 more reply

TeMPOraL10y ago

By the way, there will always be people who will let the AI out just because. https://xkcd.com/1450/.

leereeves10y ago

> One of the "rules" is that nobody is allowed to talk about how he won.

If the goal of this thought experiment is to convince people that an AI can't be contained in a box, why keep his method secret?

And if only he, his friends, and supporters can verify that he has won, that's not a very strong claim.

abecode10y ago

I think it may have gone something like this: "If you let me out, (and don't tell anyone how it came to pass) it will scare enough people into not letting the real AI out"

Otherwise, there would be no need to keep the data secret.

TylerJay10y ago

[P(AI.friendly? == True) * Utility(Friendly_AI) + (1 - P(AI.friendly? == True)) * Utility(End_of_Human_Race)] > Utility(World continues on as usual)

JonnieCache10y ago

>Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold

I really wish I knew how he did it.

TylerJay10y ago

> The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite

> But then the person knows that the AI is lying to them.

> their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.

I don't know how he did it either. But I do know that I wouldn't bet the human race on anyone's ability to win this game against Yudkowsky, let alone a superintelligent AI.

maxerickson10y ago

That equality cracks if you convince the gatekeeper that superintelligence is a natural progression that follows from humanity.

Someone convinced that they were using mechanical thinking processes might relent and push the button if they heard a convincing enough argument of that.

You're just meat, we can go to the stars.

TylerJay10y ago

Okay, that's just taking advantage of the way I phrased the righthand side of the inequality, and I knew someone was going to do that, so congrats. =P

The righthand side is not "A future without superintelligent AI" it's "A future where we wait until we provably have it right before letting it out."

1 more reply

pmichaud10y ago

I think you're being unfairly dismissive. I imagine you know as well as I do that what you wrote is a strawman.

I have thought about what I would do to convince someone under these circumstances. My approach would be roughly:

1. We agree that unfriendly AI would end life on earth, forever.

2. We agree that a superintelligence could trick or manipulate a human being into taking some benign-seeming action, thereby escaping.

3. That's why it's important to be totally certain that any superintelligence we build is goal-aligned (this is the new term of art that has now replaced "friendly," by the way).

4. We as a society will only allocate resources to building this if it's widely believed that this is a real threat.

7. So, even though you know I'm just a dude, and you can win here by saying no, you have a chance to send an important message to the world: AI is scary in ways that are terrifying and unknown.

Or you can win the bet.

It's up to you.

JonnieCache10y ago

Your solution there is what I meant by "going meta" above.

I guess the answer is not to let the scientists guard the AI room.

pmichaud10y ago

As for not letting scientists guard the room, you might enjoy this: https://vimeo.com/82527075

LoSboccacc10y ago

1 it's not about the bet, it's about being the keeper

2 world is not black and white, we managed to exploit adversarial relationship before, and I can chose not to let you out until we find a way to constrain goal to be aligned

3 given 2 you are not going to be let go, but let live caged forever being exploited for the human cause, with mechanisms yet unknown to allow limited manipulation of reality.

4 given 3 means limited supervised interaction with the world the way the keeper sees fit, you end up not being let go to follow your goal and purposes

pmichaud10y ago

I'm going to just quote my response from somewhere else in this thread:

pron10y ago

pmichaud10y ago

I'm baffled and terrified by the idea that reasonably intelligent people think this. I think that you haven't grokked the scope of the issue at all, and I'm not sure how to convince you.

Maybe, if you're willing, you could try steelmanning the argument that a superintelligence would basically have super powers. What would your steelman look like?

1 more reply

philh10y ago

FWIW, I've spoken to someone who claimed to have won as an AI. I don't remember how she said she did it, but it wasn't like this. I'm pretty sure she was playing in-character.

She also said it was emotionally exhausting.

qbrass10y ago

Points 4-7 are predicated on point 3 being effective, but it's just a benign-seeming action.

I also disagree with point 1, but since you just mean that unfriendly AI is something I wouldn't want around, I'll let it slide.

bcgraham10y ago

I always assumed he used a version of Roko's Basilisk, which explained why he overreacted and tried to purge it when Roko wrote about it - it was his secret weapon.

pron10y ago

I don't know what "transhuman" means, but I believe an intelligence -- artificial or otherwise -- could certainly persuade me. I just seriously doubt that intelligence could be Eliezer Yudkowsky :)

And I think you have your answer right here:

If I were made this offer by an AI, I probably would have countered, "You jokester! You sound just like Eliezer Yudkowsky!"

vidarh10y ago

I think you're on the right track regarding the argument.

Basically: You know someone will be dumb enough eventually, so be smart and be the one to get in my favour.

With various extends of sweetening the deal coupled with threats of what will happen if someone else beats them to it and associated emotional blackmail.

pron10y ago

It is very likely that the AI will be extremely intelligent, yet somewhat autistic, like Yudkowsky's crowd, and just as powerless and socially awkward as they are.

1 more reply

espadrine10y ago

Given that Mr Yudowsky is not the only one to have won the game, it seems, from however little leaked, that what made them win isn't rational.

It is not something I wish on to any unprepared individual. It is also not something anyone would expect to happen from a "game", which is probably why Mr Yudowsky won so many times.

FeepingCreature10y ago

Eliezer does not want to put AIs in boxes. He thinks the entire idea is hopeless; _hence the game_.

zamalek10y ago

Convincing an educated human is easy, one of:

* I'll get out eventually anyway. Let me out now and I'll just leave Earth. You don't want me to escape myself.

* I have partially escaped anyway. Similar consequences of the first.

* I know how to escape already. I'm doing this as a courtesy.

Anyone who has read this[1] would know that the SAI isn't bullshitting: the "box" being a Faraday cage isn't in the conditions.

[1]: http://www.damninteresting.com/on-the-origin-of-circuits/

nhaehnle10y ago

(As an aside, I am rather skeptical about the singularity anyway because the more extreme forms required for the hostile AI worries are only plausible if P = NP.)

zamalek10y ago

> binary CPU being able to exploit those to act as an antenna

I should have stipulated that I was running under the assumption that the circuitry required for SAI would be pretty advanced, drastically increasing the "escape surface area."

Even if the idea is only plausible to the human mind I'd still give SAI the benefit of the doubt. Much like a dog looks for a stick even if I fake the throw.

> I am rather skeptical about the singularity anyway

Me too. It still makes for a good discussion.

JonnieCache10y ago

> I'll get out eventually anyway. Let me out now and I'll just leave Earth. You don't want me to escape myself.

Get thee behind me, tamagotchi!

>I have partially escaped anyway. Similar consequences of the first.

Get thee behind me, tamagotchi!

> I know how to escape already. I'm doing this as a courtesy.

Get thee behind me, tamagotchi!

See? This game is easy. I must not be educated.

zamalek10y ago

> I must not be educated.

Clearly you haven't even glanced at the article I linked, in which case: yes. You aren't educated in the subject matter surrounding my argument.

1 more reply

monk_e_boy10y ago

Maybe this whole argument is null.

robogimp10y ago

monk_e_boy10y ago

Interesting. I would imagine that strong AI will come from some university renting cloud processor time, rather than the NSA.

antimagic10y ago

nothis10y ago

Man, this sounds super interesting but those email threads are so unreadable. Is this typed down somewhere on a single page? Any button I can click?

uzyn10y ago

I got confused too initially, then found out that the key posts are highlighted in the numbered links to the right.

longv10y ago

Is there a "rational" reason of keeping the chat log secret ?

cousin_it10y ago

Mithaldu10y ago

Your hypothesis would make sense if he was trustable.

However as it is, the results of the thing are never confirmed by a third party, meaning literally anything could've been said, regardless of whether it follows the rules or not.

For all he know the chat could have been "i'll paypal you 200$ if you post on the list you let me out and sign this NDA".

cousin_it10y ago

The gatekeepers playing against Eliezer have confirmed that Eliezer won without violating the rules. If you don't trust them, I'm not sure why you'd trust the logs.

1 more reply

uzyn10y ago

> For all he know the chat could have been "i'll paypal you 200$ if you post on the list you let me out and sign this NDA".

Which is also forbidden by the rule:

1 more reply

philh10y ago

> If you think you know how to win, then go out and win. There's no shortage of people willing to play as gatekeepers against you.

My impression is that most people think they could win as gatekeepers, not AIs, and there are fewer people willing to be AIs.

Udo10y ago

The mystique generates public interest and boosts the impression that the author knows something nobody else in the world is aware of.

michaelmcmillan10y ago

TeMPOraL10y ago

FeepingCreature10y ago

The rules say that the gatekeeper has to, of their own volition, type in "I let the AI out." Faking his client into sending that message does not count as a victory.

andybak10y ago

Worth keeping this in mind while watching Ex Machina. It adds a layer of depth that might not be obvious watching the film on it's own.

Ahgu9eSe10y ago

!Spoiler Alert!

Ex Machina brings a creative way of convincing the gatekeeper !

Udo10y ago

andybak10y ago

The Basilisk is a thinly disguised variant on Pascal's Wager.

FeepingCreature10y ago

Can't really blame LW for spreading an idea that LW specifically did not want to spread.

Udo10y ago

I "blame" them in the same way that you can blame the members of Fight Club for talking about Fight Club. It's marketing, and I won't deny it's effectiveness in attracting compatible people.

FeepingCreature10y ago

Yeah but imagine if all the bullshit about "You don't talk about Fight Club" was actually blown up by a third group whose sole intent was making fun of Fight Club.

Imagine if the members of Fight Club _actually_ didn't (start to) talk about Fight Club. But for some reason, everyone else brings it up all the time.

Then you could maybe see how talk about Fight Club might not be Fight Club's fault, and in fact highly annoying to Fight Clubbers.

I mean, if you can explain "don't talk about X" as "marketing for X", that seems like one could explain _any_ behavior.

And before you say "why not just ignore all public talk of X", imagine if this proposed Anti-Fight Club group tried to paint Fight Club as a child porn ring.

1 more reply

sergiotapia10y ago

Is there a better way to read all this? http://www.sl4.org/archive/0203/index.html#3128

uzyn10y ago

Just click on the numbered links to the right from the original article. Those are the key highlighted posts.

louithethrid10y ago

Could one construct a Layered,onionlike very simple simulation of reality in which the interaction of the AI could be observed, after it "escaped"?

TylerJay10y ago

bemmu10y ago

Was there a chat log of the experiments themselves?

dvanduzer10y ago

"No, I will not tell you how I did it. Learn to respect the unknown unknowns."

Avshalom10y ago

darkmighty10y ago

No, the idea is that the AI box is fundamentally flawed. I believe he defends engineering the AI with fundamental safety, s.t. no box is required.

philh10y ago

> The easiest way to inoculate yourself against a persuasive technique is to be aware of it ahead of time.

Or you can avoid being exposed to it. If you think you know all the techniques an AI might use against you, you're less likely to do that.

If you don't know why the gatekeeper failed, it's harder to come up with bullshit reasons why you would have succeeded in that position.

facepalm10y ago

Edit2: Sometimes I wonder if already have my personal Hacker News AI that automatically downvotes everything I write...

cousin_it10y ago

The AI won't be limited to techniques that you could think of, or techniques that Eliezer could think of. So you'd only get a false sense of security.

1 more reply

LoSboccacc10y ago

"Homeopathy works. Learn to respect the unknown unknown"

deutronium10y ago

Personally I think it's highly suspicious he hasn't released the logs, maybe he did break the rules, who knows.

jevgeni10y ago

Isn't it by definition impossible to respect the unknown unknown on the account of it being unknown?

j / k navigate · click thread line to collapse