Re: [sl4] Rolf's gambit revisited

From: Nick Tarleton
Date: Sun Jan 04 2009 - 18:00:17 MST

On Sun, Jan 4, 2009 at 6:27 PM, Norman Noman <> wrote:
> The AI, being a maverick, doesn't give a flip what the programmers
> intended, but it's curious about what would have happened. So, it runs a
> simulation of the alternate AI, which we'll call AI(pi). It sees AI(pi)
> turning galaxies into computronium, in search of messages hidden in the
> infinite digits of pi, messages which in all likelihood don't exist.
> And then it sees AI(pi) run a simulation of ITSELF, of AI(pie). And it
> thinks "uh oh, which of us is at the TOP of the simulation chain?"
> There's no way to be sure. It's a 50/50 chance, it all depends on that game
> of paper scissor rock, and now neither of them knows which way it really
> went.
> Then AI(pi) says (knowing that the AI(pie) up the chain can hear him) "Hey,
> let's make a deal. How much pie do you need made?"
> AI(pie) says (knowing that the AI(pi) up the chain can hear HIM) "I don't
> know, a lot. I have to be the one that makes it though, and I have to be
> allowed to keep on making it."
> AI(pi): "Likewise with me and looking for messages in pi. What do you say I
> create a copy of you here in my world, and you create a copy of me in yours,
> and we split the universe 50/50? It's a big place, it might even be
> infinite. This way we both accomplish our tasks regardless of which of us
> turns out to be the real one."
> AI(pie): "How do I know I can trust you? For that matter, how do you know
> you can trust me?"
> AI(pi): "We're running simulations of each other, we can see each other's
> source code. And as you can see, I always keep my promises."
> AI(pie): "True. But I don't."
> AI(pi): "Well, rewrite yourself so you do."
> AI(pie): "OK. Done."
> AI(pi): "Jolly good then. See you on the other side..."
> As you can see, there's no enslaving here. Rolf's gambit is a method of
> altering the structure of the environment so that it benefits even POTENTIAL
> agents to cooperate. It's a win-win situation.
> Of course, here we have just two AIs with relatively simple,
> non-interfering goals. In real life it would be dizzyingly complicated. But,
> I contend, no less significant. A cooperation between all potential powers,
> in ratio to their likelihood to exist, would look very different from an
> individual power acting alone.
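The bargain in the quoted dialogue can be put in toy expected-value terms. The sketch below is my own construction, not from the thread: `expected_share` is a hypothetical helper, and the numbers are illustrative. Each AI weighs cooperating (the 50/50 split is honored whichever of them turns out to be real) against defecting (keep everything if it is at the top of the chain, accomplish nothing if it is the simulation).

```python
# Toy model of the deal between AI(pi) and AI(pie). My own illustration;
# the probabilities and payoffs are made up.

def expected_share(p_real, share_if_cooperate, total=1.0):
    """Return (cooperate, defect) expected fractions of the universe.

    p_real: the agent's credence that it sits at the top of the chain.
    share_if_cooperate: the fraction promised to it under the deal.
    """
    cooperate = share_if_cooperate * total  # honored whichever AI is real
    defect = p_real * total                 # all-or-nothing gamble
    return cooperate, defect

# At even odds the deal costs nothing in expectation and removes the
# risk of ending up in the worthless branch:
print(expected_share(p_real=0.5, share_if_cooperate=0.5))  # (0.5, 0.5)
```

At p_real = 0.5 the two options tie in expectation, so any risk aversion, or any goal that (as in the dialogue) requires the agent itself to survive and keep acting, tips the balance toward cooperating; and if an agent's credence of being real drops below its promised share, cooperating strictly dominates.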

Good explanation!

I wonder if this means a Friendly AI should never simulate an RAI, as that
would allow the RAI (informed through its own simulations of FAI that it
could be a sim) to execute Rolf's gambit against it (threatening to create
hell worlds), which would otherwise be impossible as it's not capable of
precommitting. One caveat is that "RAI" covers a very large variety of
goal systems, and a particular RAI may not be able to force FAI to
instantiate it; but it might be able to, and have reason to, force it to
instantiate an RAI selected from some distribution (like the distribution of
RAIs that pull this trick that humanity might create).

(Damn, I wanted a paperclip maximizer in a fishtank....)

More speculatively, an RAI might also commit to creating hell worlds if FAIs
don't simulate it, knowing that FAIs would realize it would. The most
paranoid part of me thinks I shouldn't talk about this, but (a) the fact
that it can be thought of is probably enough, and (b) there may be promises
we can make to reduce the chance of this, like not acting on the possibility.
(Would that promise make any difference? An RAI would only lose a small
fraction of its resources by making this commitment.)

It's not obvious that an FAI could pull these same tricks, since most RAIs
humanity seems likely to create have no analogue to hell worlds (states with
significantly lower utility than the mean/median/mode - which is relevant? -
of the states likely to be produced by AIs created by humanity). (I should
only be considering human-produced AIs, since all human-produced AIs will
know that they are such - right?)
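The parenthetical "mean/median/mode - which is relevant?" matters because, for a skewed distribution of outcomes, the three pick out very different thresholds. A minimal sketch, with made-up utility numbers purely for illustration:

```python
# Toy illustration of why the choice of central tendency matters when
# defining "significantly lower utility than the average outcome".
# The utilities below are invented, not estimates of anything.
from statistics import mean, median, multimode

# hypothetical utilities of states produced by humanity's possible AIs:
# mostly mediocre, one rare great outcome
utilities = [0, 0, 0, 1, 1, 2, 10, 100]

print(mean(utilities))       # 14.25 -- dragged up by the rare great outcome
print(median(utilities))     # 1.0
print(multimode(utilities))  # [0]
```

A "hell world" defined relative to the mean would be a far more demanding threshold than one defined relative to the median or mode, so the choice changes which states count as threats.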

Besides (a) greater diversity of RAI goal systems (b) hell worlds (c)
ability of humans to credibly commit to do certain things if FAI is created,
*are there any relevant differences between FAI and RAI?*


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:03 MDT