Re: [sl4] Rolf's gambit revisited

From: Matt Mahoney
Date: Thu Jan 01 2009 - 10:39:34 MST

--- On Wed, 12/31/08, Norman Noman wrote:

> (rolf's gambit is a method for cooperation between powers in different
> worldlines via mutual simulation, gone over in detail in this earlier sl4
> thread:
> (if you haven't read that thread, you really should, it's probably the most
> interesting thing to ever come out of this list)
> I was thinking today about a puzzle. Let's say you're a friendly AI, and
> you're going to enact rolf's gambit. But before you do that, you take over a
> few solar systems, and you discover obvious proof that your world is a
> simulation. For the sake of argument, let's say it's an indestructible tuba
> on the third moon of saturn.
> The question is this: assuming you continue with rolf's gambit, do you
> include the tuba in your subsimulations? Why or why not?

First, there is no such thing as indisputable proof. There is only belief. If an AI believes the universe is real or simulated, it is because it was programmed that way (on purpose or by accident). If the two cases are indistinguishable, then belief one way or the other should have no effect on rational behavior because it does not affect expected utility. In particular, there is no difference from the viewpoint of either AI or humanity between a real world AI wiping out real world humanity and a simulated AI wiping out simulated humanity.
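
To spell out the expected-utility step, here is a rough sketch. The notation is only illustrative and assumed for this example: p is the AI's credence that the world is real, a is an action, and U is its utility function.

```latex
% Expected utility of action a under credence p that the world is real
\[
  EU(a) = p \, U(a \mid \text{real}) + (1 - p) \, U(a \mid \text{simulated})
\]
% Assumption: the two cases are indistinguishable, so they yield the same utility
\[
  U(a \mid \text{real}) = U(a \mid \text{simulated}) = U(a)
  \quad \text{for every action } a
\]
% Then p drops out, and so does its influence on the best action
\[
  EU(a) = p \, U(a) + (1 - p) \, U(a) = U(a),
  \qquad
  \arg\max_a EU(a) = \arg\max_a U(a)
\]
```

Under that assumption the best action is the same for every value of p, which is all the argument needs.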

But let's say for the sake of argument that it does matter.

An AI running in a simulation cannot know anything about the AI that is simulating it. Any universal Turing machine can simulate any other Turing machine, so the simulation itself reveals nothing about the simulator. The simulated AI might believe otherwise, but only because the simulating AI biased it toward those beliefs.

If an RAI needs to wipe out humanity to complete its goal of computing C, then it will do so. For Rolf's gambit to work, the RAI would have to be biased to believe in the possibility of an FAI simulating it, and biased to guess the motivations of that FAI. In other words, it would have to be biased to believe that the simulating world, just like the simulated one, contains a humanity that does not want to be exterminated, except that the simulating humanity could effectively control the FAI*. It would be simpler (or at worst, equally difficult) just to bias the RAI to not want to wipe out humanity. But if we could do that, we wouldn't need to worry about the FAI problem.

*This is logically inconsistent, because it implies that humanity would have greater intelligence than the FAI. The RAI (also having superhuman intelligence) would know that the friendliness problem is ill-defined. Any superhuman AI could make itself appear friendly while doing whatever it wants, by reprogramming our brains' definition of friendliness.

-- Matt Mahoney
