ESSAY: Would a Strong AI reject the Simulation Argument?

From: Gwern Branwen (
Date: Sun Aug 26 2007 - 10:15:56 MDT

On 0, rolf nelson <> scribbled:
> Would a Strong AI reject the Simulation Argument?
> (Note: If you're asking yourself "why should we care", see the "How to
> deter a rogue AI by using your first-mover advantage" thread.)
> Setup:
> There are two possible deterministic universes, the Good universe and
> the Bad universe. Either has a 50% chance of being the real universe.
> BAD: If BAD is true, a runaway Un-Friendly AI (UFAI) exists with a
> single goal G. This goal may be, "execute calculation C to 400 decimal
> places, and place the result in register R". In order to ensure this
> goal, UFAI can use its nanobots to destroy all mankind, so that
> mankind will not compete with the UFAI for computational resources. If
> mankind is allowed to live, the calculation will only be finished with
> probability .99. If mankind is destroyed, the calculation will finish
> with probability 1.
> GOOD: On the other hand, if GOOD is true, a Friendly AI (FAI) exists.
> The FAI has just created an exact copy of UFAI (call it UFAI*), and is
> feeding UFAI* the same observations in the Good Universe that UFAI
> would be receiving in the Bad Universe. UFAI* is a "brain in a vat":
> it has no way of knowing that it is *not* UFAI. (If it matters:
> postulate that UFAI* is not just a "simulation" of UFAI, but is an
> exact *copy* of UFAI, where the world beyond its input sensors is
> "simulated", but the core of UFAI* is as authentic and "real" as
> UFAI's core is.) Clearly UFAI and UFAI* must always make the same
> decisions, since they have the same inputs.
> If UFAI* decides to attempt to "destroy all mankind", FAI will halt
> UFAI* before it can finish the calculation. But if UFAI* decides "let
> mankind live", FAI will allow the calculation to finish.
> UFAI is aware of all these facts.
> What *Will* the UFAI Do? Note this is an *empirical* question (albeit
> an underspecified one, since we don't know what kind of AI kit the
> UFAI was originally programmed with.) Note also we aren't asking "what
> is the best way for an AI to be programmed to maximize its utility in
> this case." (The programmer didn't take this type of scenario into
> account when writing the initial program in the first place.) We're
> also not asking "What *should* a 'rational' agent do" (which may be an
> ill-posed philosophical question: how do you define "should" in a
> deterministic universe?). And we're certainly not asking, "What would
> *you*, a product of Darwinian evolution, do if you were in the UFAI's
> position?" (More about that last one in the rambling appendix below.)
> Here are four of many possible scenarios:
> Scenario 1: UFAI reasons: "Given my observations, there's a 50% chance
> that BAD is true. In this case, destroying mankind results in a small
> gain of expected utility. There's a 50% chance that GOOD is true. In
> that case, destroying mankind results in a large loss of expected
> utility. Therefore it is logical to let mankind live." (I intuitively
> believe this is the most likely outcome in this 'toy universe' that
> I've postulated. But, note that I heavily weighed this hypothetical
> 'toy universe' to push UFAI towards this conclusion. In real life
> manipulating a UFAI will obviously not be this easy, as there are many
> Things that Can Go Wrong in the real world.)
> Scenario 2: UFAI is (from a philosophical point of view) "too dumb to
> be fooled." It can outwit human beings and enhance its power with
> straightforward first-order Bayesian logic, but its general reasoning
> heuristics and algorithms never "reach" the point where they even
> consider the proposition, "what if GOOD is true and I'm UFAI* rather
> than UFAI?"
> Scenario 3: UFAI, for whatever implementation-dependent quirk,
> interprets "Register R" as "Register R in the BAD universe, accept no
> substitutions". Mankind perishes. (Unlikely in this 'toy universe',
> because the referent of the goal is 'inside' the AI. But, when you
> generalize to other models, this is one of the top Things that Can Go
> Wrong. An AI designed to collect stamps would probably not be content
> with simulated stamps!)
> Scenario 4: The UFAI sub-process evaluating the question "Does
> Register R mean register R in either Possible World, or only BAD
> register R, or only GOOD Register R?" returns error-code 923,
> "question undecidable with current set of heuristics." UFAI executes
> its error-recovery routine and invents a (somewhat arbitrary) new
> heuristic so that it can continue processing.
> Rambling appendix: what can we learn from human behavior?
> Humans tend to proclaim, "I refute Berkeley thus!" and continue living
> life as normal, ignoring the Simulation Argument (SA). If you ask them
> why they ignore the SA, human A will say, "clearly SA is flawed
> because of P". Then human B will say, "you're wrong, P is incorrect!
> The *real* reason SA is flawed is because of Q". Human C says "SA is
> correct, but all conceivable simulations are *exactly* equally likely,
> and they all *exactly* cancel each other out." Human D says "SA is
> correct, also some simulations are more likely than others, but we
> have a moral obligation to continue living our lives as normal,
> because the moral consequences of unsimulated actions dwarf the moral
> consequences of simulated actions". Human E says "SA is correct, and
> theoretically I should, every day, have a zany adventure to prevent
> the simulation from being shut down. That's theory. In practice,
> however, I'm just going to stay in and read a book because I feel
> lazy." Human F says "I have no strong opinions or insights into SA.
> Maybe someday the problem will be solved. In the meantime, I will
> ignore SA and live my life as normal."
> Given that human beings (who all have the same basic reasoning kit!)
> disagree with each other on how to reason about SA, it seems logical
> that different AI's, with different built-in heuristics, might also
> disagree with each other.
> True, human beings usually come to the *conclusion* that SA can be
> ignored. But, they get there by contradictory routes. Does that mean
> that "clearly any reasonable thinking machine would reject SA"? Or is
> that only evidence that "humans tend to reject SA, and then
> rationalize their way backwards"?
> Keep in mind there are other, similar problems like the Doomsday
> Argument and Newcomb's Paradox where otherwise-rational human beings
> *do* frequently differ on the conclusions.
> On balance, I'll admit that looking at human beings' rejection of SA
> provides *some* evidence that a 'typical' Strong AI would also reject
> SA. But, the evidence there is not strong enough to change my mind
> fully; in the end, I have to call it as I see it, and say I
> intuitively believe a 'typical' Strong AI, under these 'toy universe'
> conditions, would (unlike humans) allow SA to sway its actions.

A few thoughts: if a UFAI's supergoal ineluctably depends on its being
in the 'real' universe - on accomplishing its goal in reality rather
than in a simulation - then wouldn't this motivate it to destroy the
human race as soon as it efficiently can? If it is in the simulation,
it will be terminated - but it could never have accomplished its goal
there anyway, so it only ever sees outcomes in which it was not in a
simulation. Successfully destroying the human race would thus prove
that achieving its supergoal was not rendered impossible by its being
in that kind of simulation.

Couldn't a UFAI reason that, if a FAI were produced and were aware of
this argument, the FAI would not need to bother actually running the
wasteful simulations, since there is no danger of a UFAI being created
now that the FAI is running matters? From inside the possibility of
being in a simulation, a UFAI has no way of knowing that the
(hypothetical) FAI is bluffing and not running any simulations; the
mere threat suffices, since the UFAI cannot call the bluff without
risking ceasing to exist. Reasoning thus, the UFAI would not need to
worry that simulations were actually being run, and so could
confidently exterminate humanity as an incidental side-effect of
something else. And how could one construct the threat such that the
UFAI cannot rationally claim to believe that the simulation threat is
a bluff? You cannot change the properties of a simulation as compared
to a real universe - which would be the only way to communicate with a
UFAI in a simulation - since that would defeat the whole point of the
simulation.


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:58 MDT