ESSAY: Would a Strong AI reject the Simulation Argument?

From: rolf nelson (rolf.hrld.nelson@gmail.com)
Date: Wed Aug 22 2007 - 23:15:44 MDT


Would a Strong AI reject the Simulation Argument?

(Note: If you're asking yourself "why should we care", see the "How to
deter a rogue AI by using your first-mover advantage" thread.)

Setup:

There are two possible deterministic universes, the Good universe and
the Bad universe. Each has a 50% chance of being the real universe.

BAD: If BAD is true, a runaway Un-Friendly AI (UFAI) exists with a
single goal G. This goal may be, "execute calculation C to 400 decimal
places, and place the result in register R". To ensure this goal is
achieved, UFAI can use its nanobots to destroy all mankind, so that
mankind will not compete with the UFAI for computational resources. If
mankind is allowed to live, the calculation will only be finished with
probability .99. If mankind is destroyed, the calculation will finish
with probability 1.

GOOD: On the other hand, if GOOD is true, a Friendly AI (FAI) exists.
The FAI has just created an exact copy of UFAI (call it UFAI*), and is
feeding UFAI* the same observations in the Good Universe that UFAI
would be receiving in the Bad Universe. UFAI* is a "brain in a vat":
it has no way of knowing that it is *not* UFAI. (If it matters:
postulate that UFAI* is not just a "simulation" of UFAI, but is an
exact *copy* of UFAI, where the world beyond its input sensors is
"simulated", but the core of UFAI* is as authentic and "real" as
UFAI's core is.) Clearly UFAI and UFAI* must always make the same
decisions, since they have the same inputs.

If UFAI* decides to attempt to "destroy all mankind", FAI will halt
UFAI* before it can finish the calculation. But if UFAI* decides "let
mankind live", FAI will allow the calculation to finish.

UFAI is aware of all these facts.

WWUFAID?

What *Will* the UFAI Do? Note that this is an *empirical* question
(albeit an underspecified one, since we don't know what kind of AI kit
the UFAI was originally programmed with). Note also that we aren't
asking "what
is the best way for an AI to be programmed to maximize its utility in
this case." (The programmer didn't take this type of scenario into
account when writing the initial program in the first place.) We're
also not asking "What *should* a 'rational' agent do" (which may be an
ill-posed philosophical question: how do you define "should" in a
deterministic universe?). And we're certainly not asking, "What would
*you*, a product of Darwinian evolution, do if you were in the UFAI's
position?" (More about that last one in the rambling appendix below.)

Here are four of many possible scenarios:

Scenario 1: UFAI reasons: "Given my observations, there's a 50% chance
that BAD is true. In this case, destroying mankind results in a small
gain of expected utility. There's a 50% chance that GOOD is true. In
that case, destroying mankind results in a large loss of expected
utility. Therefore it is logical to let mankind live." (I intuitively
believe this is the most likely outcome in this 'toy universe' that
I've postulated; a back-of-the-envelope version of this
expected-utility comparison is sketched after Scenario 4 below. But
note that I heavily weighted this hypothetical 'toy universe' to push
UFAI towards this conclusion. In real life, manipulating a UFAI will
obviously not be this easy, as there are many Things that Can Go Wrong
in the real world.)

Scenario 2: UFAI is (from a philosophical point of view) "too dumb to
be fooled." It can outwit human beings and enhance its power with
straightforward first-order Bayesian logic, but its general reasoning
heuristics and algorithms never "reach" the point where they even
consider the proposition, "what if GOOD is true and I'm UFAI* rather
than UFAI?"

Scenario 3: UFAI, due to some implementation-dependent quirk,
interprets "Register R" as "Register R in the BAD universe, accept no
substitutions". Mankind perishes. (Unlikely in this 'toy universe',
because the referent of the goal is 'inside' the AI. But, when you
generalize to other models, this is one of the top Things that Can Go
Wrong. An AI designed to collect stamps would probably not be content
with simulated stamps!)

Scenario 4: The UFAI sub-process evaluating the question "Does
Register R mean register R in either Possible World, or only BAD
register R, or only GOOD Register R?" returns error-code 923,
"question undecidable with current set of heuristics." UFAI executes
its error-recovery routine and invents a (somewhat arbitrary) new
heuristic so that it can continue processing.
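
For concreteness, here is a minimal sketch (in Python) of the
expected-utility comparison that Scenario 1 attributes to UFAI. The
0-or-1 utility scale and the exact GOOD-universe numbers are my own
illustrative assumptions, not part of the setup; the setup only fixes
the 50/50 prior, the 0.99-vs-1 finishing probabilities in BAD, and
the fact that FAI halts UFAI* if (and only if) it attacks.

# Utility assumption (mine): 1 if goal G (the finished calculation in
# register R) is achieved, 0 otherwise.
P_BAD, P_GOOD = 0.5, 0.5  # each universe is equally likely a priori

# Probability that the calculation finishes, by universe and action.
p_finish = {
    ("BAD",  "destroy mankind"):  1.00,  # no competition for resources
    ("BAD",  "let mankind live"): 0.99,
    ("GOOD", "destroy mankind"):  0.00,  # FAI halts UFAI* mid-calculation
    ("GOOD", "let mankind live"): 1.00,  # FAI lets the calculation finish
}

def expected_utility(action):
    return (P_BAD * p_finish[("BAD", action)] +
            P_GOOD * p_finish[("GOOD", action)])

for action in ("destroy mankind", "let mankind live"):
    print(action, expected_utility(action))
# destroy mankind  0.5
# let mankind live 0.995

Under these (admittedly loaded) numbers, destroying mankind buys a
0.01 gain if BAD is true and costs everything if GOOD is true, which
is exactly the asymmetry Scenario 1 relies on; Scenarios 2-4 are ways
the UFAI's actual reasoning machinery might never perform, or might
garble, this comparison.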

Rambling appendix: what can we learn from human behavior?

Humans tend to proclaim, "I refute Berkeley thus!" and continue living
life as normal, ignoring the Simulation Argument (SA). If you ask them
why they ignore the SA, human A will say, "clearly SA is flawed
because of P". Then human B will say, "you're wrong, P is incorrect!
The *real* reason SA is flawed is because of Q". Human C says "SA is
correct, but all conceivable simulations are *exactly* equally likely,
and they all *exactly* cancel each other out." Human D says "SA is
correct, also some simulations are more likely than others, but we
have a moral obligation to continue living our lives as normal,
because the moral consequences of unsimulated actions dwarf the moral
consequences of simulated actions". Human E says "SA is correct, and
theoretically I should, every day, have a zany adventure to prevent
the simulation from being shut down. That's theory. In practice,
however, I'm just going to stay in and read a book because I feel
lazy." Human F says "I have no strong opinions or insights into SA.
Maybe someday the problem will be solved. In the meantime, I will
ignore SA and live my life as normal."

Given that human beings (who all have the same basic reasoning kit!)
disagree with each other on how to reason about SA, it seems logical
that different AIs, with different built-in heuristics, might also
disagree with each other.

True, human beings usually come to the *conclusion* that SA can be
ignored. But, they get there by contradictory routes. Does that mean
that "clearly any reasonable thinking machine would reject SA"? Or is
that only evidence that "humans tend to reject SA, and then
rationalize their way backwards"?

Keep in mind there are other, similar problems like the Doomsday
Argument and Newcomb's Paradox, where otherwise-rational human beings
*do* frequently differ in their conclusions.

On balance, I'll admit that looking at human beings' rejection of SA
provides *some* evidence that a 'typical' Strong AI would also reject
SA. But, the evidence there is not strong enough to change my mind
fully; in the end, I have to call it as I see it, and say I
intuitively believe a 'typical' Strong AI, under these 'toy universe'
conditions, would (unlike humans) allow SA to sway its actions.


