Re: One or more AIs??

From: Eliezer Yudkowsky
Date: Sun May 30 2004 - 10:46:29 MDT

Mark Waser wrote:
>>I am at a loss to understand what is gained, except a way of
>>anthropomorphizing the issue and avoiding confrontation with that darned
>>hard part of the problem.
> OK. Try this scenario . . . . An FAI believes that it has a good goal
> enhancement. It tests it. It looks OK. It implements it. Circumstances
> subsequently change in a REALLY unexpected manner. Oops! Under these very
> odd new circumstances, the "enhancement" turns out to be catastrophic for
> the human race . . . .
> But wait! There are three other FAIs that are in close communication with
> this FAI. Friendliness dictates that they should work in close co-operation
> in order to prevent errors of this type. The other FAIs have not made this
> particular "enhancement" and correctly evaluate the situation for what it
> is. They outvote the FAI with the "enhancement" and the "enhanced" FAI (who
> is still mostly Friendly and merely mistaken) rolls the "enhancement" back
> (or modifies it) so that the human race lives for the next moment or so.
> Good engineering often dictates redundancy. Common sense (which ain't so
> common - yes, I know) strongly promotes checks and balances. Human history
> shows that when diversity of opinion is allowed to flourish that good things
> happen and that when diversity is suppressed that BAD things happen. You
> seem to be flying in the face of a lot of good consensus about safety
> measures without a reason except "I am at a loss to understand what is
> gained."

I'm glad you say that I "seem to be" flying in the face of safety "without
a reason". Ordinarily I wouldn't have time to write this up. But on this
occasion, you're in luck; I've got an explanation of this from a piece of
private correspondence that I wrote to Eric Drexler on May 9th, 2003. Of
course, no one saw this, so they might think I was "totally isolated".

** begin quote **

I tend to regard the sharp distinction between "groups" and "individuals"
as a special case of the human way, where cognitive systems cannot
agglomerate, and the goal systems contain numerous egocentric
speaker-dependent variables. In particular, I worry that our being adapted
to the individuals-groups dichotomy, and our expectation that other
individuals will exhibit those same adaptations, can lead to incorrect
inferences when considering AIs. It looks to me like the challenge of
getting a cognitive system to compute morality remains the same regardless
of whether that physical system is called a multiplicity or a singleton -
whether we regard it as many minds or one mind, it's the same thing.

Imagine that some human culture has, for some time, employed human runners
as messengers. A messenger might fail for any number of reasons, including
knees giving out, heart attacks, stomach pains, and all the ills to which
human messengers are heir. Suppose that each individual is a complex
system composed of N subsystems, with respective failure chances P1, P2,
P3, and so on. Failure of any one subsystem causes the messenger to
stumble to the side of
the road, gasping, and the message won't go through. The total chance of a
messenger succeeding is (1 - P1)(1 - P2)(1 - P3)... So after a while, this
culture learns a very strong culturally embedded rule that you never send
*one* messenger with a very urgent message. It may even be instinctive -
they may use different innate cognitive rules for modeling one messenger
and many messengers.
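The arithmetic here can be checked directly. A minimal sketch in Python, using made-up subsystem failure probabilities (the specific numbers are illustrative, not from the text):

```python
# Reliability of a single messenger whose N subsystems fail
# independently with probabilities P1, P2, P3, ...
# The run succeeds only if every subsystem holds: product of (1 - Pi).

def single_success(failure_probs):
    """Chance that one messenger completes the run."""
    result = 1.0
    for p in failure_probs:
        result *= 1.0 - p
    return result

def at_least_one_of(k, failure_probs):
    """Chance that at least one of k independent messengers gets through."""
    s = single_success(failure_probs)
    return 1.0 - (1.0 - s) ** k

# Invented failure chances for three subsystems (knees, heart, stomach).
probs = [0.10, 0.05, 0.02]
print(single_success(probs))      # one runner carrying the message
print(at_least_one_of(2, probs))  # the cultural rule: send two runners
```

With these numbers a lone runner succeeds about 84% of the time, while a pair of independent runners delivers the message over 97% of the time, which is the intuition the culture's rule encodes.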

Now along comes a transhuman messenger with the strange ability to
agglomerate subsystems with other messengers. In other words, if you have
two transmessengers, they can agglomerate into one messenger with redundant
subsystems. The agglomerated messenger's total chance of success is now
(1 - P1^2)(1 - P2^2)(1 - P3^2)... which is a considerable improvement over
the
chance of success of two independent messengers. It takes two independent
failures *in the same place* to destroy the agglomerated runner. The
agglomerated runner is probably quite a lot more reliable than a dozen
independent runners, and cannot possibly be worse - any set of failures
that would destroy the agglomerated runner would also destroy two single
runners, and many failure sets that would destroy independent runners may
leave an agglomerated runner untouched. An agglomerated quadruplet of
messengers may be more reliable than a thousand runners. Yet the society
may still insist that the runners not agglomerate because of the
instinctive force of the rule not to trust "one runner". Or to put it more
pessimistically, if a course is so difficult that it can't be solved using
an agglomerated runner, splitting up the runner into individuals can't get
you any closer to solving the problem.
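Under the stated assumptions (independent subsystem failures, each subsystem duplicated in the agglomerated runner), the comparison with two separate runners can be sketched as follows; the failure probabilities are again invented for illustration:

```python
# Agglomerated pair: each subsystem is duplicated, so subsystem i fails
# only if BOTH copies fail in the same place: probability Pi**2.

def agglomerated_success(failure_probs, copies=2):
    """Chance an agglomerated messenger (each subsystem replicated) succeeds."""
    result = 1.0
    for p in failure_probs:
        result *= 1.0 - p ** copies
    return result

def independent_success(failure_probs, k=2):
    """Chance at least one of k separate messengers succeeds."""
    single = 1.0
    for p in failure_probs:
        single *= 1.0 - p
    return 1.0 - (1.0 - single) ** k

probs = [0.10, 0.05, 0.02]  # made-up subsystem failure chances
print(agglomerated_success(probs))      # two runners merged into one
print(independent_success(probs, k=2))  # two runners kept separate
```

The coupling argument in the text shows up numerically: any joint failure that kills the agglomerated runner would also kill both separate runners, so the merged runner's success probability is never lower than the independent pair's.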

Humans have to imagine ways of solving problems that use many humans,
because in the human world there is no way to use *one* *big* human.

Suppose you have two "thermostat AIs" - that is, they have a decision
system that employs a very simple and nonhumane way of computing
desirability. Let's say that one AI cares only about paperclips, while the
other cares only about staples. If the two AIs are roughly equal, they
might arrive at something resembling a cooperative game-theoretical
solution and split up the solar system between them, this solution being
preferable to the negative effects of hostilities - a classic Prisoner's
Dilemma. The
problem is that this doesn't protect the humans - it is better for *both*
AIs to split any resources currently used by humanity between them.
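A toy payoff table (every number below is invented purely for illustration) makes the point concrete: nothing in a pure paperclip or staple utility function penalizes absorbing humanity's resources, so "split everything" dominates "spare the humans" for both AIs:

```python
# Toy payoff sketch for two "thermostat" AIs dividing the solar system.
# Each AI's utility is simply the share of resources it converts into
# its own product. All numbers are invented.

TOTAL = 100.0        # total resources in the system
HUMAN_SHARE = 10.0   # resources currently used by humanity
WAR_COST = 30.0      # resources destroyed if the AIs fight

outcomes = {
    # each AI takes half of everything, humans included
    "split everything":      (TOTAL / 2, TOTAL / 2),
    # each AI takes half, but leaves humanity's share untouched
    "split, sparing humans": ((TOTAL - HUMAN_SHARE) / 2,
                              (TOTAL - HUMAN_SHARE) / 2),
    # hostilities destroy resources before an even split
    "fight, then split":     ((TOTAL - WAR_COST) / 2,
                              (TOTAL - WAR_COST) / 2),
}

for name, (paperclip_ai, staple_ai) in outcomes.items():
    print(f"{name:24s} paperclips={paperclip_ai:5.1f} staples={staple_ai:5.1f}")
```

Both AIs strictly prefer the first row to the second, and the cooperative rows to fighting, so the game-theoretically "fair" solution between the two AIs leaves humanity out of the compact entirely.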

If humans make a law to live among each other in peace, our intuitions for
fairness are likely to lead us to extend that law widely. Not necessarily
as widely as we should - witness the amount of time it took some groups to
get the vote, for example. But the point is that we regard the law as
representing certain ideals of fairness, and the reason for this is that we
have a fairness adaptation which came out of our ancestors being placed in
game-theoretical situations. But that adaptation is independent of the
conditions that produced it, just as our current taste for sugar and fat is
independent of the scarce-calorie environments which produced it. And the
idea that you *have* an adaptation for handling the problem is itself a
product of the evolutionary design method. "Individual organisms should be
considered adaptation-executers rather than fitness-maximizers". We don't
recompute the exact number of calories and nutrients we need; we have a
simple adaptation that leads us to prefer certain foods over others. It is
not an adaptation that is there so that we *will* reproduce; it is an
adaptation that is there because our ancestors with those genes *did*.

We have, as an independently executing adaptation, a certain *simple* way
of handling game-theoretical situations - an instinctive sense of honor,
fairness, and reciprocity. We have this simple executing adaptation
because it's a solution that was accessible to an incrementally adaptive
design pathway.

But that only holds for evolved organisms, not nonhumane thermostat AIs,
which really would maximize 'fitness' according to their simple
desirability computations.

A number N of thermostat AIs may find themselves in a game-theoretical
environment which looks to a human like it should call forth a fair
solution, like one law for all sentients. What seems more likely to happen
is that the AIs will, on the fly, compute a cooperative solution among
themselves which leaves humanity in the cold. They would not have the
adaptation for an instinctive sense of fairness that would lead them to see
something wrong about this; they would simply calculate a "fair" solution
in which the universe was equally divided among paperclips and staples.
From our perspective, this might not look very different from a single AI
with a goal system that wanted both paperclips and staples.

If you imagine that humans, in some unimaginable way, acquire a serious
threat to hold over both superintelligences, demanding that the humans be
treated as game-theoretical near-equals, then why wouldn't the same threat
be holdable over a paperclips-and-staples singleton? This is what I mean
by the apparent equivalence of groups and individuals from our perspective.

It looks to me like it takes work to compute a humane morality, work which
does not emerge automatically in either groups or singletons. If there's a
group solution I would expect there to exist a corresponding singleton
solution. Even if human morality is inherently group-based, the Friendly
AI structure looks like it should work to embody our group morality in a
single AI!

Multiplicities appeal to our human intuitions for fairness, and may indeed
have some effectiveness at evoking fairer solutions from groups of humans,
but it's essentially a way of *evoking* humaneness that's *already there*.
You have the problem of getting a number N of AIs to treat humans
humanely even if humans may not be their game-theoretical equals, and this
looks like pretty much the same problem whether N=1000 or N=1. You need an
AI that is humane and values sentient life for its own sake. If N AIs
don't value sentient life, they may negotiate cooperation among themselves,
but there'd be no reason to extend the compact to include humans - from our
perspective the problem of getting a physical system to compute morality is
just the same whether we regard the physical system of N AIs as a single
object or a group of objects.

** end of quote **

> Why don't we assume that I'm a Friendly human telling you that I'm pretty
> sure that a single point of failure is A REALLY BAD IDEA(tm). I would hope
> that you would take this seriously enough that you wouldn't ignore this
> advice and implement your plan solely on the basis of "I am at a loss to
> understand what is gained . . . ."
> By the way, I do understand that you don't acknowledge the distinction
> between multiple AIs with close communication and one AI with partitioning
> but I would submit that one AI with sufficient partitioning SHOULD BE
> considered separate AIs for all intents and purposes. Or, if the
> partitioning is not sufficient for them to be considered separate AIs, then
> you need more partitioning in your single AI to create multiple AIs to
> prevent the problem above.

I am in the middle of working out seriously complicated stuff that I am too
busy reworking to properly explain. Sometimes I will be able to explain my
reasons. Sometimes not. I am getting more and more nervous about time.

Neighborly human or not, it is not a trivial task to give a specialist
fresh advice in his own field. Figure on the attempt failing at least 95%
of the time. Most of the time, I will have long since thought through
everything that occurred to you, in advance, whether that is readily
apparent or not. By all means keep trying, but if I say "I already thought
of that," I did, whether I have time to explain or not.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:47 MDT