From: Eliezer Yudkowsky (sentience@pobox.com)
Date: Sun May 30 2004 - 10:46:29 MDT
Mark Waser wrote:
>>I am at a loss to understand what is gained, except a way of
>>anthropomorphizing the issue and avoiding confrontation with that darned
>>hard part of the problem.
> 
> OK.  Try this scenario . . . . An FAI believes that it has a good goal
> enhancement.  It tests it.  It looks OK.  It implements it.  Circumstances
> subsequently change in a REALLY unexpected manner.  Oops!  Under these very
> odd new circumstances, the "enhancement" turns out to be catastrophic for
> the human race . . . .
> 
> But wait!  There are three other FAIs that are in close communication with
> this FAI.  Friendliness dictates that they should work in close co-operation
> in order to prevent errors of this type.  The other FAIs have not made this
> particular "enhancement" and correctly evaluate the situation for what it
> is.  They outvote the FAI with the "enhancement" and the "enhanced" FAI (who
> is still mostly Friendly and merely mistaken) rolls the "enhancement" back
> (or modifies it) so that the human race lives for the next moment or so.
> 
> Good engineering often dictates redundancy.  Common sense (which ain't so
> common - yes, I know) strongly promotes checks and balances.  Human history
> shows that when diversity of opinion is allowed to flourish, good things
> happen, and that when diversity is suppressed, BAD things happen.  You
> seem to be flying in the face of a lot of good consensus about safety
> measures without a reason except "I am at a loss to understand what is
> gained."
I'm glad you say that I "seem to be" flying in the face of safety "without 
a reason".  Ordinarily I wouldn't have time to write this up.  But on this 
occasion, you're in luck; I've got an explanation of this from a piece of 
private correspondence that I wrote to Eric Drexler on May 9th, 2003.  Of 
course, no one saw this, so they might think I was "totally isolated".
** begin quote **
I tend to regard the sharp distinction between "groups" and "individuals" 
as a special case of the human way, where cognitive systems cannot 
agglomerate, and the goal systems contain numerous egocentric 
speaker-dependent variables.  In particular, I worry that our being adapted 
to the individuals-groups dichotomy, and our expectation that other 
individuals will exhibit those same adaptations, can lead to incorrect 
inferences when considering AIs.  It looks to me like the challenge of 
getting a cognitive system to compute morality remains the same regardless 
of whether that physical system is called a multiplicity or a singleton - 
whether we regard it as many minds or one mind, it's the same thing.
Imagine that some human culture has, for some time, employed human runners 
as messengers.  A messenger might fail for any number of reasons, including 
knees giving out, heart attacks, stomach pains, and all the ills to which 
human messengers are heir.  Suppose that each individual is a complex
system composed of N subsystems, with respective chances P1, P2, P3, ...
of failure.
Failure of any one subsystem causes the messenger to stumble to the side of 
the road, gasping, and the message won't go through.  The total chance of a 
messenger succeeding is (1 - P1)(1 - P2)(1 - P3)...  So after a while, this 
culture learns a very strong culturally embedded rule that you never send 
*one* messenger with a very urgent message.  It may even be instinctive - 
they may use different innate cognitive rules for modeling one messenger 
and many messengers.
Now along comes a transhuman messenger with the strange ability to 
agglomerate subsystems with other messengers.  In other words, if you have 
two transmessengers, they can agglomerate into one messenger with redundant 
subsystems.  The agglomerated messenger's total chance of success is now
(1 - P1^2)(1 - P2^2)(1 - P3^2)... which is a considerable improvement over the
chance of success of two independent messengers.  It takes two independent 
failures *in the same place* to destroy the agglomerated runner.  The 
agglomerated runner is probably quite a lot more reliable than a dozen 
independent runners, and cannot possibly be worse - any set of failures 
that would destroy the agglomerated runner would also destroy two single 
runners, and many failure sets that would destroy independent runners may 
leave an agglomerated runner untouched.  An agglomerated quadruplet of 
messengers may be more reliable than a thousand runners.  Yet the society 
may still insist that the runners not agglomerate because of the 
instinctive force of the rule not to trust "one runner".  Or to put it more 
pessimistically, if a course is so difficult that it can't be solved using 
an agglomerated runner, splitting up the runner into individuals can't get 
you any closer to solving the problem.
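(To make the arithmetic concrete, here is a small Python sketch of the two
formulas above.  The subsystem count and failure chances are invented for
illustration, and every failure is treated as statistically independent,
which if anything understates the case for agglomeration, since real
runners would also share correlated risks.)

    # Toy reliability comparison; the numbers are illustrative assumptions.
    from math import prod

    p = [0.2] * 20   # assume 20 subsystems, each failing 20% of the time

    def single_success(p):
        # One messenger gets through only if every subsystem holds up.
        return prod(1 - pi for pi in p)

    def independent_success(p, k):
        # k separate messengers: the message arrives if at least one survives.
        return 1 - (1 - single_success(p)) ** k

    def agglomerated_success(p, k):
        # One merged messenger with k-fold redundancy: a subsystem stops the
        # run only if all k copies fail in the same place.
        return prod(1 - pi ** k for pi in p)

    print(f"one messenger:           {single_success(p):.3f}")
    print(f"two independent:         {independent_success(p, 2):.3f}")
    print(f"a dozen independent:     {independent_success(p, 12):.3f}")
    print(f"agglomerated pair:       {agglomerated_success(p, 2):.3f}")
    print(f"agglomerated quadruplet: {agglomerated_success(p, 4):.3f}")

(Under these assumed numbers the agglomerated pair beats not just two but a
dozen independent runners, and the agglomerated quadruplet is more reliable
still.)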
Humans have to imagine ways of solving problems that use many humans, 
because in the human world there is no way to use *one* *big* human.
Suppose you have two "thermostat AIs" - that is, they have a decision 
system that employs a very simple and nonhumane way of computing 
desirability.  Let's say that one AI cares only about paperclips, while the 
other cares only about staples.  If the two AIs are roughly equal, they 
might arrive at something resembling a cooperative game-theoretical 
solution and split up the solar system between them, this solution being 
preferable to the negative effects of hostilities - classic PD.  The 
problem is that this doesn't protect the humans - it is better for *both* 
AIs to split any resources currently used by humanity between them.
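(A toy payoff sketch of that point, with the resource amounts and the
fraction lost to conflict invented purely for illustration: each AI's
desirability number is just the mass it gets to convert, and neither
number mentions humans anywhere.)

    # Invented payoffs: a cooperative split that also consumes humanity's
    # resources dominates both fighting and a split that spares humans.
    SOLAR_RESOURCES = 100.0   # arbitrary units of convertible matter
    HUMAN_SHARE = 1.0         # portion currently used by humanity
    WAR_LOSS = 0.4            # assumed fraction destroyed by hostilities

    def fight():
        usable = SOLAR_RESOURCES * (1 - WAR_LOSS)
        return usable / 2, usable / 2    # expected shares after conflict

    def split_sparing_humans():
        usable = SOLAR_RESOURCES - HUMAN_SHARE
        return usable / 2, usable / 2

    def split_everything():
        return SOLAR_RESOURCES / 2, SOLAR_RESOURCES / 2

    for name, (clips, staples) in [("fight", fight()),
                                   ("split, sparing humans", split_sparing_humans()),
                                   ("split everything", split_everything())]:
        print(f"{name:>22}: paperclipper {clips:.1f}, stapler {staples:.1f}")

(Both thermostat AIs rank the outcomes the same way: split everything >
split sparing humans > fight.  Nothing in either desirability computation
puts a price on keeping humanity's share off the table.)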
If humans make a law to live among each other in peace, our intuitions for 
fairness are likely to lead us to extend that law widely.  Not necessarily 
as widely as we should - witness the amount of time it took some groups to 
get the vote, for example.  But the point is that we regard the law as 
representing certain ideals of fairness, and the reason for this is that we 
have a fairness adaptation which came out of our ancestors being placed in 
game-theoretical situations.  But that adaptation is independent of the 
conditions that produced it, just as our current taste for sugar and fat is 
independent of the scarce-calorie environments which produced it.  And the 
idea that you *have* an adaptation for handling the problem is itself a 
product of the evolutionary design method.  "Individual organisms should be 
considered adaptation-executers rather than fitness-maximizers".  We don't 
recompute the exact number of calories and nutrients we need; we have a 
simple adaptation that leads us to prefer certain foods over others.  It is 
not an adaptation that is there so that we *will* reproduce; it is an
adaptation that is there because our ancestors with those genes *did* 
reproduce.
We have, as an independently executing adaptation, a certain *simple* way 
of handling game-theoretical situations - an instinctive sense of honor, 
fairness, and reciprocity.  We have this simple executing adaptation 
because it's a solution that was accessible to an incrementally adaptive 
design pathway.
But that only holds for evolved organisms, not nonhumane thermostat AIs, 
which really would maximize 'fitness' according to their simple 
desirability computations.
A number N of thermostat AIs may find themselves in a game-theoretical 
environment which looks to a human like it should call forth a fair 
solution, like one law for all sentients.  What seems more likely to happen 
is that the AIs will, on the fly, compute a cooperative solution among 
themselves which leaves humanity in the cold.  They would not have the 
adaptation for an instinctive sense of fairness that would lead them to see 
something wrong about this; they would simply calculate a "fair" solution 
in which the universe was equally divided between paperclips and staples.
From our perspective, this might not look very different from a single AI
with a goal system that wanted both paperclips and staples.
If you imagine that humans, in some unimaginable way, acquire a serious 
threat to hold over both superintelligences, demanding that the humans be 
treated as game-theoretical near-equals, then why wouldn't the same threat 
be holdable over a paperclips-and-staples singleton?  This is what I mean 
by the apparent equivalence of groups and individuals from our perspective.
It looks to me like it takes work to compute a humane morality, work which 
does not emerge automatically in either groups or singletons.  If there's a 
group solution I would expect there to exist a corresponding singleton 
solution.  Even if human morality is inherently group-based, the Friendly 
AI structure looks like it should work to embody our group morality in a 
single AI!
Multiplicities appeal to our human intuitions for fairness, and may indeed
have some effectiveness at evoking fairer solutions from groups of humans,
but they are essentially a way of *evoking* humaneness that's *already there*.
You have the problem of getting a number N of AIs to treat humans
humanely even if humans may not be their game-theoretical equals, and this 
looks like pretty much the same problem whether N=1000 or N=1.  You need an 
AI that is humane and values sentient life for its own sake.  If N AIs 
don't value sentient life, they may negotiate cooperation among themselves, 
but there'd be no reason to extend the compact to include humans - from our 
perspective the problem of getting a physical system to compute morality is 
just the same whether we regard the physical system of N AIs as a single 
object or a group of objects.
** end of quote **
> Why don't we assume that I'm a Friendly human telling you that I'm pretty
> sure that a single point of failure is A REALLY BAD IDEA(tm).  I would hope
> that you would take this seriously enough that you wouldn't ignore this
> advice and implement your plan solely on the basis of "I am at a loss to
> understand what is gained . . . ."
> 
> By the way, I do understand that you don't acknowledge the distinction
> between multiple AIs with close communication and one AI with partitioning
> but I would submit that one AI with sufficient partitioning SHOULD BE
> considered separate AIs for all intents and purposes.  Or, if the
> partitioning is not sufficient for them to be considered separate AIs, then
> you need more partitioning in your single AI to create multiple AIs to
> prevent the problem above.
I am in the middle of working out seriously complicated stuff that I am too 
busy reworking to properly explain.  Sometimes I will be able to explain my 
reasons.  Sometimes not.  I am getting more and more nervous about time.
Neighborly human or not, it is not a trivial task to give a specialist 
fresh advice in his own field.  Figure on the attempt failing at least 95% 
of the time.  Most of the time, I will have long since thought through 
everything that occurred to you, in advance, whether that is readily 
apparent or not.  By all means keep trying, but if I say "I already thought 
of that," I did, whether I have time to explain or not.
--
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence