Re: guaranteeing friendliness

From: Eliezer S. Yudkowsky
Date: Sat Dec 03 2005 - 19:07:22 MST

Herb Martin wrote:
> From: Michael Wilson
>>You're making a beginner mistake: you're confusing the ability
>>to predict what an intelligence will /do/, with the ability
>>to predict what it will /desire/. If we could predict exactly
>>what an AGI will actually do then it wouldn't have transhuman
>>intelligence. Fortunately predicting what the goals of an
>>AGI system will be, including the effects of self-modification,
>>is a much more tractable (though very hard) endeavour.
> Other than making the claim that I am a beginner, or
> making beginner mistakes, without providing evidence
> or logic, you have above claimed that 'actions' cannot
> be predicted while claiming that 'desires' can be;
> again without evidence or logic.

I agree that Wilson's paragraph is not a sufficient explanation.

Consider the concept of an optimization process: a system which hits
small targets in large search spaces to produce coherent real-world effects.

An optimization process steers the future into particular regions of the
possible. I am visiting a distant city, and a local friend volunteers
to drive me to the airport. I do not know the neighborhood. When my
friend comes to a street intersection, I am at a loss to predict my
friend's turns, either individually or in sequence. Yet I can predict
the result of my friend's unpredictable actions: we will arrive at the
airport. Even if my friend's house were located elsewhere in the city,
so that my friend made a wholly different sequence of turns, I would
just as confidently predict our destination. Is this not a strange
situation to be in, scientifically speaking? I can predict the outcome
of a process, without being able to predict any of the intermediate
steps in the process. I will speak of the region into which an
optimization process steers the future as that optimizer's target.
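The airport story can be put in code. This is a toy illustration of my own, not anything from the post itself: a greedy "driver" whose intermediate positions depend entirely on where it starts, while the destination is fixed by the target.

```python
def drive_home(start, airport=42):
    """Greedy 'driver' on a number line: at each intersection, take
    whichever step gets closer to the airport.  Returns the sequence
    of positions visited."""
    path = [start]
    pos = start
    while pos != airport:
        pos += 1 if pos < airport else -1
        path.append(pos)
    return path

route_a = drive_home(-7)   # friend's house on one side of town
route_b = drive_home(90)   # house elsewhere: a wholly different route
assert route_a != route_b                  # the intermediate steps differ...
assert route_a[-1] == route_b[-1] == 42    # ...the destination does not
```

Anyone who knows the target can predict the endpoint of either route without predicting a single intermediate step.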

Consider a car, say a Toyota Corolla. Of all possible configurations
for the atoms making up the Corolla, only an infinitesimal fraction
qualify as a useful working car. If you assembled molecules at random,
many, many ages of the universe would pass before you hit on a car. A
tiny fraction of the design space does describe vehicles that we would
recognize as faster, more efficient, and safer than the Corolla. Thus
the Corolla is not optimal under the designer's goals. The Corolla is,
however, optimized, because the designer had to hit a comparatively
infinitesimal target in design space just to create a working car, let
alone a car of the Corolla's quality. You cannot build so much as an
effective wagon by sawing boards randomly and nailing according to
coinflips. To hit such a tiny target in configuration space requires a
powerful optimization process.
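To put rough numbers on the "tiny target" point, here is a sketch of my own devising, with an assumed 40-bit configuration space standing in for the Corolla: random assembly essentially never hits the one working configuration, while an optimizer with feedback reaches it in a few hundred proposals.

```python
import random

random.seed(1)
N = 40
working = random.getrandbits(N)    # the one "working car" out of 2**40

def score(config):
    # number of bits that match the working configuration
    return N - bin(config ^ working).count("1")

# Random assembly: the chance of hitting the target on any one trial is
# 2**-40, so the expected wait is about a trillion trials.
trials = 100_000
hits = sum(random.getrandbits(N) == working for _ in range(trials))

# An optimization process with feedback homes in on the same target by
# keeping any single-bit change that scores better.
config = random.getrandbits(N)
proposals = 0
while config != working:
    flipped = config ^ (1 << random.randrange(N))   # flip one random bit
    if score(flipped) > score(config):
        config = flipped
    proposals += 1

print(hits)        # almost certainly 0
print(proposals)   # on the order of a hundred proposals, not a trillion
```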

The notion of an "optimization process" is predictively useful because
it can be easier to understand the target of an optimization process
than to understand its step-by-step dynamics. The above discussion of
the Corolla assumes implicitly that the designer of the Corolla was
trying to produce a "vehicle", a means of travel. This assumption
deserves to be made explicit, but it is not wrong.

If I play chess against a stronger player, I cannot predict exactly
where my opponent will move against me - if I could predict that, I
would necessarily be at least that strong at chess myself. But I can
predict the end result, which is a win for the other player. When I am
at my most creative, that is when it is hardest to predict my actions,
and easiest to predict the consequences of my actions - if you know and
understand my goals.

This strange balance sometimes confuses people, so that they conflate
creativity with randomness and begin to praise entropy. Sometimes the
chess master makes a move which surprises me completely - and if so, I'm
probably about to get wiped off the board. On one level, I was very
much surprised by the action; on a higher level, the outcome became that
much more predictable. There is no creative surprise without some
criterion that makes it surprisingly good. One can easily reduce my
ability to predict an opponent's moves by substituting a random move
generator; but a random opponent will play poor chess. Randomness is
not at all the same as creativity.
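The random-opponent point can be made concrete with a toy guessing game, my own stand-in for chess: a random player is maximally unpredictable move by move, yet plays badly; a binary-searching player's guesses are surprising only until you grasp its criterion, and its outcome is easy to predict.

```python
import random

def random_player(target, lo, hi, rng):
    """Moves are pure noise: maximally unpredictable, and weak play."""
    guesses = 1
    while rng.randint(lo, hi) != target:
        guesses += 1
    return guesses

def skilled_player(target, lo, hi):
    """Binary search: each guess is surprising if you don't know the
    target, but the outcome - a quick win - is predictable."""
    guesses = 0
    while True:
        mid = (lo + hi) // 2
        guesses += 1
        if mid == target:
            return guesses
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1

rng = random.Random(3)
target = rng.randint(1, 10_000)
print(skilled_player(target, 1, 10_000))      # at most 14 guesses
print(random_player(target, 1, 10_000, rng))  # thousands, on average
```

Substituting the random player raises the entropy of the move sequence and destroys the predictability of the result; the skilled player has it the other way around.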

Suppose you want an exceptionally good answer, better than you yourself
could invent. Then it may make sense to delegate the question to a
physical process located outside your brain, configured by you to search
for a solution. If so, you sacrifice your ability to predict the exact
solution in advance of searching - if you could exactly predict the
answer, there would be no point in delegating the search. This applies
on any level where, by your lights, there are better answers and worse
ones. But you need not sacrifice your ability to say what is the
question. It may be an extremely high-level question, but if you don't
have something in mind, you might as well program the search process to
return a random answer. Maximum entropy is not the same as maximum
creativity.
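This delegation is the shape of every optimizer's interface: you hand over the question, in the form of a criterion, and get back an answer you did not predict. A minimal sketch of my own (the names are invented, and an exhaustive scan stands in for any cleverer search process):

```python
def delegate_search(criterion, candidates):
    """A search process configured by the asker: it is handed the
    question (a scoring criterion), not the answer, and returns the
    best candidate it finds."""
    best = None
    for c in candidates:
        if best is None or criterion(c) > criterion(best):
            best = c
    return best

# The asker specifies what counts as a good answer...
def question(x):
    return -(x - 137) ** 2    # quality peaks at 137, by construction

# ...without needing to predict which answer the search will return.
answer = delegate_search(question, range(1000))
print(answer)  # 137
```

If `question` were replaced by a constant function - no criterion at all - the search would return an arbitrary candidate, which is the "random answer" case above.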

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:54 MDT