Re: Revising a Friendly AI

From: Eliezer S. Yudkowsky
Date: Tue Dec 12 2000 - 17:42:04 MST

Ben Goertzel wrote:
> > But the first steps, I think, are: (1), allow for the presence of
> > probabilistic reasoning about goal system content - probabilistic
> > supergoals, not just probabilistic subgoals that are the consequence of
> > certain supergoals plus probabilistic models.
> This is a key point, and really gets at why your "I love mommy and daddy, so
> I'll clone them" example seems weird.
> We try to teach our children to adopt our value systems. But our
> explicit teachings in this regard generally are LESS useful than
> our concrete examples. Children adopt goals and values from their
> parents in all kinds of obvious and subtle ways, which come from
> emotionally-charged interaction in a shared environment.

Yes, but those are *human* children. I think that asking a young *AI* to
pull off a trick like that is asking far too much. Absorbing goals and
values isn't automatic. We've been doing it for millions of years.

To invoke a very old debate in cognitive science, human children may not
be born with the words, but they're born with the syntax...

> Choice of new goals is not a rational thing: rationality is a tool for
> achieving goals, and dividing goals into subgoals, but not for
> replacing one's goals with supergoals....

*This* is the underlying fallacy of cultural relativism. Even if
rationality does not suffice to *completely* specify supergoals, this does
not mean that rationality plays *no* role in choosing supergoals.

We choose our supergoals through processes of enormous complexity, which I
have chosen, for reasons of brevity, to label "philosophical", meaning
supergoal-affecting. (Maybe I ought to distinguish between philosophical
and Philosophical, in the same spirit as friendliness and Friendliness,
but I don't think that would work.) These processes are so complex that
there is plenty of room for rationality and irrationality and emotion and
memes and everything else. The concept of a goal hierarchy, with arbitrary
supergoals on top and a neat chain of world-model-dependent subgoals on
the bottom, is historically very recent. The human mind is built to
recognize no firm borderline between reasoning about subgoals and
reasoning about supergoals, applying essentially the same rational
heuristics in each case - in fact, the human mind is not built to
recognize any borderline between "subgoals" and "supergoals" at all;
cognitively, it's just "goals". The human mind is built to recognize
distinctions between goals that are more "super" and more "sub", and it is
built to recognize the relations "supergoal-of" and "subgoal-of". From
these cognitive relations, modern philosophers have invented the
*artificial* idea of a supergoal.
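
To make the distinction concrete, here is a minimal sketch, assuming goals are
plain nodes and only the "subgoal-of" relation is stored. The class and method
names (GoalGraph, add_subgoal, top_level) are hypothetical, invented for
illustration; nothing here is taken from an actual goal-system design. The
point it shows is that "supergoal" need not be a built-in type: it can be a
label derived from the relations.

```python
from collections import defaultdict


class GoalGraph:
    """Goals are just goals; only the 'subgoal-of' relation is stored.

    Hypothetical illustration, not an actual goal-system design.
    """

    def __init__(self):
        self._goals = set()
        self._parents = defaultdict(set)  # goal -> goals it directly serves

    def add_subgoal(self, sub, sup):
        """Record that `sub` is a subgoal of `sup`."""
        self._goals.update((sub, sup))
        self._parents[sub].add(sup)

    def supergoals_of(self, goal):
        """Goals that `goal` directly serves."""
        return set(self._parents.get(goal, set()))

    def subgoals_of(self, goal):
        """Goals that directly serve `goal`."""
        return {g for g, parents in self._parents.items() if goal in parents}

    def top_level(self):
        """Goals that serve no other goal: 'supergoal' as a derived label."""
        return {g for g in self._goals if not self._parents.get(g)}


graph = GoalGraph()
graph.add_subgoal("earn money", "feed family")
graph.add_subgoal("feed family", "family wellbeing")
print(graph.top_level())
```

In this sketch the same goal ("feed family") is more "super" than one goal and
more "sub" than another, and nothing in the data structure marks it as either.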

Do you really believe that you can alter someone's level of intelligence
without altering the set of supergoals they tend to come up with?
Especially when choice of verbally declared supergoal is dependent on the
emotional appeal of a supergoal which is dependent on visualized factual
context and consequences of said supergoal? I'm damn sure that, at a
different level of intelligence, I would have wound up with drastically
different supergoals.

> Perhaps this is one of the key values of "emotion." It causes us to replace
> our goals with supergoals, by latching onto the goals of our parents and
> others around us.

And overriding evolution's supergoals with a verbally transferred
supergoal (as your schema would seem to have it?) is an evolutionary
advantage because?

> >(2), make sure the very
> > youngest AI capable of self-modification has that simple little reflex
> > that leads it to rewrite itself on request, and then be ready to *grow*
> > that reflex.
> Rewriting itself on request is only useful if the system has a strong
> understanding of HOW to rewrite itself...

The first requests will be on the order of "swap out this old module,
which we wrote, for this new module, which we wrote". To call this a
"reflex" is overstating the point, perhaps, but making it a reflex,
instead of an entirely external action, enables the reflex to grow. Next
comes the ability to help out in a few simple ways - checking change
propagation and the like; still just compiler-level activities, but
there. Next would come observing facts about the exchange, like the
degree of speedup or slowdown, the degree to which problem-solving methods
change, and so on. Then would come the ability to make small changes on
request; changes to code would come later; first, and more useful, would
be changes to declarative internal data. As time progresses, the AI
builds up an experiential database about how external changes work, how to
help, and the purpose of those external changes. At some point, when the
goal system reaches a sufficient level of sophistication, you can start to
explain the design purpose of the goal system; that is also the moment to
explain how changes to goal system content, or even goal system structure,
serve the *referent* of the goal system.
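
The progression above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the stage names, the RewriteReflex
class, and the shape of the experience records are all hypothetical, and the
"speedup" measure is a stand-in for whatever facts the AI would actually
observe about an exchange. What it shows is the swap being routed through the
AI's own reflex, so that each request leaves a record in an experiential
database and the reflex can later be widened in stages.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical stages, in the order the text describes them."""
    SWAP_MODULE = 1        # swap a programmer-written module on request
    CHECK_PROPAGATION = 2  # help with compiler-level checks
    OBSERVE_EFFECTS = 3    # notice speedup/slowdown and similar facts
    EDIT_DATA = 4          # small requested changes to declarative data
    EDIT_CODE = 5          # small requested changes to code


class RewriteReflex:
    def __init__(self):
        self.stage = Stage.SWAP_MODULE
        self.modules = {}
        self.experience = []  # experiential database of past changes

    def swap_module(self, name, new_impl, purpose, timing=None):
        """Swap in a module on request and record what happened."""
        had_previous = name in self.modules
        self.modules[name] = new_impl
        record = {"module": name, "purpose": purpose,
                  "had_previous": had_previous}
        # Observation only begins once the reflex has grown far enough.
        if self.stage >= Stage.OBSERVE_EFFECTS and timing is not None:
            record["speedup"] = timing["before"] / timing["after"]
        self.experience.append(record)
        return record

    def grow(self):
        """Advance one stage; a stand-in for the programmers' decision."""
        if self.stage < Stage.EDIT_CODE:
            self.stage = Stage(self.stage + 1)


reflex = RewriteReflex()
reflex.swap_module("parser", str.split, "initial tokenizer")
reflex.grow()
reflex.grow()
record = reflex.swap_module("parser", str.split, "faster tokenizer",
                            timing={"before": 2.0, "after": 1.0})
print(reflex.stage.name, record)
```

The design choice worth noting is that the earliest stage already writes to
the experience log, so the later stages start with a history to draw on.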

Instead of the AI suddenly waking up one morning and realizing that it can
modify itself instead of waiting around for you to do it, there can be a
smooth transition - a continuum - so that when the AI does "wake up one
morning", it has an experiential base that guides its very fast
decisions. Remember that Scenario 4 included an explicit reference to an
experiential base?

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
