Re: supergoal stability

From: Eliezer S. Yudkowsky
Date: Fri May 03 2002 - 17:51:14 MDT

Ben Goertzel wrote:
> Hi,
> Here is my take on these things, which is not the same as Eliezer's of
> course.
> As anyone who has worked with complex nonlinear dynamical systems knows,
> predicting the stability of a complex system from first principles is a very
> hard problem. There is a very high probability that our intuitions about
> these things are going to be wrong. All we are giving are heuristic
> arguments. Heuristic arguments about weather systems and their stability
> were quite popular, until in the 70's Lorenz's older work on chaotic
> dynamics in weather systems became well-known. Now that we know there is
> some chaos there, we still can't make detailed predictions of the stability
> or other dynamical properties of given weather systems. I suspect that the
> real study of stability and other dynamical properties of goal systems will
> not begin until we have AI's of, say, chimp intelligence to experiment with.

Is a complex nonlinear dynamical system really the right way to look at a
Friendly AI? This is an intelligent being we're talking about, capable of
making choices. The Singularity is not a complex nonlinear dynamical
process - it is alive; there is an intelligence, in fact a transhuman
intelligence, standing behind it and making choices. You can't create
Friendly AI by blindly expecting it to be intelligent and alive; that
intelligence and aliveness are part of what you're creating, which means that
you need to understand "intelligence" and "alive" in terms other than our
simple percepts. But you also can't create Friendly AI by thinking of it as
a complex nonlinear dynamical system. If, when you've created Friendly AI
through your design understanding, someone with a surface understanding
looks at the system and says: "Oh, look, a self-modifying goal system that
follows a complex nonlinear dynamic," instead of "Oh, look, a mind that
understands philosophy and is trying to improve itself," then you've screwed
up the job completely.

If you see the process of a mind improving itself as randomly drifting, then
you won't be able to create Friendly AI because you won't be looking at the
forces that make it more than random drift.

I'm really stuck in a horrible double bind when explaining Friendly AI. On
one hand, to really understand Friendly AI requires standing outside of our
emotional attachment to the human architecture of morality, which means that
I shouldn't invoke moral arguments to explain Friendly AI - you can't *use*
moral arguments to create a Friendly AI unless you already *have* a Friendly
AI that listens to moral arguments. To acquire understanding of Friendly AI
at a level sufficient to create it, you have to step outside of all
analogies - and by "analogies" I mean domain mappings in the
Lakoff-and-Johnson sense; you can't think about the Friendly AI by expecting
it to behave like something else of which you have had experience, because
in creating Friendly AI, you are transferring that which is necessary to the
behaviors you want to map.

Lakoff and Johnson have a standard notation in which capital letters
indicate a metaphor; "an angry human is A CONTAINER UNDER PRESSURE"; "time
is a CONSERVED RESOURCE". This doesn't come across very well in email but
it's better than nothing. Anyway...

The intuitive analogy "creating Friendly AI is CREATING YOUR PERSONAL
PHILOSOPHY" is much stronger and more useful than the metaphors "creating
Friendly AI is COMMANDING A HUMAN" or "creating a Friendly AI is BUILDING A
TOOL WITH A DESIGN PURPOSE". But to actually succeed in creating Friendly
AI you can't afford to use *any* of these metaphors, because they expect
behaviors that are not automatic. And if you know what it is that creates
these behaviors, you shouldn't need analogies to understand them; you should
understand them on their own terms. If the Friendly AI engineers do their
job, "A Friendly AI reasoning about morality is AN IDEALIZED HUMAN REASONING
ABOUT MORALITY" is a much better analogy than "a Friendly AI reasoning about
morality is AN ALGORITHM PRODUCING AN OUTPUT" or "a Friendly AI reasoning
about morality is A NONLINEAR DYNAMIC SYSTEM". But to actually *build*
Friendly AI, the only appropriate and useful metaphor is "A Friendly AI
reasoning about morality is A FRIENDLY AI REASONING ABOUT MORALITY."

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
