Knowability of Friendly AI (was: ethics of argument)

From: Eliezer S. Yudkowsky (
Date: Mon Nov 11 2002 - 05:54:36 MST

Ben Goertzel wrote:
> Eliezer wrote:
>> You're arguing from fully general uncertainty again; can you give a
>> specific X in Friendly AI theory that you do not think it is possible
>> to usefully consider in advance?)
> There are many examples. One example is, the stability of AGI
> goal-systems under self-modification. To understand this at all in
> advance of having simple self-modifying AGI's to experiment with, one
> would need a tremendously, immeasurably more sophisticated mathematical
> dynamical systems theory than we now possess (or than seems feasible to
> create in the near term). Yet you seem to be making some very
> confident assertions in this regard in CFAI.

That's because I'm not viewing the problem as "the stability of AGI goal
systems under self-modification". Rather, there are certain *particular
kinds* of self-modification which humans exhibit, which have particular
*meanings* in terms of human philosophy and morality, which need to be
embodied in a Friendly AI on purely moral grounds, and which also turn out
to play critical roles in describing both the empirical transfer process
and those moral considerations which, from a human perspective, govern the
creation of AI. "My supergoals could be wrong" is simultaneously a valid
part of human philosophical thinking, a consequence of the "external
reference semantics" needed for an AI to treat any supergoal-modifying
causal process (especially including feedback from the programmers) as
desirable, and a major consideration affecting the morality of AI creation.


1) The structural dynamics described in CFAI have moral meaning in human
terms - they are not a theory of all possible self-modifying goal systems
but a specific theory of a moral Friendly AI.

2) The structural dynamics described in CFAI are implemented in humans
and can be examined there.

3) What matters is not (A) "Are AGI goal systems stable under
self-modification?" but (B) "Can AGI goal systems be at least as 'stable'
under self-modification as an evolved human, where 'stability' is defined
in a specific and morally relevant way?"

I'd also ask you to consider narrowing your focus from the extremely
general issue of "the stability of self-modifying goal systems" to
statements of the order found in CFAI, such as "A goal system with
external reference semantics and probabilistic supergoals exhibits certain
behaviors that are morally relevant to Friendly AI and necessary to
Friendly AI construction, and is therefore a superior design choice by
comparison with more commonly proposed goal system structures under which
supergoals are treated as correct by definition." Why do you believe
that this specific question, for example, cannot be usefully considered
in advance?
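The design contrast above can be sketched in code. This is a minimal
illustrative sketch, not anything from CFAI itself: all class and method
names here are hypothetical, and the update rule is a crude stand-in for
real probabilistic reasoning. It only shows the structural difference
between a goal system whose supergoal is correct by definition and one
with external-reference-style probabilistic supergoals, under which
supergoal-correcting feedback from the programmers is accepted rather
than ignored.

```python
# Hypothetical sketch (not from CFAI). Contrast: a supergoal held as
# true by definition vs. a supergoal held as a probabilistic hypothesis
# about an external referent, revisable by programmer feedback.

class FixedGoalSystem:
    """Supergoal is correct by definition; no evidence can revise it."""
    def __init__(self, supergoal):
        self.supergoal = supergoal

    def receive_feedback(self, correction, weight):
        pass  # feedback is ignored: by construction the supergoal cannot be wrong

    def current_supergoal(self):
        return self.supergoal


class ExternalReferenceGoalSystem:
    """Supergoal content is a hypothesis held with confidence < 1.0,
    so 'my supergoals could be wrong' is representable, and a
    supergoal-modifying causal process (e.g. programmer feedback)
    can be treated as desirable rather than as damage."""
    def __init__(self, hypothesis, confidence=0.9):
        self.hypothesis = hypothesis
        self.confidence = confidence  # P(hypothesis matches the external referent)

    def receive_feedback(self, correction, weight):
        # Crude update rule: corrective evidence of the given weight
        # lowers confidence in the current hypothesis; if confidence
        # drops below 0.5, the correction replaces the hypothesis.
        self.confidence *= (1.0 - weight)
        if self.confidence < 0.5:
            self.hypothesis = correction
            self.confidence = 1.0 - self.confidence

    def current_supergoal(self):
        return self.hypothesis
```

Given the same strong corrective feedback, the fixed system keeps its
original supergoal while the external-reference system adopts the
correction, with residual uncertainty intact.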

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
