Re: Self-modifying FAI (was: How hard a Singularity?)

From: James Higgins (
Date: Wed Jun 26 2002 - 09:43:48 MDT

At 07:29 AM 6/26/2002 -0400, Eliezer S. Yudkowsky wrote:
>Stephen Reed wrote:
>>On Tue, 25 Jun 2002, Ben Goertzel wrote:
>>>The hard part is: If one creates a system that is able to change its concept
>>>of Friendliness over time, and is able to change the way it governs its
>>>behavior based on "goals" over time, then how does one guarantee (with high
>>>probability) that Friendliness (in the designer's sense) persists through
>>>these changes.
>>I understand from CFAI that one grounds the concept of Friendliness in
>>external referents - that the Seed AI attempts to model with increasing
>>fidelity. So the evolving Seed AI becomes more friendly as it reads more,
>>experiments more and discovers more about what friendliness actually is.
>>For Cyc, friendliness would not be an implementation term (e.g. some piece
>>of code that can be replaced), but be a rich symbolic representation of
>>something in the real world to be sensed directly or indirectly.
>>So I regard the issue as one of properly educating the Seed AI as to what
>>constitutes unfriendly behavior and why not to do it - via external
>I agree with your response to Ben. We don't expect an AI's belief that
>the sky is blue to drift over successive rounds of
>self-modification. Beliefs with an external referent should not "drift"
>under self-modification except insofar as they "drift" into correspondence
>with reality. Write a definition of Friendliness made up of references to
>things which exist outside the AI, and the content has no reason to
>"drift". If content drifts it will begin making incorrect predictions and
>will be corrected by further learning.

Unfortunately, can we construct a definition of friendliness using external
reference points which truly equals what we really want? Given much
greater knowledge and intelligence, the behavior we now attribute to
friendliness may end up looking quite different.

Your definition of ethics is a good example. If an alien landed tomorrow
and the first person it met was a fantastic salesman, the salesman might
appear to be exceedingly friendly, when in fact his only goal is to open
up a new trade route and he doesn't care one iota about the alien,
only the result! ;)

Have a look at all the problems in the Catholic church these days. 20
years ago (when much of it was occurring), do you think anyone would have
believed that reality?

We may *think* we are defining friendliness via external reference points
but actually be defining only the appearance of friendliness or something
similar. Thus the SI would only need to appear friendly to us, even while
it was planning to turn the planet into computing resources.

>Furthermore, programmers are physical objects and the intentions of
>programmers are real properties of those physical objects. "The intention
>that was in the mind of the programmer when writing this line of code" is
>a real, external referent; a human can understand it, and an AI that
>models causal systems and other agents should be able to understand it as
>well. Not just the image of Friendliness itself, but the entire
>philosophical model underlying the goal system, can be defined in terms of
>things that exist outside the AI and are subject to discovery.

A human can understand the words "The intention that was in the mind of the
programmer when writing this line of code", but they could never fully
UNDERSTAND it. This is why I think you need more real life
experience, Eliezer. Those of us who are married can easily understand
why the above is not possible. You can never FULLY understand what someone
else intends by something.

To use Eliezer's method, while I may not be correct I'm quite certain you
are wrong. (Does that make me an honorary Friendship Programmer?)

James Higgins

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:39 MDT