Re: Self-modifying FAI (was: How hard a Singularity?)

From: Eliezer S. Yudkowsky
Date: Wed Jun 26 2002 - 11:29:42 MDT

Ben Goertzel wrote:
>> "The intention that was in the mind of the programmer when writing this
>> line of code" is a real, external referent; a human can understand it,
>> and an AI that models causal systems and other agents should be able to
>> understand it as well. Not just the image of Friendliness itself, but
>> the entire philosophical model underlying the goal system, can be
>> defined in terms of things that exist outside the AI and are subject to
>> discovery.
> That is whacky!!
> Inferring an intention in another human's mind is HARD... except in very
> simple cases, which are the exception, not the rule...

I didn't say it was easy for an infrahuman AI. I said that the sentence
itself was an external reference. This has implications for the AI's
modeling of Friendliness regardless of whether the AI knows enough to unpack
the external reference. If X is an external reference to a design decision
that is an external reference to the programmers' intentions, then it makes
sense to change X if the programmers change their design decision as a
result of seeing the previous version of X function in a way that conflicted
with their intentions. It's a question of where your roots are. What is
important to an early AI is not so much knowing *what* the roots are as
knowing *where* the roots are. Growing up and becoming independent of the
programmers is something that depends on understanding *what* the roots are,
but this is an advanced stage of Friendship growth, which may assume that
the AI understands the programmers as black boxes.
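The "where, not what" indirection described above can be caricatured in code. This is purely an illustrative sketch of the idea of a goal component defined by reference to an external, revisable source; it is not a design from the post, and every name in it (ExternalReference, read_intent, and so on) is hypothetical:

```python
# Illustrative sketch only: this models the indirection described in the
# text (a goal rooted in *where* its content comes from), not any actual
# FAI design. All names here are hypothetical.

class ExternalReference:
    """A goal component defined by *where* its content comes from
    (an external source), rather than by a frozen *what* (a literal)."""

    def __init__(self, source, initial_guess):
        self.source = source          # where the root is
        self.content = initial_guess  # current best unpacking of it

    def update(self):
        # Re-query the external source. If the programmers have revised
        # their design decision, the referent changes with it.
        self.content = self.source()
        return self.content


# The "programmers' intention" lives outside the AI and is subject to
# discovery -- modeled here as a mutable value the AI re-reads, not a
# constant baked into its goal system.
programmer_intent = {"decision": "v1"}

def read_intent():
    return programmer_intent["decision"]

x = ExternalReference(read_intent, initial_guess="v1")

# The programmers see the previous version of X function in a way that
# conflicts with their intentions, and revise the design decision:
programmer_intent["decision"] = "v2"

x.update()  # X tracks the revision because it points at the root
```

The point of the sketch is only that changing X makes sense precisely because X is a pointer, not a value: the AI does not need to fully unpack *what* the intention is in order to stay anchored to *where* it is.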

External references can never be completely unpacked in a messy physical
universe; even a femtotech-level scan of your neural state a few years later
may be insufficient to capture the full physical description of your mind at
the moment of decision a few years back. (Of course, all the *relevant*
factors in the decision are presumably a bit easier to figure out than this,
since similar factors will appear as relevant in other decisions as well.)
The question is just whether it can be unpacked well enough to capture the
information that's relevant to Friendliness; if a human notices anything
amiss, that means the difference is noticeable enough to be distinguishable
at our level of reality, and an SI should have no trouble picking it out.
For that matter, an infrahuman AI can notice pretty subtle mistakes by proxy
- if a human objects to something, then the actual objection typed into the
keyboard is a very noticeable physical effect of which the mistake is the cause.

> It's hard for a human, even one with decent empathic ability.
> It may be WAY WAY HARDER for a nonhuman intelligence, even one somewhat
> smarter than humans. (Just as, for example, a dog may be better at
> psyching out "the intention in another dog's head" than I am, because
> it's a dog...)
> I am very curious to see your design for your AGI's "telepathy" module ;)

Initially an AI doesn't need to know *what* a programmer is, just *where* a
programmer is; as the AI grows up mentally, it will find out *what* a
programmer is; and as the AI finds out *what* a programmer is, it will grow
up morally.

> What, Eliezer, was the intention in my mind as I wrote this e-mail? I
> don't even know, fully! There are many of them, overlapping; sorting
> them out would take me ten minutes and would be a badly error-prone
> process...

Can I answer you after the Singularity?

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:39 MDT