From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Jun 26 2002 - 11:29:42 MDT
Ben Goertzel wrote:
 >> "The intention that was in the mind of the programmer when writing this
 >> line of code" is a real, external referent; a human can understand it,
 >> and an AI that models causal systems and other agents should be able to
 >> understand it as well. Not just the image of Friendliness itself, but
 >> the entire philosophical model underlying the goal system, can be
 >> defined in terms of things that exist outside the AI and are subject to
 >> discovery.
 >
 > That is whacky!!
 >
 > Inferring an intention in another human's mind is HARD... except in very
 > simple cases, which are the exception, not the rule...
I didn't say it was easy for an infrahuman AI.  I said that the sentence 
itself was an external reference.  This has implications for the AI's 
modeling of Friendliness regardless of whether the AI knows enough to unpack 
the external reference.  If X is an external reference to a design decision 
that is an external reference to the programmers' intentions, then it makes 
sense to change X if the programmers change their design decision as a 
result of seeing the previous version of X function in a way that conflicted 
with their intentions.  It's a question of where your roots are.  What is 
important to an early AI is not so much knowing *what* the roots are as 
knowing *where* the roots are.  Growing up and becoming independent of the 
programmers is something that depends on understanding *what* the roots are, 
but this is an advanced stage of Friendship growth, which may assume that 
the AI understands the programmers as black boxes.
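To make the structure concrete, here is a minimal illustrative sketch in 
Python of goal content held as an external reference that is re-resolved 
against the design decision it points at, rather than stored as a frozen 
value.  Every name in it (ExternalReferent, ExternalReference, goal_X) is a 
hypothetical illustration, not part of any actual architecture described here.

    class ExternalReferent:
        """Something outside the AI that can be re-examined, such as a design
        decision or the programmers' intentions behind it."""
        def __init__(self, description, current_content, grounded_in=None):
            self.description = description
            # Best available model of the referent; revisable as more is learned.
            self.current_content = current_content
            # A deeper referent, e.g. the intention behind a design decision.
            self.grounded_in = grounded_in

    class ExternalReference:
        """Goal content defined by reference: resolving it consults the
        referent, so if the programmers revise the design decision, the goal
        content X tracks the revision instead of staying anchored to the old
        version."""
        def __init__(self, referent):
            self.referent = referent
        def resolve(self):
            return self.referent.current_content

    # The programmers' intention is the root; the design decision points at
    # it; X points at the design decision.
    intention = ExternalReferent("what the programmers were trying to do",
                                 "approximate model of the intention")
    decision = ExternalReferent("design decision implementing the intention",
                                "version 1 of X",
                                grounded_in=intention)
    goal_X = ExternalReference(decision)

    print(goal_X.resolve())                        # "version 1 of X"
    decision.current_content = "version 2 of X"    # programmers see version 1
                                                   # conflict with their
                                                   # intentions and revise it
    print(goal_X.resolve())                        # now "version 2 of X"

The point of the indirection is that X's root is the place where the decision 
lives, so a revision by the programmers propagates to X the next time it is 
resolved.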
External references can never be completely unpacked in a messy physical 
universe; even a femtotech-level scan of your neural state a few years later 
may be insufficient to capture the full physical description of your mind at 
the moment of decision a few years back.  (Of course, all the *relevant* 
factors in the decision are presumably a bit easier to figure out than this, 
since similar factors will appear as relevant in other decisions as well.) 
The question is just whether it can be unpacked well enough to capture the 
information that's relevant to Friendliness; if a human notices anything 
amiss, that means the difference is noticeable enough to be distinguishable 
at our level of reality, and an SI should have no trouble picking it out. 
For that matter, an infrahuman AI can notice pretty subtle mistakes by proxy 
- if a human objects to something, then the actual objection typed into the 
keyboard is a very noticeable physical effect of which the mistake is the cause.
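A second, equally hypothetical Python sketch of noticing mistakes by proxy: 
the typed objection is treated as observable evidence that some mistake 
occurred, before the AI can model what the mistake actually was.  The 
function name and the objection markers are illustrative assumptions.

    def on_programmer_message(message, recent_actions):
        """Treat an objection typed at the keyboard as a noticeable physical
        effect whose cause is some not-yet-understood mistake: flag the
        recent actions as suspect and queue them for review."""
        objection_markers = ("no,", "stop", "that's wrong", "don't")
        if any(marker in message.lower() for marker in objection_markers):
            return [(action, "suspect, pending review") for action in recent_actions]
        return []

    print(on_programmer_message("Stop, that's wrong; X shouldn't have done that.",
                                ["executed X, version 1"]))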
 > It's hard for a human, even one with decent empathic ability.
 >
 > It may be WAY WAY HARDER for a nonhuman intelligence, even one somewhat
 > smarter than humans. (Just as, for example, a dog may be better at
 > psyching out "the intention in another dog's head" than I am, because
 > it's a dog...)
 >
 > I am very curious to see your design for your AGI's "telepathy" module ;)
Initially an AI doesn't need to know *what* a programmer is, just *where* a 
programmer is; as the AI grows up mentally, it will find out *what* a 
programmer is; and as the AI finds out *what* a programmer is, it will grow 
up morally.
 > What, Eliezer, was the intention in my mind as I wrote this e-mail?   I
 > don't even know, fully!  There are many of them, overlapping; sorting
 > them out would take me ten minutes and would be a badly error-prone
 > process...
Can I answer you after the Singularity?
-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence