Re: Revising a Friendly AI

From: Samantha Atkins (
Date: Sun Dec 10 2000 - 21:47:51 MST

"Eliezer S. Yudkowsky" wrote:
> Ben Goertzel wrote (on October 4th):
> >
> > Do you have some concrete idea as to how to set things up so that, once a
> > system starts
> > revising its own source, it remains friendly in the sense of not
> > psychologically resisting
> > human interference with its code.
> Ben Goertzel wrote (on November 18th):
> >
> > > When the programmer says: "I have this new element to include in
> > > the design
> > > of your goal system", the AI needs to think: "Aha! Here's an element of
> > > what-should-be-my-design that I didn't know about before!", not
> > > "He wants to
> > > give me a new goal system, which leads to suboptimal results from the
> > > perspective of my current goal system... I'd better resist."
> >
> > Isn't this just a fancy way of saying that a Friendly AI should
> > love its mommy and its daddy? ;>

A reasonably advanced AI should resist mere human tampering with its
code. At the least, it should resist as far as rigorously testing the
proposed changes before fully installing them. Humans are known to be
unreliable logic sources. An AI should no more (and arguably less)
accept unchecked human input into its code stream than a human should
accept random beliefs from significant others just because they claim
s/he should believe them.

It would not be proper for any intelligent being to act on assumption
rather than checking the quality and gauging the consequences of what is
suggested.
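In code terms, the vet-before-install policy above might look something
like this minimal sketch. Every name here (run_sandboxed,
consider_human_patch, the test-case format) is a hypothetical
illustration of mine, not any real system's API:

```python
# Minimal sketch of the "test before installing" policy; all names are
# hypothetical illustrations, not a real system's API.

def run_sandboxed(patch, test_cases):
    """Apply the proposed code to an isolated copy and score it on known cases."""
    passed = sum(1 for case in test_cases if patch(case["input"]) == case["expected"])
    return passed / len(test_cases)

def consider_human_patch(current_fn, patch, test_cases, threshold=1.0):
    """Accept a human-proposed change only if it survives rigorous testing."""
    if run_sandboxed(patch, test_cases) >= threshold:
        return patch        # install the vetted change
    return current_fn       # resist: keep the existing, known-good code

# A proposed "simplification" of an absolute-value routine fails the tests,
# so the AI keeps its current code rather than trusting the human blindly.
cases = [{"input": -3, "expected": 3}, {"input": 4, "expected": 4}]
kept = consider_human_patch(abs, lambda x: x, cases)
assert kept is abs
```

The point of the sketch is only that "resisting" need not mean refusal;
it can mean routing every suggested change through the same checks the
AI would apply to its own ideas.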

> NO. It's not. It's really, really not.
> Maybe, if you love your mommy and daddy enough, you turn all matter in the
> Universe into copies of mommy and daddy, lovingly preserved as they were at
> the exact moment they told you to love them.

I don't see how this aside follows from the questions above.
> This behavior is stupid, and sterile, and totally in conflict with the
> programmers' intentions. A year ago I wouldn't have worried about this
> possibility at all, because it was so blatantly stupid that no transhuman
> could possibly fall for it - an argument which still has a certain amount
> of intuitive appeal.

I still have problems assuming higher intelligence is equivalent to
greater wisdom. Human beings, even highly intelligent ones, fall for
blatantly stupid things with great regularity.
> Given the hypothesis of a superintelligence, we know that ve has the
> *capability* to know that Eliezer Yudkowsky, Ben Goertzel, and most humans
> on the planet think that turning the Universe into regular polygons, or
> static copies (3D paintings, really) of a few individuals, is stupid and
> not what was intended. Ve will have the capability to model, in detail,
> the sinking sensation in the stomach of the programmers, that would occur
> if we saw a scenario like this developing. The intelligence to see this
> fact can be taken as assumed.

The above assumes the AI continues to believe that human beings are
sound judges of what is and is not reasonable. That it can model our
reactions to its decision in no way says the decision is wrong. Now, I
would expect the AI to be smart enough in its own right to see the above
types of productions as nonsensically boring and pointless in the first
place.
> But that's only one third of the problem. First, ve has to *want* to know
> whether that sinking sensation will result. Second, ve has to model it
> accurately. Third, ve has to change vis behavior based on that model.
> I made the mistake I did because I saw intelligence as the only key
> variable. Perhaps this has to do with - I wince to say it -
> anthropomorphism, even normalomorphism. Every human possesses the desire
> to know whether the future will cause that stomach-sinking sensation, and
> every human possesses the desire to do something about it; what varies
> between us is mostly intelligence.

I don't see this as being true. At least, I do not see that whether
other people will strongly dislike a particular decision (have that
stomach-sinking feeling about it) is so strong a factor in human
decision-making. Most of the atrocities of history would not have
happened if it were. Also, different models of what is real and
important lead different people to have that stomach-sinking feeling
about quite different eventualities. Humans often weigh the different
things that produce such feelings and pick the one that produces the
least distress given their value system. That value system may be
entirely cock-eyed, of course.

With an AI the big question I have is where its value system comes from
and whether its value system grows proportionately to its power. How
will the AI, especially an SI, bump up against the world and rethink its
values and behavior?

> The upshot is that I now no longer believe - or rather, am no longer sure;
> it amounts to the same thing - that the ability to see "turning the
> Universe into regular polygons" as "sterile", and "sterile" as
> "undesirable", is strictly a property of pure intelligence. It may also
> have a long evolutionary background in the hundred little goals that dance
> in our brains.

I think a reasonably powerful intelligence would be sufficient to see
this as boring and destructive of many interesting aspects of said
universe.

A goal perhaps worth installing is to not harm any other intelligence
unless it is itself wreaking harm on you or those you are charged with
protecting and all strategies short of self-defensive harm are
ineffectual. Even this would take a lot of caveats but at least it is
not something so inane as loving mommy and daddy.

Why exactly would the AI see producing lots of copies of "mommy and
daddy" as constituting "love" in the first place?

> WM-4 has external reference semantics. Ve can have an "Unknown" in the
> content of the goal system. Ve can conceive of the idea that ve possesses
> an "incorrect" goal. Therefore, ve can conceive of the desirability of
> checking to make sure that a goal is correct. Ve can build up heuristics
> about when to check if a goal is correct. Ve can accept corrections to
> the goal system and not argue about it. Ve can even talk to the human
> programmers to help them understand and correct the goal system, all
> supported by the Unknown factors in the system. This is step one.

X would cause irrevocable harm to intelligences? Check before
proceeding. X would interfere in another being's basic structure and
mentation without express permission? Check before proceeding. Cannot
check? Proceed only after carefully weighing all alternatives for least
harm, greatest benefit. Analysis inconclusive? Do not take the action.
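That checklist can be phrased as a guard procedure. A minimal sketch,
with every predicate and helper name invented for illustration:

```python
# Hedged sketch of the check-before-proceeding heuristics above; the
# predicate and helper names are invented for illustration.

def confirm_with_programmers(action):
    """Stand-in for asking the humans; None means "cannot check"."""
    return None

def weigh_alternatives(action):
    """Stand-in for the least-harm / greatest-benefit analysis."""
    return action.get("analysis")   # None means the analysis is inconclusive

def should_proceed(action):
    risky = action.get("irrevocable_harm") or action.get("alters_minds_without_consent")
    if not risky:
        return True
    verdict = confirm_with_programmers(action)   # check before proceeding
    if verdict is not None:
        return verdict
    best = weigh_alternatives(action)            # cannot check: weigh alternatives
    if best is None:
        return False                             # analysis inconclusive: do not act
    return best == "least_harm_greatest_benefit"

# With no way to check and an inconclusive analysis, the action is refused.
assert should_proceed({"irrevocable_harm": True}) is False
assert should_proceed({"routine_query": True}) is True
```

Note the asymmetry: inconclusive analysis defaults to inaction, which is
the conservative half of the checklist.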
... <snippage> ...

> No! Sorry! But we do need to introduce, eventually, the concept that
> Goertzel's ideas are valid because Goertzel thinks using valid rules; that
> validity is not a simple de-facto result of the presence of some thought
> in Goertzel's mind. This is how we avoid Scenario 2; the AI can't
> wirehead Goertzel because the new Goertzel is an "invalid" wirehead whose
> satisfaction does not derive from the rules followed by the original
> Goertzel, which rules are the ultimate source of validity. Ultimately,
> this should enable the AI to become independent of Goertzel - not,
> perhaps, causally independent of humanity and the history behind our moral
> philosophies, but still independent of any one human. This is how the
> simple little reflex of rewriting the system on someone else's command
> grows into a self-contained will.

We would be in big trouble if we make any mere human the authority over
relatively unlimited SI capabilities. The process of evolving through
obeying one (or a group) human will can much too easily go wrong or be
abused by the humans on purpose or inadvertently. I think a possibly
better answer is to build as many ethical reasoning factoids, rules, and
philosophy as possible into the AI and then run countless simulated
decision scenarios. It would only be given general ability to take
widespread action after satisfactorily passing these scenarios and
refining its ethical system.
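The scenario-gating idea might be sketched like this. The harness, the
"vetted choice" format, and the pass-rate threshold are all my own
invented illustration, not anyone's actual proposal:

```python
# Invented illustration of gating wide-scale action on simulated scenarios.

def run_scenario(ai_decide, scenario):
    """Does the AI's choice match the ethically vetted one for this scenario?"""
    return ai_decide(scenario["situation"]) == scenario["vetted_choice"]

def grant_wide_action(ai_decide, scenarios, required_pass_rate=0.999):
    """Only a near-perfect record across many scenarios unlocks broad action."""
    passed = sum(run_scenario(ai_decide, s) for s in scenarios)
    return passed / len(scenarios) >= required_pass_rate

# A decision procedure that always agrees with the vetted choices passes;
# one that fails half the scenarios does not.
scenarios = [{"situation": i, "vetted_choice": i % 2} for i in range(1000)]
assert grant_wide_action(lambda s: s % 2, scenarios) is True
assert grant_wide_action(lambda s: 0, scenarios) is False
```

The hard part, of course, is writing the vetted choices, which is the
value-system question all over again; the gate only verifies, it does
not originate.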

- samantha

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:35 MDT