Re: Friendliness and blank-slate goal bootstrap

From: Nick Hay
Date: Sat Jan 10 2004 - 16:49:38 MST

On 10/01/04 15:55:57, Charles Hixson wrote:
> Nick Hay wrote:
> > Metaqualia wrote:
> > > ...
> > ...
> > You could go with "reduce undesirable qualia, increase desirable
> > ones" if you liked.
> Be very careful here! The easiest way to reduce undesirable qualia is
> to kill off everyone who has the potential for experiencing them.

This rule does have lots of problems. What may not be clear from the
context is that this is my summary of Metaqualia's view. I don't think
it works, nor is it sufficient. I had:

Nick Hay wrote:
> Personally, [my best guess at what's really right ie. the best
> morality for an altruist is] along the lines of "help sentients, and
> others that can be helped, with respect to their volitions --
> maximise self-determination, minimise unexpected regret". Focusing
> on the aspect of helpfulness and implementing others' volitions.

This, of course, hides a huge amount of complexity -- even if I were
trying to transfer this to a human, rather than an AI.

>> Instead of thinking about what kind of morality an AI should start
>> with, and then transferring it over, why not jump back a step?
>> Transfer your ability to think "what kind of morality should an AI
>> start with?" so the AI itself can make sure you got it right? It
>> seems like you're forced into giving very simple inadequate verbal
>> phrases (hiding huge amounts of conceptual complexity) to describe
>> morality. You're losing too much in the compression.
> It seems to me that a person's method for determining the desirable
> morality is based partially on instincts, partially on training, and
> partially on self-interest. I've seen remarkable transformations in
> what was considered moral occur when the self-interest of the
> decider has shifted. Similarly, the morality that is desired seems
> to be very subject to what other people insist is the right thing to
> do. War propaganda is a very current example here. I include this
> modification under the category of instinctual. People ARE herd
> animals, to an extent.

Sure, and since we're human this is our problem too. The human "moral
reasoning methods" leave much to be desired. That is, typically when
humans examine their moral reasoning methods with their moral reasoning
methods, they find much to be desired: the methods are not self-
consistent, nor stable under reflection. We find some factors influence
our moralities when they shouldn't, or in ways they shouldn't. We find
other factors we think should, or thought did, influence things which
don't. This means you shouldn't "hard code" these methods as some fixed
ideal. This doesn't mean you can ignore them and start with a blank
slate, but it does mean you have to be careful and understand *exactly*
what you're doing.

Of course these instincts are, in some ways, probably like our
instincts for, say, vision: exceedingly complex, with a good deal of
nontrivial structure. This is easy to see with the visual system, since
we can "simply" trace back from the eyes through various parts of the
brain. The complexities underlying our moral reasoning process, and our
sense of desirability, are less scientifically accessible (however, I
haven't examined the literature, so I don't know what we know).

> So, when you are saying that the AI should use the process that you
> do ... are you sure about that? Just how heavily do you want the AI
> to weigh its self-interest? Do you want it to be able to justify
> intentionally killing off the entire human race? Or even not
> considering that danger very important when calculating risks? (Note
> that people frequently don't. I consider this one of our major
> failings as a species.)

No, I did not mean to imply a pattern-copy of the human moral reasoning
process. A closer approximation would be how humans *think* they
reason, or how they'd like to reason (if they had the ability to
choose); this would fix the errors you have suggested.

I do agree that ignoring existential risks, or not paying them the
right kind of attention, is a pretty major flaw. Clearly, however, it's
not impossible to take them into consideration, ie. some people do.

So I'm not saying that the AI should use the exact reasoning process we
use, but that we *do* have to create some kind of reasoning process
inspired by ours. We can't leave a blank slate, or simply stick some
moral conclusions eg. "don't kill children" onto it. I'm not saying
"this is the right way to create a Friendly AI" (I don't know how to
create FAI), except in broad and uncertain terms, but more "this is not
the right way to create a Friendly AI, you are missing, at least,
<these> important things".

However, even a pattern-copy of the human moral reasoning process
(pattern-copying a human is conceptually simple as a goal, but in
practice pretty difficult) is better than pattern-copying the output of
one (ie. a humanly devised morality). The former contains, most
importantly, the structure *behind* the conclusions, the ability to
rederive conclusions upon gaining new skills, understanding, or upon
noticing mistakes. Unrealistic (for a FAI), but illustrative, examples:

eg. "Wow, this whole philosophy was inspired by selfishness; I didn't
know that, nor did I want it. Ok, what does it look like *without*
that influence?"

eg. "Wow, this plan could destroy the human species! Why didn't I
notice that before? Hmmm, seems like my reasoning process doesn't take
into account extinction with the appropriate seriousness. Let's see if
I can fix this..."

None of these examples would be possible if we only transferred our
conclusions on top of a blank slate. If we transferred our conclusions, and
the reasons for those conclusions, and the mechanisms we used to
generate those reasons and conclusions, in the right way, this kind of
reasoning (maybe) becomes possible.
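To make the structural point concrete, here is a toy sketch (purely
illustrative, nothing like a real FAI design; all the names and the
premise-tracking scheme are my own invention for this example)
contrasting an agent given only moral conclusions with one given the
derivations behind them:

```python
# Toy illustration: an agent holding only conclusions versus one that
# also records which premises each conclusion depended on, so it can
# drop (and later rederive) conclusions when a premise turns out wrong.

class ConclusionsOnly:
    """Blank slate plus a fixed list of conclusions."""
    def __init__(self, conclusions):
        self.conclusions = set(conclusions)

    def retract_premise(self, premise):
        # No record of *why* anything was concluded: nothing to update.
        pass

class WithDerivations:
    """Stores each conclusion together with the premises behind it."""
    def __init__(self, derivations):
        # derivations: {conclusion: set of premises it depends on}
        self.derivations = dict(derivations)

    def retract_premise(self, premise):
        # On noticing a premise was mistaken, drop every conclusion
        # that depended on it; the agent can rederive from scratch.
        self.derivations = {c: ps for c, ps in self.derivations.items()
                            if premise not in ps}

    @property
    def conclusions(self):
        return set(self.derivations)

a = ConclusionsOnly({"plan X is good"})
b = WithDerivations({"plan X is good": {"plan X risks nothing important"}})

a.retract_premise("plan X risks nothing important")
b.retract_premise("plan X risks nothing important")

print(sorted(a.conclusions))  # the conclusion survives, unjustified
print(sorted(b.conclusions))  # the conclusion is dropped
```

The first agent is stuck with "plan X is good" forever; the second can
notice the mistaken premise and revise, which is the ability the
paragraph above is pointing at.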

>> Remember, friendliness isn't Friendliness. The former would involve
>> something like making an AI friend, the latter is nothing like it.
>> Where he says "Friendliness should be the supergoal" it means
>> something more like "Whatever is really right should be the
>> supergoal". Friendliness is an external
> This is assuming that "right" has some absolute meaning, but this is
> only true in the context of a certain set of axioms (call them
> instincts). And also from a particular point of view. Friendliness
> can be friendliness, if that's what the instincts say is right.

Perhaps, but this is why I said "something more like" (I guess, can't
actually remember the specific thoughts). The supergoal (insofar as
we use a goal architecture in this way) shouldn't be some fixed moral
conclusion, such as "be like a human friend". It should be more open-
ended than that, to allow the FAI to grow.

> Remember that your basic goals can't be chosen, but they must be
> describable in very simple terms. And nearly all formulations are
> subject to failure modes. E.g., Eliza would be much more willing to
> talk to it than any person would, so having "someone to talk to"
> would be a bad choice of supergoal. If you make it friendly to those
> with some particular DNA characteristics (say something that
> specifies a particular protein unique to humans), then it won't
> necessarily be friendly to uploads. Etc. This is one of the more
> difficult things to get right. And getting it right is crucial.
> Even better would be being able to *know* that you got it right.

Yes, this is one problem with giving (or, trying to give) an AI some
fixed moral conclusion you thought up: you might, later on, think up
reasons why that morality was wrong. If you didn't transfer the
cognitive mechanisms you used to generate, and are now invoking to
correct, the morality, the AI will say something like "yes, but if I
change *that* then I won't be as friendly to those with human DNA!" ie.
the supergoal is fixed. You can't change it through the goal system,
you have to fiddle with the AI's internals *against* its goal system
(perhaps possible for an infant-level AI, but not for a smarter one).

It seems like you're thinking in terms of "picking the best morality
for an AI" (an AI assumed to have the structure humans use to implement
moralities) rather than, say, "engineering an AI that can understand
morality, along with the arguments and philosophies we use to decide
which actions we should or shouldn't take".

- Nick Hay

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:45 MDT