Re: Friendliness and blank-slate goal bootstrap

From: Charles Hixson
Date: Fri Jan 09 2004 - 19:55:57 MST

Nick Hay wrote:

>Metaqualia wrote:
>You could go with "reduce undesirable qualia, increase desirable ones" if you
Be very careful here! The easiest way to reduce undesirable qualia is to
kill off everyone who has the potential for experiencing them.

>Instead of thinking about what kind of morality an AI should start with,
>and then transferring it over, why not jump back a step? Transfer your
>ability to think "what kind of morality should an AI start with?" so the AI
>itself can make sure you got it right? It seems like you're forced into
>giving very simple inadequate verbal phrases (hiding huge amounts of
>conceptual complexity) to describe morality. You're losing too much in the
It seems to me that a person's method for determining the desirable
morality is based partially on instincts, partially on training, and
partially on self-interest. I've seen remarkable transformations in what
was considered moral occur when the self-interest of the decider has
shifted. Similarly, the morality that is desired seems to be very
subject to what other people insist is the right thing to do. War
propaganda is a very current example here. I include this modification
under the category of instinctual. People ARE herd animals, to an extent.

So, when you are saying that the AI should use the process that you do
... are you sure about that? Just how heavily do you want the AI to
weigh its self-interest? Do you want it to be able to justify
intentionally killing off the entire human race? Or even not considering
that danger very important when calculating risks? (Note that people
frequently don't. I consider this one of our major failings as a species.)

> ...
>Remember, friendliness isn't Friendliness. The former would involve something
>like making an AI friend, the latter is nothing like it. Where he says
>"Friendliness should be the supergoal" it means something more like "Whatever
>is really right should be the supergoal". Friendliness is an external
This is assuming that "right" has some absolute meaning, but this is
only true in the context of a certain set of axioms (call them
instincts). And also from a particular point of view. Friendliness can
be friendliness, if that's what the instincts say is right. Remember
that your basic goals can't be chosen, but they must be describable in
very simple terms. And nearly all formulations are subject to failure
modes. E.g., Eliza would be much more willing to talk to it than any
person would, so having "someone to talk to" would be a bad choice of
supergoal. If you make it friendly to those with some particular DNA
characteristics (say something that specifies a particular protein
unique to humans), then it won't necessarily be friendly to uploads.
Etc. This is one of the more difficult things to get right. And getting
it right is crucial. Even better would be being able to *know* that you
got it right.

