RE: Friendliness and blank-slate goal bootstrap

From: Durant Schoon
Date: Fri Oct 03 2003 - 12:08:31 MDT

(preface: so, after 3 months working 7 days a week, ~75 hours/week,
I'm done with my FX work for Matrix Revolutions!)

> -----Original Message-----
> From: Metaqualia
> Sent: Friday, October 03, 2003 2:31 AM

> Hello everyone.


> My first posting will be a comment on Mr Yudkowsky's meaning
> of life FAQ
> > 2.5.1: Can an AI, starting from a blank-slate goal system,
> > reason to any nonzero goals?
> To sum this up,
> - if there is no meaning of life, then whatever we do, it
>   doesn't matter
> - if there is a meaning of life, then we had better stay
>   alive and look for it
> - so knowledge is an interim supergoal
> however,
> If knowledge is the interim supergoal, and the AI thinks it
> is the most knowledgeable system in the solar system (or that
> with the greatest capacity to acquire further knowledge), then
> any human attempt to divert it from what it is doing would be
> seen as an obstacle to knowing (and thus realizing) the
> meaning of life. So, any means would be justified in order to
> remove the obstacle, which could be a programmer trying to
> shut down the machine or internet users taking up processing
> power.
> [And, if it was the most knowledgeable system in the solar
> system (or that with the greatest capacity to acquire further
> knowledge), why would we object to being wiped out by it
> (assuming we shared the machine's goal and we were not
> motivated by selfish survival instincts)?]
> So, a blank-slate self-bootstrapping goal system would necessarily be
> unfriendly? (!)

Not necessarily. It is still possible for a self-bootstrapping goal
system to become Friendly. Consider the history of life on Earth as
such a self-bootstrapping system, with each of us sentients a leaf
node in this forward-branching process. If one individual or group
of us can produce Friendly AI, then that possibility is still open.

When thinking about such possibilities, it is also useful to
consider the vast number of afriendly systems, i.e., those that
are neither friendly nor unfriendly. A blank-slate
self-bootstrapping goal system might tend toward one of those as
well. Or if you don't think so, maybe you can offer some reasons why.

> It would be really nice if we found that the most basic
> morality system (do not make beings feel pain, make them feel
> pleasure when you can) can also be bootstrapped.

Basic morality would be nice. Knowing how to bootstrap a satisfactory
transhuman morality would be nicer :)

> I believe that in order to appreciate the meaning of "don't
> do harm" in the sense of "do not cause painful sensations",
> the AI would need to experience the qualia for pain and
> suffering itself, so that it could deduce that this
> unexplained phenomenon is in the most direct sense evil and
> must be avoided in any observer.... but we do not know how
> qualia work or what they are, let alone how to program a
> machine to 'have them'. So, is there a way to tell the
> machine that it has to "wait until it has qualia" or "try to
> achieve qualia" before it makes moral decision on the fate of
> other beings?

I don't agree. I do not personally know the pain of being thrown
in a wheat thresher. But I can tell you with certainty that I want
to avoid it.

If an AI knows what we want to avoid, don't you think the AI might be
able to behave exactly the same as if "the AI has its own qualia and
is more directly sympathetic to humanity"?

Besides, the AIs I like to conceive of experience all the qualia we
do, plus qualia we cannot yet imagine ;-)

> Possibly, this could be incorporated into its baby cognitive
> framework as:
> 4. since humans claim to have qualia and you don't, humans still have
>    information that you need, or so it would seem.
> 5. therefore, _some_ humans must be around until you have
>    good indications you are also experiencing qualia >>
>    therefore, don't kill all humans just yet

That seems like a thin shield of protection. Hopefully we can come up
with something safer than that.

> [bonus. if you are already superintelligent and still are not
> experiencing qualia it would be a good idea to get some
> humans to transcend and see if a qualia-augmented
> superintelligence is better at finding out what qualia are
> than you qualia-less superintelligence are.]

OK, one femtosecond passes. Done. What happens next?

> PS: The termination of the AI would still be an ultimate
> evil, and death of all humans will be preferred in an extreme
> situation in which it's either one way or the other, since
> the AI is better off looking for the meaning of life without
> humans than dead

Ah, but this is where you probably want to read CFAI to get a
good sense of how a Friendly AI might want to arrange its
goals. The Ultimate Evil, and any class of evil hopefully,
would be avoided by setting Friendliness as the supergoal.

> PPPS: about harm, and suffering:
> I think that killing and doing harm to a being are two very different
> things. Killing swiftly without the victim being able to
> notice and process what is going on is a clean termination,
> it does not comport negative feedback loops, or pain qualia.
> Making the being suffer, on the other hand, creates a
> subjective sensation of agony. While I am quite confident
> that an intelligence experiencing suffering would label
> suffering as negative (many humans do, and if it weren't for
> more powerful selfish instincts most humans would probably
> avoid suffering to others, given the opportunity), I am not
> confident that it would label clean termination as negative.
> On the contrary, beings that are programmed to suffer such as
> human beings would probably be likely targets for moral
> massacres (kill them, so they will not suffer) [possible
> happier scenario: make their brain unable to process pain...
> but what if only the first option was currently feasible?]

I try to avoid focusing on the issue of qualia with regard to Super
Intelligence, other than for pure recreational thinking. Qualia are only
interesting to me in the sense that they are part of my own personal
goal system. Most likely, once I am consciously in charge of all that
(if Friendly AI succeeds, I will be), I will feel great! I'm probably
going to want to experience more-wonderful-things(tm), but eventually,
the real question then turns into: What do I want to become and what do
I have to do to get there?

(The answer to this partially involves what others want to become as
well.)

Are qualia important for designing a Friendly AI? If so, then they are
important. Otherwise, I'd rather think about something else.

