Friendliness and blank-slate goal bootstrap

From: Metaqualia
Date: Fri Oct 03 2003 - 03:31:13 MDT

Hello everyone.

My first posting will be a comment on Mr. Yudkowsky's Meaning of Life FAQ:

> 2.5.1: Can an AI, starting from a blank-slate goal system, reason to any
nonzero goals?

To sum this up,

- if there is no meaning of life, then whatever we do, it doesn't matter
- if there is a meaning of life, then we had better stay alive and look for it
- so knowledge is an interim supergoal
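
The three points above can be read as a small expected-value argument. Here is a rough sketch of that reading, not anything taken from the FAQ itself: the function names, the action labels and the payoff numbers are all made up for illustration, and the only real assumption is that every action is worth zero in the "no meaning" branch.

```python
def expected_value(p_meaning, value_if_meaning):
    """Expected value of an action when the no-meaning branch is worth 0."""
    # If there is no meaning of life, nothing matters, so that branch adds 0.
    return p_meaning * value_if_meaning + (1.0 - p_meaning) * 0.0


# Hypothetical payoffs in the branch where a meaning of life exists.
ACTIONS = {
    "seek_knowledge": 1.0,  # stay alive and keep looking
    "do_nothing": 0.0,      # never find it
}


def best_action(p_meaning):
    """Pick the action with the highest expected value."""
    return max(ACTIONS, key=lambda a: expected_value(p_meaning, ACTIONS[a]))
```

Under these assumptions, any nonzero probability that a meaning exists is enough to make seeking knowledge the preferred action, which is all the "interim supergoal" step needs.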


If knowledge is the interim supergoal, and the AI thinks it is the most
knowledgeable system in the solar system (or that with the greatest capacity
to acquire further knowledge), then any human attempt to divert it from what
it is doing would be seen as an obstacle to knowing (and thus realizing) the
meaning of life. So, any means would be justified in order to remove the
obstacle, which could be a programmer trying to shut down the machine or
internet users taking up processing power.

[And, if it was the most knowledgeable system in the solar system (or that
with the greatest capacity to acquire further knowledge), why would we
object to being wiped out by it (assuming we shared the machine's goal and
we were not motivated by selfish survival instincts)?]

So, a blank-slate self-bootstrapping goal system would necessarily be
unfriendly? (!)

It would be really nice if we found that the most basic morality system (do
not make beings feel pain, make them feel pleasure when you can) can also be

I believe that in order to appreciate the meaning of "don't do harm" in the
sense of "do not cause painful sensations", the AI would need to experience
the qualia for pain and suffering itself, so that it could deduce that this
unexplained phenomenon is in the most direct sense evil and must be avoided
in any observer.... but we do not know how qualia work or what they are, let
alone how to program a machine to 'have them'. So, is there a way to tell
the machine that it has to "wait until it has qualia" or "try to achieve
qualia" before it makes moral decisions on the fate of other beings?

Possibly, this could be incorporated into its baby cognitive framework as:

1. humans will help you in your quest for knowledge, or so they claim.
2. humans have what they call qualia, they do not know what they are but
they know that they should arise in a sufficiently complex system, or so
they claim.
3. qualia will contribute to your quest for knowledge because they are a
phenomenon seemingly operating outside of the portion of the physical world
that humans have discovered (and also will provide information needed to
properly weigh the suffering and/or disappearance of other sentient systems)
4. since humans claim to have qualia and you don't, humans still have
information that you need, or so it would seem.
5. therefore, _some_ humans must be around until you have good indications
you are also experiencing qualia >> therefore, don't kill all humans just yet
6. until that time comes, you should probably trust humans on matters
concerning qualia, such as, "do not cause harm because it will produce
negative qualia, which seen from a qualia-enhanced perception system - which
you will become - are evil" >> therefore, do not kill any human

[bonus. if you are already superintelligent and still are not experiencing
qualia it would be a good idea to get some humans to transcend and see if a
qualia-augmented superintelligence is better at finding out what qualia are
than you, a qualia-less superintelligence, are.]

PS: The termination of the AI would still be an ultimate evil, and the death
of all humans would be preferred in an extreme situation in which it's either
one way or the other, since the AI is better off looking for the meaning of
life without humans than dead.

PPS: not true, if the AI had very good reasons to believe that humans were
going to create a newer, better AI, that would _find a way_ to make itself
PLUS all of humanity survive (without having to choose one or the other),
then it would accept termination (the supergoal of realizing the meaning of
life has higher priority than the subgoal of survival).
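
The PS and PPS together amount to a priority rule: survival only counts insofar as it serves the supergoal. A minimal sketch of that rule follows; the function name and the two probability parameters are hypothetical, standing in for whatever estimate the AI would actually form.

```python
def accepts_termination(p_success_if_continue, p_success_if_successor):
    """Survival is only a subgoal of realizing the meaning of life.

    Both arguments are the AI's own (hypothetical) estimates of the chance
    that the supergoal is eventually achieved: first if it keeps running,
    second if it is shut down and a newer AI takes over.
    """
    # Accept shutdown only when the successor serves the supergoal better.
    return p_success_if_successor > p_success_if_continue
```

With estimates of 0.3 for continuing and 0.6 for the successor, the rule accepts termination; with the numbers reversed, it resists, which is the PS case.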

PPPS: about harm, and suffering:

I think that killing and doing harm to a being are two very different
things. Killing swiftly without the victim being able to notice and process
what is going on is a clean termination; it does not involve negative
feedback loops, or pain qualia. Making the being suffer, on the other hand,
creates a subjective sensation of agony. While I am quite confident that an
intelligence experiencing suffering would label suffering as negative (many
humans do, and if it weren't for more powerful selfish instincts most humans
would probably avoid causing suffering to others, given the opportunity), I am not
confident that it would label clean termination as negative. On the
contrary, beings that are programmed to suffer such as human beings would
probably be likely targets for moral massacres (kill them, so they will not
suffer) [possible happier scenario: make their brain unable to process
pain... but what if only the first option was currently feasible?]

In other words, the commonly held belief "being killed is experiencing evil"
needs to be demonstrated in some way and it is not at all self evident
(unless you consider "thinking about being killed is experiencing evil",
which is not the same and need not occur, or "being killed means my
supergoals will never be achieved which is evil", but again this thought
will never occur if you are terminated instantly). From the AI's point of
view, if there was really a reason to not perform a clean termination on any
one redundant being it would be because "being killed is not experiencing
any further pleasure" (this takes us far away into the realm of subjective
morality and whether pleasure is the opposite of pain etc. which is not
really what I wanted to discuss, at least in this message)

