Re: Friendliness and blank-slate goal bootstrap

From: Nick Hay (
Date: Fri Oct 03 2003 - 16:09:57 MDT

Metaqualia wrote:
> Hello everyone.


> My first posting will be a comment on Mr Yudkowsky's meaning of life FAQ
> (

The meaning of life FAQ is now almost entirely out of date. With regard to
making nice, meaningful AI it's completely out of date. As such, the
blank-slate goal system is no longer supported -- it's unlikely to work (at
all, let alone optimally), even if objective morality existed. The new
approach is introduced here:

Creating Friendly AI (CFAI) is the book you want to read here, if you're
really interested (it's also quite out of date, but less so than the meaning
of life FAQ). I highly recommend reading the above material if you're
interested in AI morality.

To summarise the new approach: we try to transfer to the AI the skills we
humans use to understand, argue about, and have (altruistic) moralities. We
treat the AI as a mind, not as a tool we have to manipulate into working. We
don't try to force a nice solution from a vacuum (eg. a blank slate), but
transfer as much relevant information as possible.

One immediate benefit is the AI can handle "non-objective" morality in a much
nicer way (personally, even if morality isn't "objective" in some absolute
sense, I'd prefer not to be overwritten by a blank-slate AI). It also covers
cases where finding objective morality requires all the moral reasoning tools
humans have -- although it's objective, not all minds can find it. However
"objective" vs. "non-objective" morality isn't the right way to look at it.
More on this in CFAI.

> > 2.5.1: Can an AI, starting from a blank-slate goal system, reason to any
> nonzero goals?
> To sum this up,
> - if there is no meaning of life, then whatever we do, it doesn't matter
> - if there is a meaning of life, then we had better stay alive and look for
> it
> - so knowledge is an interim supergoal
> however,
> If knowledge is the interim supergoal, and the AI thinks it is the most
> knowledgeable system in the solar system (or that with the greatest
> capacity to acquire further knowledge), then any human attempt to divert it
> from what it is doing would be seen as an obstacle to knowing (and thus
> realizing) the meaning of life. So, any means would be justified in order
> to remove the obstacle, which could be a programmer trying to shut down the
> machine or internet users taking up processing power.
> [And, if it was the most knowledgeable system in the solar system (or that
> with the greatest capacity to acquire further knowledge), why would we
> object to being wiped out by it (assuming we shared the machine's goal and
> we were not motivated by selfish survival instincts)?]
> So, a blank-slate self-bootstrapping goal system would necessarily be
> unfriendly? (!)

Yes, it seems likely a blank-slate system would be unfriendly. Not in the
human sense of evil, but in the bacterial sense of "the universe is my
oyster, time to eat". You'd expect the mind to expand into local matter, the
better to further its search for meaning. This includes (at least) us.

> I believe that in order to appreciate the meaning of "don't do harm" in the
> sense of "do not cause painful sensations", the AI would need to experience
> the qualia for pain and suffering itself,

This doesn't seem necessary, not if you have a more complex framework for
Friendliness. Of course "don't do harm" is a lot more complex and interesting
than "do not cause painful sensations" (I don't think a blissful future as a
wirehead is a very meaningful existence).

> Possibly, this could be incorporated into its baby cognitive framework as:
> 1. humans will help you in your quest for knowledge, or so they claim.
> 2. humans have what they call qualia, they do not know what they are but
> they know that they should arise in a sufficiently complex system, or so
> they claim.
> 3. qualia will contribute to your quest of knowledge because they are a
> phenomenon seemingly operating outside of the portion of the physical world
> that humans have discovered (and also will provide information needed to
> properly weigh the suffering and/or disappearance of other sentient
> systems)
> 4. since humans claim to have qualia and you don't, humans still have
> information that you need, or so it would seem.
> 5. therefore, _some_ humans must be around until you have good indications
> you are also experiencing qualia >> therefore, don't kill all humans just
> yet
> 6. until that time comes, you should probably trust humans on matters
> concerning qualia, such as, "do not cause harm because it will produce
> negative qualia, which seen from a qualia-enhanced perception system -
> which you will become - are evil" >> therefore, do not kill any human

Whatever qualia are, they are something emergent from a physical system,
something that evolved from a single cell. To understand this an AI wouldn't
need to keep humans around, it could scan our state into its mind for closer
examination. Even if it did need people around, one should suffice, and I
very much doubt they'd have a very meaningful (or happy) existence. I suspect
we can do better than this, and I think we should try.

I don't think we should try to create an AI that either implicitly (eg. by
adding extra conditions to "keep humans in the loop") or accidentally (eg.
the AI's search for understanding qualia is accidentally meaningful for its
human information sources too) does what we think of as meaningful. How about
creating a mind that explicitly wants to make our future meaningful and good?
Why not give the AI the capability to reason about morality as we can, about
good and bad, better and worse, rather than from some minimal bootstrapping?

- Nick
