Re: Friendliness and blank-slate goal bootstrap

From: Nick Hay
Date: Sat Oct 04 2003 - 16:54:19 MDT

Metaqualia wrote:
> This is not a continuation of the previous thread, but what about internal
> conflicts in human morality?
> Is a normalized "popular morality" the best morality we can teach to the
> AI?

"popular morality"? The morality we do teach the FAI is less important than
the metamorality (ie. the human universal adaptations you and I use to argue,
develop and choose moralities) we transfer. The FAI can go over any morality
we did transfer and see if it were really right and fair. Or it could start
again from scratch. Or both.

However the morality we do transfer, as an interim approximation, won't be
some popular kind of morality, or some sum of all present-day human
moralities, but our best guess at what is the really right thing to do.
Personally, I'd go with thinking along the lines of "help sentients, and
others that can be helped, with respect to their volitions -- maximise
self-determination, minimise unexpected regret". Focusing on the aspect of
helpfulness and implementing others' volitions.

You could go with "reduce undesirable qualia, increase desirable ones" if you

> If I could choose (not sure that I have the option to, but for sake of
> discussion) I would prefer the AI deriving its own moral rules, finding out
> what is in the best interest of everyone (not just humans but animals as
> well). This is why I was thinking, is there no way to bootstrap some kind
> of universal, all-encompassing moral system? "minimize pain qualia in all
> sentient beings" is the best moral standard I have come up with; it is
> observer independent, and any species with the ability to suffer (all
> evolved beings) should be able to come up with it in time. Who can
> subscribe to this?

There appears to be a way to bootstrap this, and that's part of what CFAI is
about (although it may not look like it at first and CFAI focuses more on the
final state rather than the development process). Part of the idea is to
create an AI with the ability to reason and develop moralities at least as
good as any human or group thereof (eg. governments, civilisations, etc).
Then, as the AI increases its intelligence past the human level, it can
develop its morality along with it -- it's not forced to be some chimera of
human-level morality stuck into a superintelligence. The departure from the
blank-slate goal system is the realisation that it takes a lot of work to
make a truly fair AI, to make an AI as independent of its programmers as
possible, that it's not best accomplished by leaving out as much of
everything as possible, but by building in the skills *we* use to identify
and implement fairness (and to decide what things we should leave out/include
in the AI). Our moral adaptations (and more!).

I think the universal aspect of your morality is desirable. Just because the
FAI has a human-like moral system (ie. we give it our own moral hardware to
start with, and some interim morality for it to think about) doesn't mean
it's biased towards humans. You have a human moral system, with all the
hardware flaws, with a pretty poor level of self-awareness and
self-modification ability (compared to a mature seed AI), and yet you can
understand that moralities should be as universal and fair as possible. You
understand that maybe other people are important, maybe animals are
important. A FAI is designed to be able to do likewise.

Instead of thinking about what kind of morality an AI should start with,
and then transferring it over, why not jump back a step? Transfer your
ability to think "what kind of morality should an AI start with?" so the AI
itself can make sure you got it right? It seems like you're forced into
giving very simple inadequate verbal phrases (hiding huge amounts of
conceptual complexity) to describe morality. You're losing too much in the

> By saying this I am in no way criticizing Eliezer's work, and I think what
> he proposes is a very practical way to get a friendly AI up and running
> (and incidentally would sound appealing to most people); the only thing,
> human morals kind of suck, they are full of contradictions, we can't agree
> on anything of importance, moral rules commonly held create a lot of
> suffering, and so forth.

Remember, friendliness isn't Friendliness. The former would involve something
like making an AI friend, the latter is nothing like it. Where he says
"Friendliness should be the supergoal" it means something more like "Whatever
is really right should be the supergoal". Friendliness is an external
reference (ie. referencing something outside the AI so it can realise that
the morality it has now is an interim approximation) to whatever is right,
with things like "help people have what they really want" or "reduce
undesirable qualia" being tentative hypotheses about it.

By "human morality" I mean a morality a human could have, not the set of all
moral conclusions people actually agree upon. Humans do have a tendency to
disagree about things and generally be mistaken.

> I think it is very possible that a slightly better than human AI would
> immediately see all these fallacies in human morals, and try to develop a
> universal objective moral system on its own.

This is exactly the situation we're planning for. It turns out this isn't a
simple ability, obvious to even the simplest kinds of minds, but a complex
ability we have to analyse and develop moralities.

> I imagine a programmer training a child AI
> AI: give me an example of friendliness
> P: avoiding human death
> AI: I suggest the following optimization of resources: newly wed couples
> need not produce a baby but will adopt one sick orphan from an
> underdeveloped country. Their need for cherishing an infant will be
> satisfied and at the same time a human life will be saved every time the
> optimization is applied.
> P: the optimization you proposed would not work because humans want to have
> their own child
> AI: is the distress of not having this wish granted more important than the
> survival of the orphan?
> P: no, but humans tend not to give up a little bit of pleasure in exchange
> for another person's whole lot of pleasure. In particular they will not
> make a considerable sacrifice in order to save an unknown person's life.
> not usually

[Conversation redirection:]

AI: So perhaps I should focus more on the moralities people think they or wish
they had, the principles they proclaim (fairness, equality), rather than the
actual actions they take?

P: Yes.

(of course these conversations are far too human, they would be nothing like

> Let's differentiate: pain is a quale to me. If you talk about "awareness of
> body damage", this is a different thing. A machine can be aware of damage
> to its physical substrate. It can model other beings having a substrate and
> it receiving damage, and it can model these beings perceiving the damage
> being done. But I see no real logical reason why an AI, or even I for that
> matter, should perceive doing damage to other beings as morally wrong
> UNLESS their body damage was not a simple physical phenomenon but gave rise
> to this evil-by-definition pain quale.

Or, they had a complex moral system which negatively valued pain. Then the
system could argue that pain is bad ("other sentients very much dislike
experiencing pain. well, most of them"), and could take actions to reduce
it. This is independent of it "really experiencing" pain, or even
reacting to pain in the way humans do (when you're in a lot of pain your mind
is kind of crippled -- I don't think this is either a necessary or desirable
property of minds).

One way this could work is by helpfulness. If you were an AI looking on this
pain-experiencing-sentient, you can ask "what does this sentient want? does
it enjoy the state it's in?". To a first approximation, you can notice each
time a sentient is in pain, it goes to great measures to remove the pain.
You, in your desire to be helpful, decide to help it remove the pain, and
make sure you yourself never induce that kind of thing. Now there are loads
of gaps in that, but it's a step towards human helpfulness.

> That still does not tell you what the pain feels like from the inside. This
> is an additional piece of information, a very big piece. Without this
> piece, your pain is just a data structure, I can do whatever I want with
> your physical body because it is just like a videogame character. But since
> I have experienced broken nails, and I know a bullet in your head must feel
> like a broken nail * 100, I don't shoot you. Can we agree on this point?

You seem to be suggesting the only way a mind can understand another's pain
(this is an arbitrary mind, not a human) is by empathy ie. "because I don't
like it, and take actions to reduce my pain, I should take actions to reduce
others' pain" (note this is a big leap, and it's non-trivial to have an AI
see this as a valid moral argument). I suspect a mind with a powerful empathy
ability could use any source of undesirability (eg. having your goals
frustrated) as a source of empathy for "that's really not nice".

Even here, I don't think it's necessary. Pain is not just a data structure,
because (to oversimplify) pain states are negatively valued by the goal
system. When a pain state is noticed, you can get thoughts like "how can I
reduce this pain?", and you can get actions taken to remove that pain.

But how can we teach the AI that pain *should* be "negatively valued" (in the
right way!) in the first place? To this I have no good answer. I've overrun
my knowledge of Friendliness development. But I don't see hurting a mind as
necessary to explain why hurt is bad, and why hurt is something to work
towards removing.

> Did the above clarify?

Sort of :)

- Nick
