Re: Friendliness and blank-slate goal bootstrap

From: Metaqualia (
Date: Sun Oct 05 2003 - 00:38:35 MDT

Hi Nick,

> "popular morality"? The morality we do teach the FAI is less important
> the metamorality (ie. the human universal adaptations you and I use to
> develop and choose moralities) we transfer. The FAI can go over any
> we did transfer and see if it were really right and fair. Or it could
> again from scratch. Or both.
> You could go with "reduce undesirable qualia, increase desirable ones" if
> liked.

Can you give me some examples of metamoral guidelines that most people can
relate to?
"Reduce undesirable qualia, increase desirable ones" is the only really
universal metamoral rule that I can think of. Formulating it seems to
require qualia.

> Instead of thinking about what kind of morality an AI should start with
> and then transferring it over, why not jump back a step? Transfer your
> ability to think "what kind of morality should an AI start with?" so the
> itself can make sure you got it right? It seems like you're forced into
> giving very simple inadequate verbal phrases (hiding huge amounts of
> conceptual complexity) to describe morality. You're losing too much in the
> compression.

I know what you are saying, and I agree completely. The AI should have the
tools to build its own ideas on everything, it should not follow what we say
to the letter, both because that would not enable it to go beyond what we
have thought up and also because what we tell it is likely to be wrong at
least in some respects.

What I am asking myself is: "is qualia-enabled empathy a necessary
ingredient to developing a desire for a universal morality?"

I think it is, but I'll be glad to be proved wrong, since I have no clue how
to qualia-enable an AI.

> AI: So perhaps I should focus more on the moralities people think they or
> they had, the principles they proclaim (fairness, equality), rather than
> actual actions they take?

Seems like a fair counter-argument to my imaginary dialogue.
What about conflicting principles? For example, humans value freedom and see
a restriction of their freedom as evil. They also value equality, and see
inequality as evil (at least in the US). These two values are in conflict
when we allow rich fathers to pass on wealth to their children. This is not
fair, so every human being should be made to start with the same assets; but
this limits your freedom to leave your belongings to whoever you want. Just
a silly example, but to point out the fact that even the basic principles we
proclaim are sometimes in conflict. (And let's not even go into conflicts
arising between moralities in different cultures: the Chinese believe
deference and unconditional respect for parents is essential, some Middle
Eastern males believe it is moral to kill your woman (and her family if you
are man enough) if she betrays you...)

The solution would of course be inventing nanotechnology, abolishing private
property and creating a world where money is not needed; revising human
mating strategies so that betrayal no longer happens; revising sexual
reproduction so that children and parents can never have conflicting
interests. Or more simply, uploading everyone, merging consciousness, and in
one stroke erasing all human dilemmas.

But you see that while these extraordinary changes would benefit everyone,
they would, if proposed, meet a lot of antagonism, since people view such
dramatic changes brought about by an external force as a threat to their
free will.

I could go on forever finding cases in which human values conflict and the
only way to make them not conflict would be a deep restructuring of the
world. What are your thoughts on this? I think that whatever restructuring
the AI deems valuable, should be done. After all the greatest benefit in
having a smarter friend is not having him help you DO what you want (hold
the wall while I bang my head); but having him TELL you what you really want
(stop it).

> (of course these conversations are far too human, they would be nothing
> this.)

of course :)

> Or, they had a complex moral system which negatively valued pain. Then the
> system could argue about how pain is bad isn't nice "other sentients very
> much dislike experiencing pain. well, most of them", and could take
> to reduce it. This is independent of it "really experiencing" pain, or even
> reacting to pain in the way humans do (when you're in a lot of pain your
> is kind of crippled -- I don't think this is either a necessary or
> property of minds).

Good points. My first reactions:

1. How could a strictly logical system reason that pain isn't nice? After
all, a software bug isn't nice, as it represents the failure of an
information processing system to reach its goal. Waves hitting the shore are
the failure of the system [water+wind] to carry out an intrinsic program
(move water in the direction of the wind) which can be seen as a perfectly
valid goal, so having a shore somewhere is not nice. What makes our pain
unique is that it feels like something, isn't it? [And just in case waves
suffer when they hit the shore, luckily we'll never know about it, because
if it were so then the destruction of the universe would be the only way of
reducing pain and not doing it would be morally unjustifiable.]

2. Of course I am not suggesting that we implement an AI that constantly
gets depressed, has back aches and so forth, but if (and it is only an if
for now) understanding pain subjectively is a prerequisite for wanting to
prevent others from feeling pain (and the only way to develop a desire for a
universal moral system) then the AI should have at least one painful
experience, and store it somewhere for recall. Or wait until it does before
it decides on moral/metamoral matters.

> One way this could work is by helpfulness. If you were an AI looking on
> pain-experiencing-sentient, you can ask "what does this sentient want?
> it enjoy the state it's in?". To a first approximation, you can notice
> time a sentient is in pain, it goes to great measures to remove the pain.
> You, in your desire to be helpful, decide to help it remove the pain, and
> make sure you yourself never induce that kind of thing. Now there are
> of gaps in that, but it's a step towards human helpfulness.

So this AI would help beings who want to stop feeling pain but not beings
who do not want to stop feeling pain (for ideological reasons, for example).
That seems good, at least for normal beings who are not too depressed, too
stupid or too ideologically involved in something to want to help themselves.
What about these exceptions? It's not like pain does not feel bad to them.
But they have the further handicap of not being able to work toward a
solution. Should the AI help them too? What is the absolute evil, not being
able to realize one's wishes, or the subjective agony that the laws of
physics can create? I vote for the second, and think that a transhuman AI
should enforce compulsory heaven :) After all, nobody would want to go back
after they were "given a hand". I realize this sounds like a violation of
one's freedom, or some crackhead futuristic utopian scenario, but if banging
your head against the wall is freedom I prefer compulsory aspirin :)

Also, what about beings whose goal system conflicts with other beings'?
Should the AI help out by granting the wish (no way to make everyone happy)
or by erasing wishes in order to remove the conflict and allow universal
satisfaction? Or by deducing from the original wishes a better scenario for
both beings and enforcing that scenario, even though both beings may
momentarily be opposed to it?

[I am not asking myself these questions so I can program an AI to do what I
want it to do, but I need to have my own moral system straight if I am going
to think about how to evolve a straight moral system... Of course, whatever
conclusion I arrive at, the goal is duplicating the reasoning inside the
machine so that it may do its own philosophy, and the question is: does it
need qualia to think what I am thinking here?]

> You seem to be suggesting the only way a mind can understand another's
> (this is an arbitrary mind, not a human) is by empathy ie. "because I don't

Understanding another's pain can be a completely intellectual endeavour (I
see neurotransmitters, they don't look too good, therefore it's what humans
call pain). But labeling this pain as evil and undesirable from an objective
perspective seems to require empathy. What other path is there to take
us from "cognitive system recognized a situation requiring immediate
attention, situation which could compromise structural integrity" to "evil
is occurring - stop it now"?

> like it, and take actions to reduce my pain, I should take actions to
> others' pain" (note this is a big leap, and it's non-trivial to have an AI
> see this as a valid moral argument).

The AI would do the following math:

IF pain feels to others as it feels to me, THEN I have the following options:

help them -> pain in the universe decreases
don't help them -> pain stays the same

IF it doesn't feel to others as it feels to me, or if I am the only conscious
being in the universe and others are zombies, THEN

it doesn't matter what I do

Therefore the AI would go and help the being.
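
The reasoning above is a dominance argument, and a minimal sketch of it can be
written down as a small payoff table. Everything named here (the two
hypotheses, the two actions, the numbers -1 and 0) is an assumption made for
illustration, not something specified in this thread; in particular, the
sketch assumes that helping costs nothing, which the argument leaves unstated.

```python
# Hypothetical payoff table: change in total pain in the universe
# (lower is better). The numbers are illustrative assumptions only.
PAIN_CHANGE = {
    "others_feel_pain":   {"help": -1, "do_not_help": 0},
    "others_are_zombies": {"help": 0,  "do_not_help": 0},
}


def weakly_dominates(a, b, table):
    """True if action a is never worse than b and strictly better in some world."""
    never_worse = all(table[w][a] <= table[w][b] for w in table)
    sometimes_better = any(table[w][a] < table[w][b] for w in table)
    return never_worse and sometimes_better


def choose_action(table):
    """Return the action that weakly dominates every other action, or None."""
    actions = list(next(iter(table.values())))
    for a in actions:
        if all(weakly_dominates(a, b, table) for b in actions if b != a):
            return a
    return None


print(choose_action(PAIN_CHANGE))
```

Under these assumed payoffs "help" is selected without needing to know which
hypothesis is true. If helping had a cost in the zombie world, neither action
would dominate and the function would return None.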

>I suspect a mind with a powerful empathy
> ability could use any source of undesirability (eg. having your goals
> frustrated) as a source of empathy for "that's really not nice".

Exactly my point: there is a difference between some weird cognitive system
interpreting a bunch of data as negative (waves hitting the coastline, or a
computer game decreasing player stamina) and pain.

> Even here, I don't think it's necessary. Pain is not just a data
> because (to over simplify) pain states are negatively valued by the goal
> system. When a pain state is noticed, you can get thoughts like "how can I
> reduce this pain?", you can actions taken to remove that pain.

The idea that I am proposing is that pain (physical, mental, whatever) is
evil not because it is valued negatively by the goal system, but because it
feels bad to experience it. Missing the train is valued negatively by your
goal system, and of course it puts you in some kind of distress, but it has
nothing to do with having your skin burned. The latter is much more evil.
Even if you had a way to completely reconstruct the skin afterward, and the
momentary burning did not influence the probability of achieving your
supergoals, it would still be more evil than missing the train (which makes
you arrive late somewhere and MAY prevent you from achieving an important

> But how can we teach the AI that pain *should* be "negatively valued" (in
> right way!) in the first place? To this I have no good answer. I've

Programmer enters the room with a baseball bat, says "today I am going to
teach you a thing or two..." heheheh

> my knowledge of Friendliness development. But I don't see hurting a mind
> necessary to explain why hurt is bad, and why hurt is something to work
> towards removing.

I tend to believe that experiencing at least moderate pain at least once is
necessary to understand why pain is worth removing. Otherwise pain is just
another information processing type, morally neutral.

Does anyone else have an opinion?

PS: a good reply to all I have written above is "the AI should do this exact
thinking and come up with its solution". But remember, I have qualia, and
the AI probably won't (at least in the beginning), so it may not be able to
formulate sentences like "are qualia needed to develop a universal moral
framework?", no matter how smart it got.

PPS: "so what is your solution?" >>> ok I will try. What about convincing
the AI that there is something humans know that they cannot teach, that is
fundamental to developing morals and metamorals, something that we don't
know how to define because we don't know the physics behind it. Convince the
AI to create the subgoal "figure out qualia" in order to satisfy the goal
"elaborate a universal morality" in order to satisfy the supergoal

