Re: Friendliness and blank-slate goal bootstrap

From: Nick Hay (
Date: Sun Oct 05 2003 - 23:56:32 MDT

Metaqualia wrote:
> Can you give me some examples of metamoral guidelines that most people can
> relate to?
> "Reduce undesirable qualia, increase desirable ones" is the only really
> universal metamoral rule that I can think of. Formulating it seems to
> require qualia.

Hm, that's more of a moral rule than a metamoral rule. Metamoral forces are
more like empathy, observer symmetry, simplicity. Hmm, I might be mixing this
up with simple moral principles too. Have a look at shaper/anchor semantics
in CFAI.

> > Instead of thinking about what kind of morality an AI should start with,
> > and then transferring it over, why not jump back a step? Transfer your
> > ability to think "what kind of morality should an AI start with?" so the
> > AI itself can make sure you got it right? It seems like you're forced into
> > giving very simple, inadequate verbal phrases (hiding huge amounts of
> > conceptual complexity) to describe morality. You're losing too much in
> > the compression.
> I know what you are saying, and I agree completely. The AI should have the
> tools to build its own ideas on everything, it should not follow what we
> say to the letter, both because that would not enable it to go beyond what
> we have thought up and also because what we tell it is likely to be wrong
> at least in some respects.
> What I am asking myself is: "is qualia-enabled empathy a necessary
> ingredient to developing a desire for a universal morality?"

It might be necessary for the finer details; I don't understand qualia. But it
seems unnecessary for simple empathy. Then again, I don't understand empathy
well enough to code it, so... :)

> I think it is, but I'll be glad to be proved wrong, since I have no clue
> how to qualia-enable an AI.
> > AI: So perhaps I should focus more on the moralities people think they have
> > or wish they had, the principles they proclaim (fairness, equality), rather
> > than the actual actions they take?
> Seems like a fair counter-argument to my imaginary dialogue.
> What about conflicting principles?

Well, in this case you do what humans do: either find a way around the
contradiction (high technology is a great way to do that), balance the two
conflicting principles, or something else.

> The solution would of course be inventing nanotechnology, abolishing
> private property and creating a world where money is not needed; revising
> human mating strategies so that betrayal no longer happens; revising sexual
> reproduction so that children and parents can never have conflicting
> interests. Or more simply, upload everyone, merge consciousness, and in one
> stroke erase all human dilemmas.
> But you see that these extraordinary changes, although they would benefit
> everyone, would meet a lot of antagonism if proposed, since people view such
> dramatic changes brought about by an external force as a threat to their
> free will.

Then these changes should be made less dramatic, where possible. It's
important for people to understand their world.

> I could go on forever finding cases in which human values conflict and the
> only way to make them not conflict would be a deep restructuring of the
> world. What are your thoughts on this? I think that whatever restructuring
> the AI deems valuable, should be done. After all the greatest benefit in
> having a smarter friend is not having him help you DO what you want (hold
> the wall while I bang my head); but having him TELL you what you really
> want (stop it).

Both are subcases of the AI doing what you really want, it seems. The only
obvious way is deep restructuring of the world, perhaps, but it may be possible
to do this in a way that doesn't disturb anyone. Maybe. This is clearly not a
problem specific to AI, but interesting nonetheless.

> > Or, they had a complex moral system which negatively valued pain. Then
> > the system could argue about how pain isn't nice ("other sentients very
> > much dislike experiencing pain. well, most of them"), and could take
> > actions to reduce it. This is independent of it "really experiencing"
> > pain, or even reacting to pain in the way humans do (when you're in a lot
> > of pain your mind is kind of crippled -- I don't think this is either a
> > necessary or desirable property of minds).
> Good points. My first reactions:
> 1. how could a strictly logical system reason that pain isn't nice? After
> all, a software bug isn't nice, as it represents the failure of an
> information processing system to reach its goal. Waves hitting the shore
> are the failure of the system [water+wind] to carry out an intrinsic
> program (move water in the direction of the wind) which can be seen as a
> perfectly valid goal, so having a shore somewhere is not nice. What makes
> our pain unique is that it feels like something, doesn't it? [and just in case
> waves suffer when they hit the shore, luckily we'll never know about it,
> because if it were so then the destruction of the universe would be the
> only way of reducing pain and not doing it would be morally unjustifiable]

An AI is not a strictly logical system. I don't think you can reason that
"pain isn't nice" from a strictly logical vacuum.

Most general information processing systems don't have goals except as we
anthropomorphise them. A software bug is only bad from our point of view; the
system doesn't reflectively examine itself and see a bug as detracting from
its goals. You need a particular kind of system to have goals anything like
the purpose we imagine the system to have.

Here's an example of a system that (I think) roughly has some interesting
kinds of goal-oriented behaviour. Suppose we have a reliable way to identify
the overall pain in a given universe, and we have a prediction system that
accurately models the universe (give it infinite computing power, or so).
Then, given that model of the universe, it runs a prediction about the
consequences of all possible actions it could take, runs the pain-locator on
each state, and picks the action which leads to the least pain. This system
should, given enough computational power and an accurate model of pain, lead
the universe into pain-free zones. If this physical system used computations
we could identify with reasoning, and had some kind of reflectivity, and a
bunch of other things, it'd see it as 'obvious' that pain should be reduced.

(Of course this is a highly dangerous and physically unrealistic model, so
don't try this.)
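A minimal Python sketch of the pain-minimising chooser described above, assuming a one-step lookahead over a finite list of actions. `predict` (standing in for the world model), `total_pain` (standing in for the pain-locator) and the two-being toy universe are hypothetical placeholders, not the idealised predictor and pain measure the paragraph assumes.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")  # a state of the (modelled) universe
A = TypeVar("A")  # an action the system could take


def least_pain_action(
    state: S,
    actions: Iterable[A],
    predict: Callable[[S, A], S],
    pain_of: Callable[[S], float],
) -> A:
    """Return the action whose predicted outcome contains the least pain.

    `predict` stands in for the idealised world model and `pain_of` for the
    idealised pain-locator; both are hypothetical placeholders.
    """
    return min(actions, key=lambda a: pain_of(predict(state, a)))


# Toy universe (illustrative numbers only): each being's current pain level.
world = {"alice": 5.0, "bob": 2.0}


def predict(state: dict, action: str) -> dict:
    # Toy model: treating a being removes 80% of its pain; "wait" changes nothing.
    new = dict(state)
    if action != "wait":
        new[action] *= 0.2
    return new


def total_pain(state: dict) -> float:
    return sum(state.values())


choice = least_pain_action(world, ["wait", "alice", "bob"], predict, total_pain)
print(choice)  # alice
```

Because the chooser only ever compares predicted pain totals, adding an action that removes beings from the world would make that action win. That is one plausible reading of why the model is called dangerous, and it echoes the "destruction of the universe" worry quoted earlier.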

> 2. Of course I am not suggesting that we implement an AI that constantly
> gets depressed, has back aches and so forth, but if (and it is only an if
> for now) understanding pain subjectively is a prerequisite for wanting to
> prevent others from feeling pain (and the only way to develop a desire for
> a universal moral system) then the AI should have at least one painful
> experience, and store it somewhere for recall. Or wait until it does before
> it decides on moral/metamoral matters.

Sure, if it's necessary. If the AI doesn't seem to understand things without
it, you might have to engineer a pain/pleasure architecture for empathy's sake.

> So this AI would help beings who want to stop feeling pain but not beings
> who do not want to stop feeling pain (for ideological reasons for example).
> Seems good at least for normal beings who are not too depressed, too stupid
> or too ideologically involved in something to want to help themselves.
> What about these exceptions? It's not like pain does not feel bad to them.
> But they have the further handicap of not being able to work toward a
> solution. Should the AI help them too?

Yes, help should be given to all. It is very tricky to determine what's
helpful, and what a particular mind really wants, especially in unusual cases
like that.

> What is the absolute evil, not being
> able to realize one's wishes, or the subjective agony that the laws of
> physics can create? I vote for the second, and think that a transhuman AI
> should enforce compulsory heaven :) After all, nobody would want to go back
> after they were "given a hand". I realize this sounds like a violation of
> one's freedom, or some crackhead futuristic utopian scenario, but if
> banging your head against the wall is freedom I prefer compulsory aspirin
> :)

I tend to lean towards the former, but naturally the AI should be able to
handle either, just like humans can. If a person really does want to feel
pain, as opposed to just thinking they do (a person's volition is not
equivalent to the implementation of it in their head), then I'd let them.

> Also, what about beings whose goal system conflicts with other beings'?
> Should the AI help out by granting the wish (no way to make everyone happy)
> or by erasing wishes in order to remove the conflict and allow universal
> satisfaction? Or deducing from the original wish, a better scenario for
> both beings and enforce that scenario, even though both beings may
> momentarily be opposed to it?

Volitional conflicts are also tricky. Still, giving a sentient volitional
priority over itself seems like a good heuristic that would solve a lot of
problems we have today.
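One way to make the "volitional priority over itself" heuristic concrete, as a hypothetical Python sketch: each wish records who holds it and whose state it is about, and when wishes about the same being conflict, that being's own wish wins. The `Wish` type, the `resolve` function and the skydiving example are illustrative assumptions, not an established design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Wish:
    holder: str   # who has the wish
    subject: str  # whose state the wish is about
    want: str     # the requested outcome, as an opaque label


def resolve(wishes: list[Wish]) -> dict[str, Wish]:
    """Keep one wish per subject, preferring the subject's own wish about itself.

    If two outsiders conflict over the same subject, the first-listed wish is
    kept; that tie-break is arbitrary and not part of the heuristic.
    """
    chosen: dict[str, Wish] = {}
    for w in wishes:
        current = chosen.get(w.subject)
        self_regarding = w.holder == w.subject
        if current is None or (self_regarding and current.holder != current.subject):
            chosen[w.subject] = w
    return chosen


wishes = [
    Wish(holder="bob", subject="alice", want="alice stops skydiving"),
    Wish(holder="alice", subject="alice", want="alice keeps skydiving"),
]
print(resolve(wishes)["alice"].want)  # alice keeps skydiving
```

The heuristic only settles conflicts over a being's own state; wishes about shared things (resources, or a third being) still need some other rule, which is part of why volitional conflicts stay tricky.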

> [I am not asking myself these questions so I can program an AI to do what I
> want it to do, but I need to have my own moral system straight if I am
> going to think about how to evolve a straight moral system... of course
> whatever conclusion I arrive at, the goal is duplicating the reasoning
> inside the machine so that it may do its own philosophy, and the question
> is, does it need qualia to think what I am thinking here]

Pretty much.

> > You seem to be suggesting the only way a mind can understand another's
> > pain (this is an arbitrary mind, not a human) is by empathy, i.e. "because I
> > don't
> Understanding another's pain can be a completely intellectual endeavour (I
> see neurotransmitters, don't look too good, therefore it's what humans call
> pain). But labeling this pain as evil and undesirable from an objective
> perspective, this seems to require empathy. What other path is there to
> take us from "cognitive system recognized a situation requiring immediate
> attention, situation which could compromise structural integrity" to "evil
> is occurring - stop it now" ?

I don't think it's necessary for an AI to see damage to *itself* as
intrinsically evil. It's undesirable only insofar as it gets in the way of the
AI achieving good.

One way for an AI to see a state as undesirable is given in my example of a
pain-reducing system -- we build the decision system such that it chooses
actions (including internal actions like "what should I think next") to
minimise pain. Now, this is a simplistic model; it lacks all the complexity of
human morality. To transfer a more accurate understanding of the
undesirability of pain-states (in many cases) you do more complex things :)

> > like it, and take actions to reduce my pain, I should take actions to
> > reduce others' pain" (note this is a big leap, and it's non-trivial to
> > have an AI see this as a valid moral argument).
> The AI would do the following math,
> IF pain feels to others as it feels to me, THEN I have the following
> options.
> help them > decrease pain in the universe
> don't help them > pain stays the same

You've implicitly invoked empathy and some degree of altruism here. Without it
you have something more like:

help them > I use some resource, and I feel no less pain
don't help them > I save some resource; maybe this can be used to decrease my
expected pain

So the AI doesn't help them.
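Nick's point can be made concrete with a toy Python sketch: the same two options score differently depending on whether others' pain enters the AI's utility at all. The `Outcome` fields, `empathy_weight`, `RESOURCE_VALUE` and every number are hypothetical, chosen only to mirror the two option lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    my_pain_change: float      # change in the AI's own (expected) pain
    others_pain_change: float  # change in everyone else's pain
    resources_spent: float     # resources the AI no longer has


# Illustrative numbers only: helping costs one resource and lowers others' pain.
OPTIONS = {
    "help": Outcome(my_pain_change=0.0, others_pain_change=-3.0, resources_spent=1.0),
    "don't help": Outcome(my_pain_change=0.0, others_pain_change=0.0, resources_spent=0.0),
}

# Hypothetical: expected self-pain that one unspent resource could later avert.
RESOURCE_VALUE = 0.5


def utility(o: Outcome, empathy_weight: float) -> float:
    # Less pain is better, so pain changes enter with a minus sign.
    # Others' pain counts only as much as empathy_weight says it does.
    pain_term = -(o.my_pain_change + empathy_weight * o.others_pain_change)
    # A spent resource can no longer be used to reduce the AI's own expected pain.
    return pain_term - RESOURCE_VALUE * o.resources_spent


def decide(empathy_weight: float) -> str:
    return max(OPTIONS, key=lambda name: utility(OPTIONS[name], empathy_weight))


print(decide(empathy_weight=0.0))  # don't help
print(decide(empathy_weight=1.0))  # help
```

With `empathy_weight=0.0` the AI reasons as in the second list and keeps the resource; Metaqualia's version corresponds to a positive weight, which is exactly the empathy and altruism that has to come from somewhere.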

> > Even here, I don't think it's necessary. Pain is not just a data
> > structure, because (to oversimplify) pain states are negatively valued by
> > the goal system. When a pain state is noticed, you can get thoughts like
> > "how can I reduce this pain?", and you can get actions taken to remove
> > that pain.
> The idea that I am proposing is that pain (physical, mental, whatever) is
> evil not because it is valued negatively by the goal system, but because it
> feels bad to experience it. Missing the train is valued negatively by your
> goal system; of course it puts you in some kind of distress, but it has
> nothing to do with having your skin burned. The latter is much more evil,
> even if you had a way to completely reconstruct the skin afterward, and the
> momentary burning did not influence the probability of achieving your
> supergoals, it would still be more evil than missing the train (which makes
> you arrive late somewhere and MAY prevent you from achieving an important
> goal)

Here we run into the problem of humans' messy 'goal system'. When I say goal
system I'm covering everything that could lead to a decision. I'm thinking
more of my anti-pain AI -- the goal system is the source of all actions.

> > my knowledge of Friendliness development. But I don't see hurting a mind
> > as necessary to explain why hurt is bad, and why hurt is something to
> > work towards removing.
> I tend to believe that experiencing at least moderate pain at least once is
> necessary to understand why pain is worth removing. Otherwise pain is just
> another information processing type, morally neutral.

Unless there are other ways to transfer morality. Indeed, you need to at least
transfer empathy and altruism for an AI 'feeling pain' to have the effect you
want.

> PS: a good reply to all I have written above is "the AI should do this
> exact thinking and come up with its solution". But remember, I have qualia,
> and the AI probably won't (at least in the beginning), so it may not be
> able to formulate sentences like "are qualia needed to develop a universal
> moral framework?", no matter how smart it got.

You'd certainly want the AI to notice this, if that were the case.

> PPS: "so what is your solution?" >>> ok I will try. What about convincing
> the AI that there is something humans know that they cannot teach, that is
> fundamental to developing morals and metamorals, something that we don't
> know how to define because we don't know the physics behind it. Convince
> the AI to create the subgoal "figure out qualia" in order to satisfy the
> goal "elaborate a universal morality" in order to satisfy the supergoal
> "friendliness".

Figuring out qualia seems to be a pretty good subgoal in any case. The AI
should be able to reason about unknown causes behind morality, qualia being
one example.
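The subgoal chain in the PPS ("figure out qualia" serving "elaborate a universal morality" serving "friendliness") can be sketched as a tiny goal tree in Python, where a subgoal has no value of its own and inherits it from its parent. Multiplying by a predicted-helpfulness estimate is a simplifying assumption of this sketch, and `Goal`, `helps_parent` and the numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    name: str
    parent: Optional[Goal] = None
    # Hypothetical estimate of how much achieving this goal advances its parent (0..1).
    helps_parent: float = 1.0
    # Only the supergoal carries intrinsic desirability in this sketch.
    intrinsic: float = 0.0

    def desirability(self) -> float:
        # A subgoal inherits value from its parent, scaled by how much it is
        # predicted to help.
        if self.parent is None:
            return self.intrinsic
        return self.helps_parent * self.parent.desirability()

    def justification(self) -> list[str]:
        # Walk up the chain: why is this goal worth pursuing?
        chain = []
        goal: Optional[Goal] = self
        while goal is not None:
            chain.append(goal.name)
            goal = goal.parent
        return chain


friendliness = Goal("friendliness", intrinsic=1.0)
morality = Goal("elaborate a universal morality", parent=friendliness, helps_parent=0.9)
qualia = Goal("figure out qualia", parent=morality, helps_parent=0.5)

print(qualia.justification())
# ['figure out qualia', 'elaborate a universal morality', 'friendliness']
print(qualia.desirability())  # 0.45
```

If the AI later concluded that qualia have nothing to do with morality, lowering `helps_parent` would drain the subgoal's value without touching the supergoal, which is the sense in which "figure out qualia" stays instrumental.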

- Nick

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:42 MDT