Re: Friendliness and blank-slate goal bootstrap

From: Metaqualia (
Date: Sun Jan 11 2004 - 03:56:51 MST

> humans have not been confronted with such a system, or else all of us
> most likely be sharing it (unless the system ejoins its carriers to utmost

I don't go with the idea that "if a good theory existed we'd all share it".
After all, the singularity is a very good idea, and look how few people
think about it. On the contrary, absurd ideas such as religions are widely

> ones involving the statement "Death is morally neutral". Such beliefs

Actually I made a mistake when I started talking about death because it is
one of the least important, most controversial issues and was only likely to
take attention off the theory itself. Let's let this go for the moment.

> ### Nick Hay got it right: "help sentients, and others that can be helped,
> with respect to their volitions -- maximise self-determination, minimised
> unexpected regret". I would see achieving stable function along these

I know it _feels_ right, but I ask myself, is it really right? Or is it just
my evolved dislike for coercion telling me that forcing a creature to do
anything in any way is evil?

What about an AI created by some crazy scientist, an AI whose purpose in
life is beating its head against the wall, creating massive negative qualia?
Would a superior AI not have the moral duty to fix its cognitive
framework so that it could stop beating its head against the wall, if that
created subjective feelings of headache? And you know, a human can be seen
as programmed to beat his head against the wall as well: programmed to do
things that will hurt him in one way or another. So yes, this is a topic I'd
like to see discussed: should an AI force a being into happiness or let it
decide? It is by no means evident that coercion is worse than a bleeding
forehead. By discussing this, we may even reach the conclusion that a
compromise between the two is to show the beast possible alternatives by
temporarily altering its cognitive structure, to find a way to preserve the
memories on switchback, and then to let the machine decide by itself AFTER
it has seen the alternate universe in which it is not forced to beat its
head, and perhaps AFTER it has become temporarily smarter (until a choice
has been made).

> Similarly, I do not have the emotional need to claim objective morality on
> my side if I were to act against a wife-killer - it's enough that I don't
> like it on an emotional level

I don't think I have an emotional need to claim objective morality on my
side. If anything I have a rational need to claim objective morality on my
side. But I try to ignore what feels like a need as much as possible,
because I know it's just garbage. I try to go about the good/evil problem as
I go about every other problem. If the general case can't be solved, ok, but
I'm going to try first.

What you are saying - correct me if I am wrong - sounds like you neither have
nor desire any rational explanation for WHY you want something. But in that
case, a probabilistic, logical, formal AI is going to disassemble you, find
out the exact mechanical causes that trigger your discomfort, dismiss them
as a purely deterministic process, and then ignore the wife-killer, because
it will KNOW that killing is a physical process with no further connotations,
either good or bad.

Of course you can argue: we can program it so that it will hold dear my
opinions as its human creator. But is that friendliness, or an elaborate
hardcoding of ungrounded moral values?

> No, really, you don't need to have the one and absolute Truth to reject
> acts as wrong. An appeal to shared volition, some game-theoretic

That sounds like saying: you don't have to know all that much math to prove
that 1+1=2; I like it on an emotional level, so that's plenty. I'd like more
justification when I choose between life and death, not less, than when I
choose between 2 and 3.

> ### Of course, coming up with an AI that would satisfy the moral
> of every single member of our species appears to be an impossible
> undertaking - I see the goal of FriendlyAI research as merely building a
> device which will stably perform according to the scheme summarized by
> Nick - the simplified essence of a complex system of moral reasoning
> embedded in the brains of many humans, including mine -

We have different definitions then of what kind of friendliness is
acceptable after the singularity, which explains a lot of our smaller
disagreements. There is a difference between "don't do something I wouldn't
do" and "don't do something I wouldn't do if I were as smart and altruistic
as smart and altruistic can possibly get". I guess we _could_ settle
temporarily for the first case, but that's just because I think you and I
share enough moral memes that unacceptable moral wrongs will not happen even
if only your brain was scanned and incorporated into the AI. But this is
pure coincidence. If I was born in India, or on Alpha Centauri, I probably
wouldn't agree with anything your moral memes dictate. But I might still
agree that negative qualia suck; I think anyone should agree no matter what
planet they are from.

Isn't this a better way? Define ultimate good and evil in absolute terms.
Find out what path we took to get here. Set the AI on the same path, with
our certainty as an interim goal. Then let it develop new theories. Let it
become better at morality than _we_ are, then accept the actions that will
result from a higher morality rather than impose our lower morality for
strict personal gain.

> ### It is an interesting hypothesis, postulating that a stable expression
> Friendliness (recognized as such by rational humans), requires the
> of qualia in the AI. What it has to do with objective morality eludes me.

I hope the above clarifies this.

