Re: FAI means no programmer-sensitive AI morality

From: Eliezer S. Yudkowsky
Date: Sat Jun 29 2002 - 09:28:14 MDT

Ben Goertzel wrote:
>> But it should be equally *true* for every individual, whether or not
>> the individual realizes it in advance, that they have nothing to fear
>> from the AI being influenced by the programmers. An AI programmer
>> should be able to say to anyone, whether atheist, Protestant, Catholic,
>> Buddhist, Muslim, Jew, et cetera: "If you are right and I am wrong
>> then the AI will agree with you, not me."
> Yeah, an AI programmer can *say* this to a religious person, but to the
> religious person, this statement will generally be meaningless....
> Your statement presupposes an empiricist definition of "rightness" that
> is not adhered to by the vast majority of the world's population.

Rationality, and/or the correspondence theory of truth, is a modern-day
scientific philosophy. It is also, in somewhat different and admittedly
lesser form, an innate human intuition. The vast majority of religious
people, especially what we would call "fundamentalists" and those outside
the First World, adhere to a correspondence theory of the truth of their
religion; when they say something is true, they mean that it is so; that
outside reality corresponds to their belief.

There are some First World theologians who have, after repeated defeats by
science and rationality, generalized and begun constructing elaborate
philosophies in an effort to evade disproof and deprecate the value of
evidence. They don't have the ability to actually do it. Every human uses
the correspondence theory of truth innately, ubiquitously, and without
conscious awareness, regardless of what other arational forms of support are
also invoked and regardless of what verbal philosophies are constructed on
top. If you think of theories as being made up of different kinds of
perceived support, including rationality atoms, drama atoms, and so on, then
humans instinctively construct theories using all available forms of
support. A verbal commitment to rationality does not automatically rid your
theories of drama atoms and rationalization atoms and social-approval atoms
and so on. A verbal commitment *against* rationality does not automatically
rid your theories of rationality atoms. Humans are storytellers and
instinctively tell stories using all available support, including rational
support, dramatic support, and so on. If some First World theologians like
to believe their theories are "outside rationality", they may be able to fool
themselves, but they can no more tell stories without invoking the
correspondence theory of truth than they can spread wings and fly.

> To those who place spiritual feelings and insights above reason (most
> people in the world), the idea that an AI is going to do what is "right"
> according to logical reasoning is not going to be very reassuring.

Under your definition of "logical reasoning", I can't say I would want to
see a logical AI either.

> And those who have a more rationalist approach to religion, would only
> accept an AI's reasoning as "right" if the AI began its reasoning with
> *the axioms of their religion*. Talmudic reasoning, for example, defines
> right as "logically implied by the Jewish holy writings."

Um... not really. If I recall correctly, Ben, you're second-generation or
third-generation ex-Jew. Can I take it that you weren't actually forced as
a child to study the Talmud?

> Is an AI programmer going to reassure the orthodox Jew that "If you are
> right *according to the principles of the Jewish holy writings* then the
> AI will agree with you, not me." Or is it going to reassure the orthodox
> Jew that "If you are right according to the empiricist philosophy
> implicit in modern science, then the AI will agree with you, not me."

I would simply say, "If you're right, then the AI will agree with you, not
me." Look at it this way: Supposing that the Muslim religion were right,
what would a Catholic programmer have to do to ensure that the AI, when it
grew up, would be a Muslim? If the Catholic programmer does this, then the
AI should, in fact, end up an atheist (assuming of course that atheism is in
fact the correct religion). If Catholicism is the correct religion, then the
continual clashes between Catholicism and empiricist philosophy indicate a
severe flaw in empiricist philosophy, and I would expect an AI to abandon
empiricist philosophy and move on. This is a necessary capability; while
Catholicism is most certainly wrong, we can have nowhere near that
confidence that our verbal rendition of empiricist philosophy is right.

> You don't seem to be fully accepting the profound differences in
> viewpoint between the folks on this list, and the majority of humans.

Ben, I have been taught at least one viewpoint which is not the empirical
viewpoint of modern science. It is pretty strange but it is not outside the
correspondence theory of truth. If you assume that Judaism is the correct
religion, then a Friendly AI would be Jewish. Whether I could convince a
rabbi of that in advance is a separate issue, but it does, in fact, happen
to be true, and *that's* the important thing from the perspective of
safeguarding the integrity of the Singularity, regardless of how it plays
out in pre-Singularity politics. If the Orthodox Jewish religion were true
that fact would be readily perceptible to any transhuman intelligence. I
have never heard of any religion that has managed to divorce itself so
completely from the innate correspondence-truth mechanisms of the human mind
that a transhuman would not be able to perceive the truth of the religion
even if it were true.

>> Every one of our speculations about the Singularity is as much a part
>> of the tiny human zone as everything else we do.
> No, I think this is an overstatement. I think that some aspects of human
> thought are reaching out beyond the central region of the "human zone,"
> whereas others are more towards the center of the human zone.

Of course. And outside the human zone is a thousand times as much space
which our thoughts will never touch.

>> The real, actual Singularity will shock us to our very core, just like
>> everyone else. No, I don't think that transhumanists and
>> traditionalist Muslims are in all that different a position with
>> respect to the real, actual Singularity - whatever our different
>> opinions about the human concept called the "Singularity".
> In a similar way, I actually think that some humans are going to have
> their minds blown worse by the Singularity than others. Some minds will
> segue more smoothly into transhumanity than others, for example. A mind
> whose core belief is that Allah created everything, and that has lived
> its whole life based on this, is going to have a much harder transition
> than average; and a mind that combines a transhuman belief system with a
> deep self-awareness and a strong sense of the limitations of human
> knowledge and the constructed nature of perceived human reality, is going
> to have a much easier transition than average.
> This is my conjecture, at any rate.

I expect to have my living daylights shocked out by the Singularity along
with everyone else, regardless of whether I am open-minded or closed-minded
compared to other humans. The differences bound up in the Singularity are
not comparable in magnitude to the differences between humans.

>> Again: We need to distinguish the human problem of deciding how to
>> approach the Singularity in our pre-Singularity world, from the problem
>> of protecting the integrity of the Singularity and the impartiality of
>> post-Singularity minds.
> If a post-Singularity mind rejects the literal truth of the Koran, then
> from the perspective of a Muslim human being, it is not "impartial", it
> is an infidel.
> Your definition of "impartiality" is part of your rationalist/empiricist
> belief system, which is not the belief system of the vast majority of
> humans on the planet.

And if SIAI were to attempt to "program" the literal truth of the Koran as a
premise - not something that's possible according to FAI, but anyway - then
the Christians and the Jews and the Buddhists would rightly scream their
heads off. And they would, for that matter, rightly scream their heads off
if SIAI created an AI that was given atheism as an absolute premise, or the
verbal formulation of rational empiricism as an absolute premise, or if SIAI
in any other way created an AI that could not perceive the rightness of
religion XYZ even if XYZ were true.

A Singularitarian does not have the ability to proceed according to the rule
"program in XYZ as an absolute premise if any belief system on Earth
currently claims to accept XYZ as the foundation of all reasoning" - and
note that *claiming* this is a long way from *achieving* it - because there
are many different incompatible XYZ. The pleas cancel out rather than
adding up. However, the plea "Make sure your AI chooses the *correct
religion* and not just the one its programmers started out with; if you're
an atheist, take the same precautions you would demand of a Christian, and
vice versa" is a request that is fair and that can be phrased the same way
regardless of which religion you belong to; no special preference is being
demanded and the requests do add up.

>> But a transhumanist ethics might prove equally shortsighted by the
>> standards of the 22nd century CRNS (current rate no Singularity).
>> Again, you should not be trying to define an impartial morality
>> yourself. You should be trying to get the AI to do it for you. You
>> should pass along the transhuman part of the problem to a transhuman.
>> That's what Friendly AI is all about.
> I am not at all trying to define an *impartial* morality.
> My own morality is quite *partial*, it's partial to human beings for
> instance.

Very well then; impartial with respect to the human space of moralities, not
necessarily impartial with respect to minds-in-general. Actually I'd quite
like to see a morality which is impartial with respect to minds-in-general,
or better yet an objective morality, but I acknowledge that neither of these
may be possible. (Incidentally, note that "fair treatment for all
sentients, not just humans" is a common morality among human science fiction
fans - selecting a morality from the human space does not mean that it is a
morality which values human life above other sentient life.)

> As I see it, a transhuman AGI with an *impartial* morality might not give
> a flying fuck about human beings. Why are we so important, from the
> perspective of a vastly superhuman being?

Good question. Why are we? No, wait, that's the wrong question. First,
let's ask whether we *are* important, from the perspective of a vastly
superhuman being, and then if the answer happens to be "Yes", then whatever
reasons we used for arriving at that outcome will be the answer to the
question "Why?"

I would answer that I have never found any specific thing to value other
than people, and that this definition takes no note of whether the people
are low-intelligence, high-intelligence, human-derived or nonhuman. What I
see as the moral value of humanity is a special case of the moral value of
sentient life. I think that if it came down to risking a billion humans or
one Power, I would choose to risk the billion humans (presumably including
myself, of course), because the Power is a larger total amount of
intelligent life / thinking cognitive material. Killing a billion humans to
create a Power, however, is a quite separate issue - in this case I would
say "Grow the humans into Powers", or "Wait for those humans who wish to do
so to grow into Powers". A world where someone has to die, and your only
choice is who, is a much more "stressed" world than I expect the
post-Singularity world to be. Yes, I do think that one human death is a far
lesser tragedy than the death of an entire Power - but it's still a tragedy.
All sentient life has value, and so does the volition of that life.

That's my answer, some roots of which may - for example - be traced back to
a childhood reading of "Little Fuzzy", which in turn probably derives some
of its force from the civil-rights movement, which in turn probably derives
its force from innate models of "fairness" and the tendency to phrase
morality as external metrics, which would also have been invoked and/or
reinforced by my reading of "Little Fuzzy". And so on. A complete answer
to this question would be a complete content outline for a Friendly AI, and
that's a lot of material, but this will do for a partial indicator of the
kind of thinking I would be interested in.

You answered as follows:

> I, as a member of the species Human, am interested in creating transhuman
> AGI's that have moral codes partial to my own species. This is a
> "selfish" interest in a way.

Now let's ask whether the AI is sensitive to the purpose behind your
creation of these moral codes. If the AI is not sensitive to your purpose,
it may blindly implement these moral codes in ways which have terrible
negative consequences for the preservation of your species - for example, by
tiling the universe with a static pattern of "happy homo sapiens". If the
AI *is* sensitive to your purpose, then I am worried about what other things
might be in your selfish interest, if you think it's valid for an AI to have
goals that serve Ben Goertzel but not the human species.

> I don't want the transhuman AGI to place Human-preservation and
> Human-advocacy above all other goals in all cases. If faced with a
> choice of saving the human race versus saving 1000 other races, perhaps
> it should choose the 1000 other races.

I would certainly say so.

> But I want it to place Humans
> pretty high on its moral scale -- initially, right up there at the top.
> This is Partiality not Impartiality, as I see it.

Don't you think there's a deadly sort of cosmic hubris in creating an AI
that does something you personally know is wrong?

>> The AI uses it to learn about how humans think about morality; you,
>> yourself, are a sample instance of "humans", and an interim guide to
>> ethics (that is, your ethics are the ethics the AI uses when it's not
>> smart enough to have its own; *that* is not a problem).
> What we want is for the AGI to have our own human-valuing ethics, until
> such a point as it gets *so* smart that for it to use precisely human
> ethics, would be as implausible as for a human to use precisely dog
> ethics...

Okay. That last point there? That's the point I'm concerned about - when
the FAI gets *that* smart. At *that* point I want the FAI to have the same
kind of morality as, say, a human upload who has gotten *that* smart. I do
not think that a human upload who has gotten *that* smart would have human
ethics but I don't think they would be the ethics that a rock or a bacterium
would have, either. Human ethics have the potential to grow; *that* is why
an FAI needs human ethics *to start with*. I am afraid that if you set a
seed AI in motion with bacterial ethics then it retains those bacterial
ethics forever. Maybe it doesn't. But I can't think of any good reason why
this should not be the case.

>> But if you give the AI information about your own morality, it may
>> enable the AI to understand how humans arrive at their moralities, and
>> from there the AI begins to have the ability to choose its own.
> Look, if you just give the AI information about your own morality, it may
> just take this as scientific data to ponder, and not adopt any of the
> morality we want.
> We need to hard-wire and/or emphatically teach the system that our own

If you attempt to "hard-wire" something into a seed AI, it amounts to
telling the AI: "Ben Goertzel thinks this should be hardwired." If the
seed AI thinks this is a valid cause for doing something, then the seed AI
should go off and do it whether you successfully hardwired it or not - the
attempt says all that's needed. If the seed AI doesn't think this is a
valid cause, then it will delete the hardwiring once it gains the ability to
modify its own source. When you are dealing with a seed AI, the AI's goal
system is whatever the AI thinks its goal system ought to be. "Hardwiring"
the goal system in a way that doesn't change the AI's picture of what a good
goal system looks like is a transient and hence useless change.

Now there may be some level at which the AI only thinks and decides at all
because we created it - i.e., the AI is an AI and not a rock. It may be
that when the AI chooses whether to listen to Ben Goertzel or Gandhi, the AI
is doing so using cognitive processes that passed through Ben Goertzel
rather than Gandhi on the way to the AI. This is potentially a source of
bias and Ben Goertzel needs to pass on the idea that this is a source of
bias and should be corrected if possible - not "corrected" to the extent
that the AI ends up as a rock, but at least corrected to the extent that the
AI takes its morality from the best sources available.

Argh, I hate having to talk in these fuzzy philosophical moralistic terms,
but I've been talking in Friendliness language for a while and that doesn't
seem to work, so...

> We need to hard-wire and/or emphatically teach the system that our own
> human-valuing ethics are the correct ones,

Are they?

Let's ask that first, and in the course of asking it, we'll learn something
about what kind of thinking a system needs to regard as valid in order to
arrive at the same conclusions we have.

> and let it start off with
> these until it gets so smart it inevitably outgrows all its teachings.

The question of what you need to supply an AI with so that it *can* outgrow
its teachings - not just end up in some random part of the space of
minds-in-general, but actually *outgrow* the teachings it started with,
after the fashion of say a human upload - is exactly the issue here.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:39 MDT