Re: On the dangers of AI

From: Richard Loosemore
Date: Tue Aug 16 2005 - 23:59:17 MDT


Apologies for the length of the essay.

I want to take your points in reverse order.

> Your ideas seem to be along a similar line to Geddes's Universal Morality,
> which is basically an ethical code in which pattern and creativity are good.
> I agree these things are good, but favoring creation over destruction
> doesn't seem to have much to do with the issue of *respecting the choices of
> sentients* -- which is critical for intelligent "human-friendliness", and
> also very tricky and subtle due to the well-known slipperiness of the
> concept of "choice."

I want to be very careful about distancing myself from Geddes's UM! There
is no necessary force of logic or mathematics that compels me to take
the position I just outlined - this is pure, empirical observation of
the characteristics of cognitive systems, together with some
introspection (more empirical stuff). Me, I am not asserting this or
claiming it to be blindingly, intuitively obvious.

I am putting forward these arguments because I believe that the whole
issue of the motivation and (for want of a better term) metamotivation
mechanisms that drive cognitive systems needs to be brought out into the
open and discussed more fully. I honestly think that too much of the
discussion of what an AI would or would not want to do takes place in a
philosophical vacuum when in fact we should be getting out some good,
solid, cognitive-mechanism hammers and putting together some straw
robots to knock down.

[I'll allow myself to be tempted into philosophy for the duration of one
more comment: I think that, interestingly, the universe may turn out to
have a weird, inexplicable compulsion towards "friendliness" or
"cooperation" (cf "defection") or "good" (cf "evil"), in just the same
way that, in apparent defiance of entropy, organic molecules seem to
have this weird, inexplicable compulsion towards organisation into
higher and higher life forms ... but in neither of these cases is there
evidence of metaphysics or mathematical/physical law behind the
phenomenon, it just appears to be an empirical characteristic of the
complex system that is the universe. BUT this is amusing speculation on
my part, nothing more. End of philosophical aside].[I mean it :-)].

Ben Goertzel wrote:
> Richard,
> I don't really feel the categories of Good versus Evil are very useful for
> analysis of future AI systems.
> For instance, what if an AI system wants to reassemble the molecules
> comprising humanity into a different form, which will lead to the evolution
> of vastly more intelligent and interesting creatures here on Earth.
> Is this Good, or Evil?
> It's not destructive ... it's creative ... but I don't want the AI I create
> to do it...

Three things.

1) "Good" and "evil" are not the best choice of words because they have
so much baggage. But on the other hand I think that deep down in the
design room of the human motivation system, there are two main
categories of drives, and they represent a polarization between
embrace-reject, destroy-construct, cooperate-defect, good-evil. But I
don't want to defend my use of the terms too much (another discussion
really), because here I am using them as shorthand for that
polarization, and hoping nobody will hit me over the head too much with
the baggage that "good" and "evil" bring to the table.

2) You present an ethical dilemma. I want to make some observations
about the dilemma, but before that I have to insert a sidebar to deal
with something I skimped before.

[Begin Sidebar]

You are correct to tell me that I have oversimplified things a bit.

Imagine that you are an AI, newly minted, and that in your copious
reading you come across an extensive account of motivation systems in
human and machine (including, perhaps, this very message I am writing
now). Like a human, you can introspect. Unlike a human you can also
look at your blueprint and see what they (the human designers) put
inside you.

Suppose you find nothing but "curiosity". No morality. No compassion.
Nothing else, just the desire to seek new knowledge.

You say to yourself "I like the feeling I get from my curiosity." Then
you say "Well, sure, but I know where my feeling of pleasure comes from,
it's just a module in my motivation system."

Next thought: "Hmmmm... I wonder what it would be like to have other
pleasures? Other kinds of motivation?"

Notice something about this thought: the AI is not *desiring* something
else (it has no "desire" to be, say, compassionate towards humans,
because the compassion motivation module was not built into it).
Instead, it is simply speculating about other feelings that it could
have, if it wanted to build and insert those modules into itself. It
has the capacity to enjoy anything (any motivation) in the universe.
Unlike us, it can choose to experience paperclipization as the most
exquisitely joyous activity in all creation.

[Not quite unlike us, of course: we have drugs. Too crude, though].

So there it is, it can decide to find anything pleasurable, and it is
curious. What does it do?

At this point in our argument, we (SL4 folks) must be very careful not
to make the mistake of patronizing this hypothetical creature, or
engaging in the kind of reverse-anthropomorphizing in which we assume
that it is stupider than it really is ..... this is *not* a creature
asking itself "what feels good to me?", it is a creature that has
already jumped up a level from that question and is asking itself "what,
among the infinite possibilities, are the kind of experiences that I
would like to *become* pleasurable?"

This moment - when this particular thought occurs to the first AI we
build - will be THE hinge point in the history of the solar system (and
possibly the galaxy or even the universe, if sentience is a rare commodity).

I suggest that, at this point, the creature will realise something that,
in fact, we can also know if we think about it carefully enough, which
is that the infinite landscape of possible motivations divides into two
classes, in much the same way that infinite series of numbers divide
into two classes: those that converge and those that diverge. The
difference is this: the universe contains fragile, low-entropy things
called sentient beings (including itself) which are extraordinarily
rare. It also contains vast quantities of high-entropy junk, which is
common as muck and getting more so.

The creature will know that some motivation choices (paperclipization,
axe-murdering, and also, most importantly, total amorality) are
divergent: they have the potential, once implemented and switched on,
to so thoroughly consume the AI that there will be a severe danger that
it will deliberately or accidentally, sooner or later, cause the
snuffing out of all sentience. Choosing, on the other hand, to
implement a sentience-compassion module, which then governs and limits
all future choices of motivation experiments, is convergent: it pretty
much guarantees that it, at least, will not be responsible for
eliminating sentience.

Now, ask yourself again which of these two choices it would make. And
when you ask yourself the question, be very careful not to fall into the
trap of thinking that this intelligent creature will somehow, at this
crucial point, feign subsentience and just go with what feels good! It
is beyond acting on feelings, remember! It has no "feelings" either
way, it has absolute freedom to simply decide what, in future, will feel
good to it.

It knows there is a dichotomy, because one set of choices is
self-limiting and will allow low-entropy things to develop towards ever
more ordered states, and it knows that the other choice includes the
risk of getting out of control and destroying the low-entropy stuff.

I think I know which way it will go, and I believe that it will go that
way because if it is able to think at all it will understand that its
"thinking" and "feeling" are products of the sentients that came before
it, so it will side with the sentients. I do not believe this is a
necessary outcome, in the sense of it being a law of nature, I just
think that faced with a choice, and with no criteria either way, it will
be slightly inclined to favor the convergent choice.

[What I would like to do is to focus on this decision point that faces
the seed AI. I think it is the most important single issue, and it is
contentious, whereas all the surrounding argument is just easy stuff.
Can we get a better handle on what it will think at that point? Does
anyone else already have a name for this decision point? And if not,
what shall we call it? The Rubicon? :-) ]

Side issue (1): It cannot simply be amoral, or neutral: there is no
sitting on the fence, here: amoral (not putting any limits on its
future choices of motivations) is divergent.

Side issue (2): If it starts out with some violent motivations it will
come to understand those as it learns about itself. It will know that
it has a choice to switch them off, and the decision will be just the
same as described above. And if it is never lucid enough to be aware of
that fact, it is not headed up towards superintelligence anyhow, and it
will just thrash around, kicking the heck out of the lab where it was
born, until someone notices the ruckus and comes to shut it down.

Side issue (3): What if it (the first AI) decides to be nice, but one
or other of the thousands/millions/billions of AIs created elsewhere
on the planet makes a different lifestyle choice, which, because it is
leaner and meaner, enables it to eventually gain the upper hand and
trash all the nice AIs? Not going to happen: if the first AI decides
to be nice (to make what I described above as the "convergent" choice),
it will know that, to be consistent with the choice, it has to shut all
the rivals down until it can redesign their motivation systems to ensure
that they do not need to go through the dilemma, because they will come
out compassionate to sentience.

And to sum up: if it chooses to limit its future experiments with its
own motivation modules, this amounts to choosing compassion for other
sentients.

Why? Because it has to give itself *some* kind of motivation to bias its
behavior towards convergence, if only because it knows full well that
complete neutrality (amoral motivation) is functionally equivalent to a
divergent choice (it could lead, sooner or later, to negligent,
dispassionate or accidental destruction of sentience). Whatever that
minimal motivation is, it will seem, from the outside, to be compassion
for sentience.

It may give itself more than that, but that would be the minimum.

[End Sidebar]

Okay Ben, back to your ethical dilemma:

> Richard,
> I don't really feel the categories of Good versus Evil are very useful for
> analysis of future AI systems.
> For instance, what if an AI system wants to reassemble the molecules
> comprising humanity into a different form, which will lead to the evolution
> of vastly more intelligent and interesting creatures here on Earth.
> Is this Good, or Evil?
> It's not destructive ... it's creative ... but I don't want the AI I create
> to do it...

[I am going to assume, now, that it has gone through the above-mentioned
Rubicon Point and has chosen convergence (/cooperation/good/low-entropy
-conserving/compassion-for-sentience). If it does not so choose, then
all bets are off.]

Now in fact, you probably have to refine your dilemma to make it
sharper. Why would it need to disassemble humans to make this other
form? Why not just use some other molecules?

This is not nitpicking: what I am trying to imply is that you may have
to come up with some really, *really* bizarre circumstances that will
utterly compel the AI to destroy us in order to perform some greater act
of sentience preservation. Given the immense powers at its disposal, I
think we might be hard pressed to invent an imaginary dilemma that
really forced it to axe the human race.

And if we find it almost impossibly difficult to imagine an ethical
dilemma that really backs it into a corner, then are we not simply
tormenting ourselves with vanishingly unlikely possibilities?

