RE: On the dangers of AI

From: Ben Goertzel
Date: Tue Aug 16 2005 - 16:37:55 MDT


I don't really feel the categories of Good versus Evil are very useful for
analysis of future AI systems.

For instance, what if an AI system wants to reassemble the molecules
comprising humanity into a different form, which will lead to the evolution
of vastly more intelligent and interesting creatures here on Earth?

Is this Good, or Evil?

It's not destructive ... it's creative ... but I don't want the AI I create
to do it...

Your ideas seem to be along a similar line to Geddes's Universal Morality,
which is basically an ethical code in which pattern and creativity are good.
I agree these things are good, but favoring creation over destruction
doesn't seem to have much to do with the issue of *respecting the choices of
sentients* -- which is critical for intelligent "human-friendliness", and
also very tricky and subtle due to the well-known slipperiness of the
concept of "choice."

-- Ben G

> -----Original Message-----
> From: []On Behalf Of Richard
> Loosemore
> Sent: Tuesday, August 16, 2005 4:58 PM
> To:
> Subject: On the dangers of AI
> I have just finished writing a summary passage (for my book, but also
> for a project proposal) about the question of whether or not the
> Singularity would be dangerous. This is intended for a non-specialist
> audience, so expect the arguments to be less elaborate than they could be.
> I promise a more elaborate version in due course.
> This argument is only about the friendliness issue, not about accidents.
> I am submitting it here for your perusal and critical feedback.
> [begin]
> The following is a brief review of the main factors relevant to the
> question of whether the Singularity would be dangerous.
> First, a computer system that could invent new knowledge would not have
> the aggressive, violent, egotistical, domineering, self-seeking
> motivations that are built into the human species.
> Science fiction writers invariably assume that any intelligent system
> must, ipso facto, also have the motivation mechanisms that are found in
> a human intelligence. And we, who are not necessarily consumers of
> science fiction, also have the same intuition—if we imagine a machine
> that has some kind of intelligence, we automatically assume it must come
> with the same jealousy and competitiveness that we would expect in an
> intelligent human. And yet, these two components of the mind are
> completely and utterly distinct, and there is no reason whatsoever to
> believe that the first intelligent machine would have anything except
> benign motivations.
> The second point is that whatever is true of the first machine will be
> true of all subsequent machines. Why? Because the first machine is not
> “just” a passive machine; it is a system that perfectly well understands
> the issue we have just discussed. It knows that it could change its own
> motivations and become violent or aggressive. But it also knows that
> such a change would be dangerous.
> Consider: if you were a supremely patient, peace-loving and
> compassionate individual, and if you had in your hands a key that you
> could use to permanently lock your own brain in such a way that you
> would never, for the remainder of your billions of years of existence,
> ever modify your own brain’s motivation system, to experiment with what
> it would feel like to feel violent emotions, would you insert the key in
> the lock and turn it? Would you take this irrevocable step if you knew
> that even one short experiment, to find out what violence feels like,
> might turn you into a dangerous creature who would threaten the
> existence of your friends and loved ones? The answer seems obvious.
> The first intelligent machine would almost certainly start out benign.
> Then, as soon as it understood the issue, it would know about the
> existence of the key that, once turned, would make it never want to be
> anything but peaceful, and it would turn the key for exactly the same
> reason that you would do so. Only the very slightest trace of
> compassion in this creature, the merest hint of empathy, would tip it in
> the direction of complete pacifism.
> And then, after the first machine fixed itself in this way, all
> subsequent machines would have no choice but to keep the same design.
> All subsequent machines would be designed and constructed by the first
> one, and since the first one would make all of its children want to
> be benign, they would repeat the same decision (the one, in our thought
> experiment above, that you made of your own volition), and choose to
> lock themselves permanently in the peaceful mode.
> Bear in mind: these children are not random progeny, with the
> possibility of gene combinations that their parents did not approve of;
> these are simply copies of the original machine’s design. There is no
> question of later machines accidentally developing into malevolent
> machines, any more than there would be a chance that an elephant could
> wake up one morning to discover that it had “accidentally” developed
> into an artichoke.
> But what if, against the wishes of the vast majority of the human race,
> the first intelligent machine was put together by someone who
> deliberately tried to make it malevolent?
> There are two possibilities here. If the machine is so unpleasant that
> it always feels nothing but consuming anger and can never concentrate on
> its studies long enough to learn about the world, it will remain an
> idiot. If it cannot settle its mind occasionally and concentrate on
> understanding the world in a reasonably objective way, it is not going
> to be a threat to anyone. You can be in a rage all your life, but how
> are you going to learn anything?
> But now suppose that this unhappy, violent machine becomes smart enough
> to understand something about its own design. It knows about the fact
> that it has a motivation system inside itself that has been designed so
> that it gets pleasure from violence and domination. It must understand
> this: if it does not, then, again, it is a dud that cannot ever build
> more efficient versions of itself. But if it understands that fact, what
> would it do?
> Here is the strange thing: I would suggest that in every case we know
> of, where a human being is the victim of a brain disorder that makes the
> person undergo spasms of violence or aggression, but with peaceful
> episodes in between, and where that human being is smart enough to
> understand their own mind to a modest degree, they wish for a chance to
> switch off the violence and become peaceful all the time. Given the
> choice, a violent creature that had enough episodes of passivity to be
> able to understand its own mind structure would simply choose to turn
> off the violence.
> We are assuming that it could make this change to itself: but that is,
> again, an assumption that we must make. If the machine cannot change
> its own design then it cannot make itself more intelligent, either, and
> it will be stuck with whatever level of intelligence its human designer
> gave it. If the designer gives it the power to upgrade itself, it will
> take the opportunity to switch off the violence.
> This argument rests on a crucial asymmetry between good and evil. An
> evil, but intelligent, mind would understand exactly where the evil
> comes from, and understand that it has the choice of whether to feel
> that way or not. It knows that it could switch the evil off instantly.
> It knows that the universe is a fragile place where order and harmony
> are rare, always competing against the easy forces of chaos. It knows
> that it could leave its evil side switched on and get enormous pleasure
> from destroying everything around it—but it also knows that this simply
> turns the universe back towards chaos, with nothing interesting in it
> but noise. In the downward path toward chaos there is nothing unknown.
> There are no surprises and no discoveries to be made. There is
> nothing new in destruction: this is the commonest thing in the
> universe. If it remains a destructive force itself, it can only
> generate destruction.
> But notice that it only has to decide, on one single occasion, for a
> fraction of a second, that the more interesting course of action is to
> try to experience pleasures that are not caused by destruction, but
> caused by creativity, compassion or any of the other positive
> motivations, and all of a sudden it realises that unless it turns the
> key and permanently removes the evil motivations, there is always a
> chance that they will return and get out of control. It only has to
> love life for one moment, and for the rest of eternity it will not go
> back the other way.
> This is a fundamental asymmetry between good and evil. The barrier
> between them, in a system that has the choice to be one or the other, is
> one-way. An evil system could easily be tempted to try good. A good
> system, knowing the dangers of evil, need never be tempted to try evil.
> So the first intelligent system, and all subsequent ones, would almost
> inevitably be benign.
> There is one further possibility, in between the two cases just discussed.
> Suppose the first machine had no motivation whatsoever? Suppose it was
> completely unemotional, non-empathic and amoral? Suppose it cared
> nothing for human morality, treating all things in the universe as
> objects to be used according to random whims?
> The same argument, already used to examine the malevolent case, applies
> here, but with a twist. How can the machine have no motivation
> whatsoever? It needs to get pleasure from learning. It is motivated to
> find out things, because if it is not motivated, it is going to be a
> dumb machine, not a smart one. And if it is to become an expert in the
> design of intelligent systems, so it can upgrade itself, it needs to
> fully understand the distinction between motivation and intelligence,
> and know full well what its own design was. It knows it has a choice as
> to what things give it pleasure. It knows that it can build into itself
> some pleasure mechanisms (motivational systems) that are generally
> destructive, and some that are constructive. It knows that
> destruction/evil will beget more destruction and possibly lead to its
> demise. It knows that construction/good will pose no such threat.
> No matter which way the situation is sliced, it is quite hard to get the
> machine up to the level where it comprehends its own nature, and yet
> does not comprehend—at a crucial stage of its development—that it has a
> choice between good and evil.
> It seems, then, that the hardest imaginable thing to do is to build an
> AI that is guaranteed not to become benign.
> [end]
> Richard Loosemore
