Re: On the dangers of AI

From: Richard Loosemore (
Date: Tue Aug 16 2005 - 20:54:43 MDT


This is very tricky territory, I believe, so I am going to try to go
through what you say very carefully....

Peter de Blanc wrote:
> On Tue, 2005-08-16 at 16:57 -0400, Richard Loosemore wrote:
>>Here is the strange thing: I would suggest that in every case we know
>>of, where a human being is the victim of a brain disorder that makes the
>>person undergo spasms of violence or aggression, but with peaceful
>>episodes in between, and where that human being is smart enough to
>>understand its own mind to a modest degree, they wish for a chance to
>>switch off the violence and become peaceful all the time. Given the
>>choice, a violent creature that had enough episodes of passivity to be
>>able to understand its own mind structure would simply choose to turn
>>off the violence.
> There's an important distinction which you're missing, between a mind's
> behaviors and (its beliefs about) its goal content. As human beings, we
> have evolved to believe that we are altruists, and when our evolved
> instincts and behaviors contradict this, we can sometimes alter these
> behaviors.
> In other words, it is a reproductive advantage to have selfish
> behaviors, so you have them, but it is also a reproductive advantage to
> think of yourself as an altruist, so you do. Fortunately, your
> generally-intelligent mind is more powerful than these dumb instincts,
> so you have the ability to overcome them, and become a genuinely good
> person. But you can only do this because you _started out_ wanting to be
> a good person!

I want to argue first that I am not missing the distinction between a
mind's behaviors and (its beliefs about) its goal content.

First, note that human beings are pretty dodgy when it comes to their
*beliefs* about their own motivations: self-knowledge of motivations in
individual humans ranges from close to zero (my 97-year-old grandmother
with dementia) through adeptly contortionist (my delightful but
sometimes exasperating 6-year-old son) to grossly distorted (Hitler, who
probably thought of himself as doing wonderful things for the world) and
on through sublimely subtle (T.E. Lawrence? Bertrand Russell?).
Truth is, we have evolved to play all kinds of tricks on ourselves, and
to have many levels of depth of understanding, depending on who we are
and how hard we try.

Most of us are very imperfect at it, but (and this is an important
point) the more we try to study motivation objectively, and internally
observe what happens inside ourselves, the better, I claim, we become.

So, yes, part of the story is that we have evolved to think of ourselves
as altruists - or rather, as altruists with respect to our kinsfolk and
relatives, but often not global altruists. And when our instincts
contradict our perceptions of who we think we *should* be, we can
sometimes modify the instincts. The full picture involves quite a
tangled web of interacting forces, but yes, this central conflict is
part of the story, as you point out.

So far, so good. To be old-fashioned about it, Superego clobbers Id
when it gets out of control, and we end up becoming "a genuinely good
person."

But now, if I read you aright, what you are saying is that the reason
Superego gets the upper hand in the end is that the system was designed
with fundamental altruism as goal number one ("you can only do this
because you _started out_ wanting to be a good person!") and because
this goal was designed in from the beginning, this is the reason why it
eventually (at least in the case of nice people like you and me)
triumphed over the baser instincts.

Hence, it depends on what was goal number one in the initial design of
the system (altruism, rather than ruthless dominance or paperclipization
of the known universe). Whatever was in there first, wins?

I have two serious disputes with this.

1) Are you sure? I mean, are you sure that the reason why this
complicated (in fact probably Complex) motivation system, which is more
than just an opponent-process module involving Superego and Id, but is,
as I argued above, a tangled mess of forces, is ending up the way it
does by the time it matures, *only* because the altruism was goal number
one in the initial design? I am really not so sure, myself, and either
way, this is something that we should be answering empirically - I don't
think you and I could decide the reasons for its eventually settling on
good behavior without some serious psychological studies and
(preferably) some simulations of different kinds of motivation systems.

2) Quite apart from that last question, though, I believe that you have
introduced something of a red herring, because all of the above
discussion is about ordinary people and their motivational systems, and
about their introspective awareness of those systems, and the
interaction betwixt motivation and introspection.

In my original essay, though, I was talking not about ordinary humans,
but about creatures who, ex hypothesi, have quite a deep understanding
of motivation systems in minds ... and, on top of that understanding,
they have the ability to flip switches that can turn parts of their own
motivation systems on or off. My point is that we rarely talk about the
kind of human that has a profoundly deep and subtle understanding of
how their own motivation system is structured (there just aren't that
many of them), but this is the population of most interest in the essay.
So when you correctly point out that all sorts of strange forces come
together to determine the overall niceness of a typical human, you are
tempting me off topic!

Having said all this, I can now meet your last point:

> You are anthropomorphizing by assuming that these beliefs about goal
> content are held by minds-in-general, and the only variation is in the
> instinctual behaviors built in to different minds. A Seed AI which
> believes its goal to be paper clip maximization will not find
> Friendliness seductive! It will think about Friendliness and say "Uh oh!
> Being Friendly would prevent me from turning the universe into paper
> clips! I'd better not be Friendly."

Wait! Anthropomorphizing is when we incorrectly assume that a thing is
like a human being.

What you are saying in this paragraph is that (1) my original argument
was that niceness tends to triumph in humans, (2) I misunderstood the
fact that this actually occurs because of our particular beliefs about
our goal content (the altruism stuff, above), and (3) continuing this
misunderstanding, I falsely generalized and assumed that all minds would
have the same beliefs about their goal content (?... I am a little
unclear about your argument here...).

No, not at all! I am saying that a sufficiently smart mind would
transcend the mere beliefs-about-goals stuff and realise that it is a
system comprising two things: a motivational system whose structure
determines what gives it pleasure, and an intelligence system.

So I think that what you yourself have done is to hit up against the
anthropomorphization problem, thus:

> A Seed AI which
> believes its goal to be paper clip maximization will

Wait! Why would it be so impoverished in its understanding of
motivation systems that it just "believes its goal to be [x]" and
confuses this with the last word on what pushes its buttons? Would it
not have a much deeper understanding, and say "I feel this urge to
paperclipize, but I know it's just a quirk of my motivation system, so,
let's see, is this sensible? Do I have any other choices here?"

If you assume that it only has the not-very-introspective human-level
understanding of its motivation, then this is anthropomorphism, surely?
(It's a bit of a turnabout, for sure, since anthropomorphism usually
means accidentally assuming too much intelligence in an inanimate
object, whereas here we got caught assuming too little in a

To illustrate: I don't "believe my goal is to have wild sex." I just
jolly well *like* doing it! Moreover, I'm sophisticated enough to know
that I have a quirky little motivation system down there in my brain,
and it is modifiable (though not by me, not yet).

Bottom Line:

It is all about there being a threshold level of understanding of
motivation systems, coupled with the ability to flip switches in one's
own system, above which the mind will behave very, very differently than
your standard model human.

Hope I didn't beat you about the head too much with this reply! These
arguments are damn difficult to squeeze into email-sized chunks. Entire
chapters, or entire books, would be better.

Richard Loosemore.

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:51 MDT