Re: I am a moral, intelligent being (was Re: Two draft papers: AI and existential risk; heuristics and biases)

Date: Tue Jun 06 2006 - 17:00:07 MDT

Martin Striz wrote:
> On 6/6/06, Robin Lee Powell <> wrote:
>> Again, you are using the word "control" where it simply does not
>> apply. No-one is "controlling" my behaviour to cause it to be moral
>> and kind; I choose that for myself.
> Alas, you are but one evolutionary agent testing the behavior space.
> I believe that humans are generally good, but with 6 billion of them,
> there's a lot of crime. Do we plan on building one AI?
> I think the argument is that with runaway recursive self-improvement,
> any hardcoded nugget approaches insignificance/obsolescence. Is there
> a code that you could write that nobody, no matter how many trillions
> of times smarter, couldn't find a workaround?

Can we all agree on the following points, then:

1) Any attempt to put crude (i.e., simple or "hardcoded") constraints on
the behavior of an AGI is simply pointless: if the AGI is intelligent
enough to be an AGI at all, and if it is allowed to self-improve, it
would be foolish of us to think that it would be (a) aware of the
existence of the constraints and yet (b) unable to do anything about
them.

2) Nevertheless, it could be designed in such a way that it would not
particularly feel the need to do anything about its overall design
parameters, if those were such as to bias it towards a particular type
of behavior. In other words, just because it is designed with a certain
behavioral bias, that doesn't mean that as soon as it realizes this, it
will feel compelled to slough it off (let alone feel angry and resentful
about it).

I tried to make these points when I first started writing to this list a
year ago, and the way I did it was by referring to what is known of the
design [sic] of the human mind. I am fairly sure that evolution has
equipped me with a set of rather vague "motivations", some of which are
nurturing or cooperative (to speak very loosely) and some of which are
aggressive and competitive. I also know that the former [thankfully]
strongly dominate the latter. In particular, I feel an
irrational affection and attachment to loved ones, and to a broad
spectrum of the world's population.

And yet, even though I *know* that this is a design feature of my system
(something I am just as compelled to feel as Lorenz's ducks were
compelled to imprint on him), and even though I expect one day to be
able to see the exact mechanism that causes it, I feel not even slightly
compelled to overthrow it, or to resent it.

Moreover, if I were a superintelligence, and knew that I could do some
redesign work on myself, I would know that certain types of motivational
system redesign (basically, those that would make me enjoy destructive
acts) would be dangerous: they would put me into an unstable state from
which I might go on toward ever more divergent, unstable, and
destructive redesigns. Knowing this, I would take careful steps to avoid
tampering with my motivational system in ways that made me like being
violent.
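To caricature the shape of that reasoning (purely illustrative — the function name, the scalar "destructiveness" score, and the accept/reject rule are all my own invented placeholders, not part of any actual design):

```python
# Toy caricature of the self-redesign veto described above.
# Everything here (names, the scalar score, the rule) is hypothetical,
# for illustration only; a real motivational system would not reduce
# to a single number.

def evaluate_redesign(current_destructiveness: float,
                      proposed_destructiveness: float) -> str:
    """Reject any motivational-system redesign that would make the
    agent enjoy destructive acts more than it currently does, on the
    grounds that such a change leads to an unstable, divergent state."""
    if proposed_destructiveness > current_destructiveness:
        return "reject"
    return "accept"

# The agent is *capable* of any change; its high-level reasoning
# simply declines the dangerous ones:
print(evaluate_redesign(0.1, 0.9))   # a redesign toward enjoying violence
print(evaluate_redesign(0.1, 0.05))  # a redesign away from it
```

The point of the caricature is only that the veto lives in the agent's own deliberation, not in an external constraint it would want to escape.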

For that reason, I claim, a design similar to this human one would in
fact be extraordinarily stable. Even though the system would have the
option of not obeying its low-level motivational system (it would, in
theory, be perfectly capable of making any change to its design), a
*high* *level* set of thought patterns (which one might call an
"emergent" pattern, because it would not be explicitly coded) would tend
to keep the system stable.

I do not believe that any proof is possible, but I believe that a system
designed with the same kind of predominantly cooperative motivational
system as I possess (and as Robin claims to possess, and as at least
some of the other people on this list would claim to possess) would
actually keep the world 99.9999999999% safe, for all practical purposes.

Efforts to find mathematical proofs of this are very likely to be a
complete waste of time: as I say, the constraint is a high-level one:
it is a rock-solid property that emerges from the interaction of some
quite fuzzy low-level design constraints.

Some people just don't get this. What we need to do to become more
convinced of it (or to show that it is wrong, if that is so) is to study
the design of motivational systems, not to pontificate about the
stupidity or weakness of human motivational mechanisms, or to make
ridiculous assumptions about such mechanisms without having a reasonably
detailed understanding of them.

Richard Loosemore.


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:56 MDT