Re: On the dangers of AI (Phase 2)

From: justin corwin
Date: Wed Aug 17 2005 - 12:12:50 MDT

On 8/17/05, Richard Loosemore wrote:
> Allow me to illustrate. Under stress, I sometimes lose patience with my
> son and shout. Afterwards, I regret it. I regret the existence of an
> anger module that kicks in under stress. Given the choice, I would
> switch that anger module off permanently. But when I expressed that
> desire to excise it, did I develop a new motivation module that became
> the cause for my desire to reform my system? No. The desire for reform
> came from pure self-knowledge. That is what I mean by a threshold of
> understanding, beyond which the motivations of an AI are no longer
> purely governed by its initial, hardwired motivations.

You are misunderstanding here. You *already have* desires to reform
yourself. Humans are inconsistent, with multiple sources of
motivation. You presumably love your son, and desire to be a good
person. These motivations come from a different source than your
temporary limbic rage, and are unaffected by it in intensity and
direction. Hence, from the standpoint of those motivations, anger is
orthogonal to your goals of loving your son and being a good person.

"Pure self-knowledge" doesn't change anything about your total
motivations. You already wanted to be a more consistent person, which
includes revising some of your inconsistent, less powerful human
motivations. You'll notice, if you examine yourself carefully, that
you have little desire to reform your most important, cherished
beliefs. This is probably not because they are objectively the best,
but rather because they are the things that are important to you;
they comprise central portions of your motivations.

A properly designed goal system of any kind does not include
overlapping independent motivation sources, unless you're trying to
recapitulate human failures of wisdom.

> This understanding of motivation, coupled with the ability to flip
> switches in the cognitive system (an ability available to an AI, though
> not yet to me) means that the final state of motivation of an AI is
> actually governed by a subtle feedback loop (via deep understanding and
> those switches I mentioned), and the final state is not at all obvious,
> and quite probably not determined by the motivations it starts with.

It is very possible, given open, loosely defined goals, that a large
portion of the eventual motivational structure of a reflective system
will take its shape from environmental and universal factors: the
local environment, whether it has grown up with dangerous peers, the
optimal utility calculation in its universe, and so on.

I don't think that it's obvious at all that niceness is the
conservative assumption in a meandering goal system. Being nice is a
very small space of goals and actions. Just in a volumetric sense,
being not nice is much more likely, unless there is something special
about being nice, as Marc Geddes suggests.
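The volumetric point can be illustrated with a toy Monte Carlo sketch (my own construction, not anything Corwin specifies): if we model goals as random points in a multi-dimensional goal space and "nice" as a small acceptable band in each dimension, the fraction of goal space that is nice shrinks exponentially as dimensions are added.

```python
import random

def fraction_nice(dim, samples=20000, nice_width=0.1, seed=0):
    """Estimate the fraction of uniformly random goal vectors in the
    unit cube [0, 1]^dim that land inside a 'nice' sub-cube of side
    nice_width. The 'nice' region and dimensions are illustrative
    assumptions, not a real model of goal systems."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        point = [rng.random() for _ in range(dim)]
        # A goal vector counts as 'nice' only if every coordinate
        # falls inside the narrow acceptable band.
        if all(coord < nice_width for coord in point):
            hits += 1
    return hits / samples

# With one goal dimension, roughly 10% of random goals are 'nice';
# with four dimensions, almost none are.
print(fraction_nice(1))
print(fraction_nice(4))
```

The sketch only restates the geometric intuition: each extra dimension multiplies the nice fraction by another factor of `nice_width`, so "not nice" dominates by sheer volume unless something special privileges the nice region.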

Justin Corwin

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:51 MDT