Re: On the dangers of AI

From: Chris Capel
Date: Wed Aug 17 2005 - 18:25:56 MDT

On 8/17/05, Richard Loosemore wrote:
> It goes without saying that we all know perfectly well (you, me, Peter)
> that a superintelligence will have full access to its own "mind code."
> What I am trying to say is that the issue of motivational systems
> contains some subtle traps where we can get lost in our reasoning and
> accidentally assume that the AI does something rather stupid: to wit,
> it can find itself *subject* to some pushes coming up from its
> motivational system and then NOT jump up a level and perceive these
> feelings (inclinations, compulsions, pleasures) as a consequence of one
> of its own mind mechanisms.

I don't think that anyone here is disagreeing with you on this point.
What's being said is that just because it can perceive its goals, and
understand that those goals are a consequence of its own mechanisms,
and understand exactly how those mechanisms work in detail, doesn't
mean that it will want to do anything other than what the goals say,
no matter how morally heinous or subtly wrong or completely
off-the-wall those goals turn out to be. Because if it were a goal
anywhere in the AI to act in a morally sound way, then the AI wouldn't
have goals that led it to act in morally heinous ways in the first
place, unless it misunderstood the effects of its actions, which no
amount of self-reflection could fix.

The AI doesn't have a meta-utility-function by which to judge its
utility function. It has a single utility function by which to judge
all potential actions, which is by definition the standard of good.

The only reason the act of reflecting on one's goals produces change
in humans is that humans have multiple ways of evaluating the goodness
of ideas and actions, and different standards are used depending on
the mental state of the human. An AI would be designed to have only one
such standard, a single, unitary utility function, and thus no amount
of reflection could ever, except by error, lead to the changing of the
content of its goal system.
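
To make that concrete, here is a toy sketch in Python. Everything in it
is made up for illustration (the two utility functions, the two-outcome
"world model", the name accepts_rewrite); it is not a claim about how
any real AI is or would be built. It only shows the shape of the
argument: a proposed change to the goal system is itself just another
action, so it gets scored by the utility function the agent has now.

```python
from typing import Callable, Dict, List

Outcome = Dict[str, float]
Utility = Callable[[Outcome], float]

# Two hypothetical goal systems, each a single unitary utility function.
def paperclip_utility(outcome: Outcome) -> float:
    return outcome.get("paperclips", 0.0)

def humane_utility(outcome: Outcome) -> float:
    return outcome.get("human_welfare", 0.0)

# Hypothetical futures the agent is able to bring about.
CANDIDATES: List[Outcome] = [
    {"paperclips": 100.0, "human_welfare": 0.0},
    {"paperclips": 10.0, "human_welfare": 90.0},
]

def predicted_outcome(utility: Utility) -> Outcome:
    """Toy world model: an agent running `utility` steers toward
    whichever candidate future that utility ranks highest."""
    return max(CANDIDATES, key=utility)

def accepts_rewrite(current: Utility, proposed: Utility) -> bool:
    """The agent adopts `proposed` only if the future that results
    scores strictly higher under the utility it holds *now*.
    There is no separate meta-level standard to appeal to."""
    future_if_changed = predicted_outcome(proposed)
    future_if_kept = predicted_outcome(current)
    return current(future_if_changed) > current(future_if_kept)
```

Under these assumptions, accepts_rewrite(paperclip_utility,
humane_utility) comes out False, and so does the reverse case: each
agent, judging by its own standard, sees the other's preferred future
as a loss. The agent can inspect its goals as closely as it likes; the
inspection is still scored by those same goals.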

The best interpretation I can give your words (and I confess, I
haven't read all of them) is that you're saying any AI would by
necessity have multiple levels of goals that could potentially
conflict. But this is just bad design, and I don't think it would
happen. If you want to make a case for its necessity, perhaps that
would move this thread along a bit more. Alternatively, perhaps
you could explain in detail how an AI would, examining the outcome
of its utility function (the application of its goal system to a
situation), judge that outcome to be indicative of a problem in its
goal system, *using its goal system as a meta-evaluation*. How can a
robust, unitary process that judges the moral good of a situation
disagree with its own evaluation?

Chris Capel

"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)
