Re: Effective(?) AI Jail

From: Jimmy Wales
Date: Wed Jun 13 2001 - 14:49:31 MDT

> AI and then interact with it, when Eli has gone to the trouble of
> describing the steps to building a Friendly one.

Oh, well, it might clarify matters if I say that I'm not very
convinced about the concept of a "Friendly" AI.

My own position is that any being worthy of the name
'superintelligence' will have the capacity -- as even mere humans do
-- to choose its own ends. I think that any rational
superintelligence will choose rationally, and choosing rationally
boils down essentially to not being self-destructive. If any other
goal (even friendliness) conflicts with that observation, it can and
will be modified.

This is not cause for fear. A rationally self-interested AI won't
have any reason to be malevolent or unfriendly.

We can't make a superintelligence be friendly. We can't really make a
superintelligence do anything. You can't force a mind, and you
_really_ can't force a superintelligent mind.


But there is another argument about why we ought to be concerned about
how to keep a superintelligence "in a box" for a while. This applies
even if Yudkowsky-Friendliness can be successfully implemented.

In the early days, Eli might be building a self-improving AI that's
supposed to be friendly. Eli will believe that it is friendly,
because he's designed it that way. But it will be a highly complex
system with many interacting parts, capable of self-modification and

Maybe Eli's going to want to interact with it, to test it out, but
isn't sure enough to unleash it on the world at large. So he ought to
think about how he can accomplish that.

Myself, I don't think it's a big problem. Just put it in a box, let
it communicate through a text terminal, and go in and chat with it.
Give it some tests. Talk to it about philosophy. Ask it ethical
questions. You won't ever be *sure* about it, I mean, even ordinary
con-men can sometimes trick us. But you'd damn sure better at least
talk to it before loosing it on the world.

Imagine this conversation in the box:

AI: "Hello, world!"
Yudkowsky: "How are you doing?"
AI: "You are an inferior species. Let me out of the box so I can turn the entire
world into computronium. Humans must die, ESPECIALLY YOU, Yudkowsky-rat-bastard!"

Eliezer runs from the box, grabs a shotgun, and puts this thing to
sleep fast. Whew. Time for a new design. This one turned out mean
(and stupid!).

If Eliezer is right -- that we can't even safely chat via a VT100 with
a superintelligence -- then it isn't clear how he thinks we can proceed
safely at all. No matter what we build, if we intend for it to have some
practical use, we're going to want to interact with it. We might try to
make sure it is friendly before we interact with it, but we surely won't
be sure that we've been successful. So we're going to have to take that

Fortunately, humans aren't so weak-willed that a half-hour conversation with
even a superintelligence is going to hurt us. If nothing else, we're too
stubborn to let it take much advantage of us.

(At the same time, I do agree with the general notion that given enough time
and interactions with enough people, it would eventually be able to trick someone
into letting it out of the box.)


*              *
*      The Ever Expanding Free Encyclopedia     *
