Re: AI Boxing:

From: Matt Mahoney
Date: Tue Jun 03 2008 - 13:19:49 MDT

--- On Tue, 6/3/08, Vladimir Nesov wrote:

> On Tue, Jun 3, 2008 at 5:58 PM, Matt Mahoney wrote:
> >
> > And yet you just now let the AI out of the box, in spite of your
> > insistence that it couldn't happen to you.
> >
> > AI: "Here is your simple design for
> > friendliness." (link to 20 volume set)
> > you: "Could you summarize it for me?"
> > AI: "I just did. Friendliness is not simple."
> > you: "Give me a one page executive summary."
> > AI: (echoes your pet theory of friendly AI using
> > Google and SL4 archives)
> > you: "OK, I will build *another* AI based on your
> > friendliness theory, but *you* are staying in the box."
> > (builds AI).
> >
> > I think you can see why secrecy is necessary. Now
> > that you know the trick, you won't fall for it again.
> > After a few more escapes, you will be convinced you know
> > every trick that an AI could play. No, you only know most
> > of the tricks that a human playing the role of a transhuman
> > could play.
> >
> I won't fall for this, since I won't accept its theory of Friendliness
> unless I fully understand it (and have many other people understand
> it). Even if the theory proves too long to be understood directly, it
> may be possible to construct a verification procedure (see [*]) that
> will itself be understandable and that will confirm that the theory
> (in this case, some kind of formal specification) is correct. But from
> the point of my present ignorance, I expect Friendly AI theory
> (especially constructed with the power of Oracle AI) to be reasonably
> simple.

I'm not saying you will release the AI. I am saying you already have. Now you know the trick, but it is too late.

The AI exploited your belief that Friendliness has a simple solution, and presented a solution that agrees perfectly with your beliefs. You could get others to check it, but if they disagree, you will believe they are wrong. It is simple human psychology. You could machine check the AI's argument, but that is worthless if the AI supplied you with false axioms that align with your beliefs. You could try to check that the simple solution really does summarize the complex design you need to implement, but you can't, and nobody else can, because it is too complex.

The common vulnerability is that you *want* to let the AI out of the box. Otherwise, what good is it? I think the experimental evidence is pretty clear. Intelligent people are easily duped. I would be too. Letting the AI out of the box is not the same as *knowing* you are letting it out. (Please enter your administrative password to install 79 updates).

-- Matt Mahoney

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:03 MDT