Re: Suggested AI-Box protocol & AI-Honeypots

From: Michael Warnock (
Date: Sat Jul 06 2002 - 17:20:58 MDT

7/6/2002 2:17:28 PM, "Eliezer S. Yudkowsky" <> wrote:

>Based on my two experiences so far, I've created a suggested protocol
>for future AI-Box tests. This should also answer the questions Higgins
>asked earlier.

This seems very complete. Most of my ideas for how the AI party has
been successful are not within the protocol. The only remaining notion
I think to be reasonable is that Eli is convincing the Gatekeeper party
that letting him out now increases the chances of real FAI by increasing
the thought and self-doubt surrounding AI-Boxes and Friendliness.
This too may be judged to be using a real-world Eli tactic such as a
secret bribe, which breaks the first of the AI protocols.

I tend to think that an AI-Box with a single perimeter and an on/off
switch would not do the job intended, and might instead negatively
influence the AI's perception of our own friendliness. On the other hand,
I think it's important that we not run programs on any substrate capable
of unaided self-extension at the hardware level without extremely well
thought-out safeguards.
If a transhuman AI could exist purely as software on more conventional
hardware (barring quantum mechanisms in our own brains or extremely
high requirements), then it should be possible to maintain hardware parity
that is not under the influence of said software. This parity hardware
would allow inspection of any and all code. The usefulness of this,
as I see it, depends upon how algorithmic vs. connectionist the program
is and its level of complexity in general. Even given the most readable
code, however, a transhuman AI, which has spent most of its existence
working with code, including its own, would likely be able to slip one
by even the most expert panel of code reviewers.
So what else could be done along these lines? How about the
equivalent of a honeypot? In a traditional *N*X honeypot a
system image is placed within a host environment that it cannot
access or even detect. All outside communication
is routed to the contained system image. The integrity of the image
is checked to raise an alert on the hoped-for intrusion. Much can
be learned about a particular cracker by tricking him into your honeypot.
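
As a rough illustration of the integrity check on a traditional honeypot
image, here is a minimal sketch. It assumes the host can read the contained
image as an ordinary file and recorded a SHA-256 digest of it while it was
known to be clean; the function names image_digest and image_was_modified
are made up for this example and do not come from any real honeypot tool.

```python
import hashlib
from pathlib import Path


def image_digest(image_path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the contained system image.

    Hypothetical helper: assumes the image is a plain file readable
    from the host side, outside the contained system.
    """
    digest = hashlib.sha256()
    with image_path.open("rb") as image:
        # Read in chunks so a large image never has to fit in memory.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_was_modified(image_path: Path, baseline_digest: str) -> bool:
    """True if the image no longer matches the digest the host recorded."""
    return image_digest(image_path) != baseline_digest
```

The host would record image_digest() once, then call image_was_modified()
on a schedule and alert when it returns True. This only notices that
something changed, not what changed or who changed it.
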
To produce the equivalent for a transhuman AI may prove
impossible if it is able to harness superior hardware technology
within its own perimeter. Also, if a UAI deduced the existence
of the honeypot, it would likely pretend to respect the first
boundary so as to be let out, or wait for an appropriate time to
blow through the first and attempt the second. I still think there
is some possible use in this, though, if a UAI is let into what it
thinks is the base level of reality and makes its character known
while still being contained.
Perhaps we are in a similar situation as humans. ;)

Michael Warnock
Wherever you go; &this;
