Re: Effective(?) AI Jail

From: Eliezer S. Yudkowsky
Date: Mon Jun 18 2001 - 05:07:23 MDT

Jimmy Wales wrote:
> Aaron McBride wrote:
> > Speaking of incrementally... would we really be trying to communicate with
> > an SI over a VT100?
> I think that the idea of the VT100 is that we may want to communicate via a
> medium that has the minimal bandwidth still permitting useful communication.
> To be even safer, maybe we should only let the SI talk to us in a very simple
> and slow binary code. 1 means yes, 0 means no. I go into the box, I ask a
> question, and light A or light B turns on as an answer.
> It'd be pretty hard to tell me a story that makes me cry with something like
> that. It'd be pretty hard to teach me a pro-SI religious fanaticism through
> an interface like that.

Now, *this* is the *only* SI-in-a-box scenario I've ever heard that has a
remote chance of working for, say, thirty reasonably innocuous questions.
(Anything beyond that and you're entering into the point where the SI can
manipulate the deduced phase space of the external universe into arbitrary
states through the application of sufficiently precise stimuli.) You also
run into the "quantum cheat codes" problem, in which the SI just uses
magic and vanishes - modern tech looks like magic to a Neanderthal, let
alone a dog. Maybe entering a sufficiently precise internal state can
create the conditions that make likely, e.g., involvement in a larger
closed timelike curve. Maybe the SI can make itself "sufficiently
interesting" to a deduced set of unknown-to-us observers. You just don't
know, and I'm asking the wrong questions.

But, if the laws of physics work the way we think they do; and the SI is
developed on layers of emulation multiply removed from the bottom
hardware; and the bottom hardware is relatively unsophisticated (no
quantum computers, no FPGA cellular automata); and the whole room is
sealed in multiple layers of insulation to try and prevent any
hardware-manipulation information from leaking into external reality; and
if the room is destroyed by a "hard timer", nonresettable by available
resources, consisting of a tactical nuke; and if there are no backup
copies; and if all thirty questions are selected in advance; and if nobody
looks at the answers before the SI is destroyed; then you might succeed in
getting a set of completely useless or malevolent answers from an SI that
hates you, or a set of useful answers from a Friendly AI that you would
have then wrongly destroyed. There would still be an irreducible
probability of at least 40% that you would be just screwed by something
you didn't understand and couldn't anticipate. Because let's face it, I
left *something* off that list. How many of those precautions would a
Neanderthal have taken, I wonder?

Jimmy Wales, you have insufficient fear. Your basic reaction seems to be
"Sure, I can use this unFriendly superintelligence," not to run screaming
in the opposite direction until you die of exhaustion. Your pistol idea
would make things more difficult for me, but I doubt an SI would even
blink. It simply imposes the already-obvious condition that the first
phases of takeover should not cause alarm or alertness in the subject.

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
