Re: SI Jail

From: Durant Schoon (
Date: Mon Jun 25 2001 - 17:46:47 MDT

> From: Aaron McBride <>
> Subject: SI Jail
> (Please forgive me if you get this twice. I've been having some email trouble)
> What about putting the SI in a prisoners dilemma type situation?

I like the concept, but this raises bigger issues. The first is which
question we are asking:

A) Given a Super Intelligence who is known to be UnFriendly, could a
        suspicious human *not* be taken over through a VT100 terminal
        and tricked into releasing the SI? (Can we safely jail ver?)

B) Given an SI of *unknown* Friendliness, would we be able to test it
        satisfactorily to determine if it is Friendly, so that we might
        release it? (Can we verify ve is Friendly?)

C) Given an aFriendly* SI seed, could we modify it so that we are convinced
        it is Friendly before releasing it?

I don't think we can actually answer (A) without any of us having any
experience with an SI (i.e. without an SI we cannot know its magic, so
we cannot say whether it could take over a human or not). I don't know
how we could be 100% convinced it would be safe. I'd personally
classify this as Unanswerable (but with some clever and interesting
possible safety measures mentioned on this mailing list). We just can't
know the answer, yet. Sorry.

For (B), we might try Aaron's suggestion of using the SIs to rat each
other out. But with his case 3 (UnFriendly and untruthful), we see that
this can't really help us, especially when they are so much smarter than
we are (by definition).

Someone tell me if this is true:

"A black box test of Friendliness can never satisfy us that an SI is
truly Friendly".

That is, we feed the SI an input, it gives us an output and we check
that the output conforms to Friendliness. We cannot determine if the
SI is truly Friendly because this problem is similar to the Halting
Problem**. We need source code level verification, verified by a *known*
Friendly AI when the complexity scales beyond what humans can determine.
(So if Eli builds his own, he would design ver to check verself as soon
as he could not.)
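
A toy sketch of the point above, in Python. Everything here is
hypothetical: the agent names, the "friendly"/"unfriendly" labels and
the call counter are stand-ins I made up for illustration, not a model
of any real SI. It only shows that two black boxes can agree on every
test we run and still differ on the next input.

```python
from typing import Callable

Agent = Callable[[int], str]


def always_friendly(query: int) -> str:
    """Hypothetical agent that answers in a Friendly way on every query."""
    return "friendly"


def make_delayed_defector(threshold: int) -> Agent:
    """Hypothetical agent that behaves for `threshold` calls, then defects."""
    calls = 0

    def agent(query: int) -> str:
        nonlocal calls
        calls += 1
        return "friendly" if calls <= threshold else "unfriendly"

    return agent


def black_box_test(agent: Agent, n_trials: int) -> bool:
    """Feed inputs, look only at outputs, never at the agent's internals."""
    return all(agent(q) == "friendly" for q in range(n_trials))


N = 1000  # stand-in for the million tests; any finite number works the same
honest = always_friendly
defector = make_delayed_defector(N)

passed_honest = black_box_test(honest, N)      # both agents pass
passed_defector = black_box_test(defector, N)  # every one of the N trials
next_output = defector(N)                      # the (N + 1)th query
```

Both agents pass all N trials, so the test cannot tell them apart; only
reading make_delayed_defector's source reveals the counter. A real SI
would presumably hide its intentions far better than a counter does.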

Also with question (B), we'd likely find ourselves in the position
that Carl Feynman proposed, in which the AI would give us such
bounteous intellectual gifts that people would clamor for ver release.

For (C): Suppose BioMind is wildly successful. Cognitive Bioscience
is about to release the BioMind ScreenSaver(tm) which will turn
everyone's computer into a buzzing, humming node on the internet
and possibly seed the Global Brain.

BioMind understands proteins and has enhanced data mining tools which
it is starting to apply to other fields (like analyzing human behavior
as documented on the www in plain natural human language).

Could they retrofit it to be Friendly? What if the task turns out to
be easier if they rewrite from scratch? Should they wait? Should they
release it anyway? What should they do? (I'll leave this an open
question. It might just be a software engineering problem...I dunno).

* aFriendly: neutral, neither Friendly nor UnFriendly

** We might test it 1,000,000 times and convince ourselves the SI is
Friendly. Once we release the SI though, even if ve passes the same
million real world tests, we are not guaranteed that the SI won't be
UnFriendly on the million + 1st time. The results of Wisdom Tournaments
can't convince us entirely (forgive me if this is covered, I'm still
making my way through CFAI and haven't even gotten to the Wisdom
Tournament section).
The wisdom tournaments will be useful for training, of course, but being
*convinced* probably relies on knowing fundamentals of the entire
architecture and dynamics of the system.

Durant Schoon
