Re: AI Jailer.

From: Moshe Looks (moshel@cs.huji.ac.il)
Date: Sun Jul 07 2002 - 11:50:46 MDT


Mike & Donna Deering wrote:
> The Friendly AI can communicate with the programmer
> and try to convince him that it is Friendly. The Unfriendly AI can
> communicate with the programmer and try to convince him it is
> Friendly.
>
> What limitations? Any argument available to the Friendly AI is also
> available to the Unfriendly AI. Therefore the programmer has no way
> of determining the status of the AI.

Not quite. The critical difference between Friendly and Unfriendly AI is
history: how the AI got to be what it is. Thus, the only way to break the
symmetry between FAI and UFAI is by asking questions about the AI's
design and development (assuming accurate, unmodified records exist;
otherwise, forget it!).

How about asking the AI to produce a written document justifying its own
Friendliness design and safeguards as adequate? This paper could then be
sent around to lots of people who had no direct contact whatsoever with
the AI, who would try to poke holes in it. Unlike humans, an AI could
be expected to be capable of PROVING its own Friendliness. Yes, it's
still _possible_ that a UFAI could slip some magically compelling false
argument into the paper, but I think a lengthy review by many people
could eliminate this possibility. Plus, spotting misleading arguments in
written documents is something humans have lots of experience with: if
human language and math allowed for "magic arguments that always
convince everyone", I think we would have found them by now ;-).

Moshe
