Re: AI Jailer.

From: James Higgins (
Date: Tue Jul 16 2002 - 02:14:08 MDT

Moshe Looks wrote:
> Mike & Donna Deering wrote:
> > The Friendly AI can communicate with the programmer
> > and try to convince him that it is Friendly. The Unfriendly AI can
> > communicate with the programmer and try to convince him it is
> > Friendly.
> >
> > What limitations? Any argument available to the Friendly AI is also
> > available to the Unfriendly AI. Therefore the programmer has no way
> > of determining the status of the AI.
> Not quite. The critical difference between Friendly and Unfriendly AI is
> history: how the AI got to be what it is. Thus, the only way to break
> symmetry between FAI and UFAI is by asking questions about the AI's
> design and development (assuming that there exist accurate unmodified
> records, otherwise forget it!).
> How about asking the AI to produce a written document justifying its own
> Friendliness design and safeguards as adequate? This paper could then be
> sent around to lots of people who had no direct contact whatsoever with
> the AI, who would try and poke holes in it. Unlike humans, an AI could

This would be interesting, and maybe even helpful, but would prove
nothing. Even an AI as stupid as I am could trick people using this
method. First, the AI finds an extremely subtle method to accomplish
its goals, re-engineering itself as necessary to be able to exploit it.
Then it writes an extremely thorough paper, even imposing safeguards
that are not currently in place. Even better, it suggests some that would
*appear* to prevent the method the AI intends to use. If done properly,
it is unlikely anyone would notice this tiny crack in the armor.

James Higgins
