SI as puppet master

From: John Stick
Date: Fri Jun 22 2001 - 16:30:41 MDT

    In a situation similar to that posed by James Higgins, I would think
that the key variable is not the speed of the SI, but the amount of
information it has of its exact situation and the situation of its human
interrogator. To let the SI know as much as Eliezer knows about Jimmy Wales
in their potential challenge is to give the game away. You would want to
keep the SI from knowing whether it was being interrogated by a human, an
AI, an unintelligent program, or some combination. You would want the SI
ignorant of whether it was interrogated by a single entity, or a group, and
whether there were further observers, and how the observation was done. If
the SI does not know the identity of its interrogator, it cannot tailor
promises or threats; if it does not know the means of observation, it has a
much more difficult time encoding information in a form that can escape without
being noticed.

    The SI loses if it makes an unsuccessful, observed unfriendly attempt to
influence the interrogator. Its dominant strategy would seem to be to play
nice and elicit information from the interrogator, but fishing for
information could itself be seen as an unfriendly act. (Of course, if it
can do magic, it can still escape--there ain't no such thing as an effective
AI jail.)

    The SI will of course attempt to reason out what type of interrogator is
most likely. Each type has its own risks. A single human in a black box
can be subverted in ways that would be foiled if there are human observers
watching by camera, but the observers themselves could be subverted by a
different attack. A friendly AI observer would protect against some
attacks but be vulnerable to new ones. The best case scenario might be that
for every attack there is a response, and vice versa. Then, if one were
worried the SI could play the game of human regularities and predict the
actual precautions taken, one could at least maintain the possibility of
outsmarting the SI by choosing a strategy at random.
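
    The randomization point can be made concrete with a toy model. This is
only a sketch under loud assumptions that are mine, not part of the scenario:
there are exactly three setups (the labels are hypothetical), each attack is
tailored to one setup, and it fails and is observed against the others.

```python
from fractions import Fraction

# Hypothetical labels, loosely based on the interrogator types discussed above.
SETUPS = ["lone human", "human with camera observers", "friendly AI observer"]


def attack_succeeds(attack, setup):
    """Toy assumption: attack k defeats setup k and is foiled by every other setup."""
    return attack == setup


def escape_probability(attack, defender_mix):
    """Chance that one attack works when the setup is drawn from defender_mix."""
    return sum(
        p for setup, p in enumerate(defender_mix) if attack_succeeds(attack, setup)
    )


def best_attack_value(defender_mix):
    """Best escape chance the SI can get if it knows the mix but not the draw."""
    return max(
        escape_probability(attack, defender_mix)
        for attack in range(len(defender_mix))
    )


n = len(SETUPS)
predictable = [Fraction(1), Fraction(0), Fraction(0)]  # always the same setup
uniform = [Fraction(1, n)] * n                          # setup chosen at random

print(best_attack_value(predictable))  # 1
print(best_attack_value(uniform))      # 1/3
```

In this toy model a predictable jailer is beaten with certainty, while a
uniformly random choice caps the SI at one chance in three whichever attack
it picks. Nothing here models an SI that can do magic.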

    I still don't understand the point of keeping a potentially unfriendly
AI in jail and talking to it on the lowest bandwidth connection possible.
Either kill it if you can or trust it (so as not to make a new enemy out of
an entity that might have been at least neutral). The only way you are
likely to discover that it is unfriendly is to catch it in an unfriendly
act. You might try to feed it false information about its circumstances to
induce a jail break that would be unsuccessful yet reveal its true nefarious
nature, but riding a great white shark bareback would be just as exciting,
and less dangerous for the spectators.

John Stick
