Re: Flight recorders in AIs

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Thu May 22 2003 - 02:42:16 MDT


Tommeteor@aol.com wrote:
> I've been reading Creating Friendly AI, and it talks about the concept
> of a "flight recorder," to monitor the AI's actions and goals. It said
> that "obviously," the flight recorder needed to be made inaccessible to
> the AI if at all possible. Yes, that is the /obvious/ answer, but
> "obvious" and "correct" are not always synonymous. After all, it was
> once "obvious" that the Sun went around the Earth! The concept of
> "preventing tampering with the flight recorder" would probably be
> necessary with an AI too young to understand FoF and the concept of a
> "flight recorder" fully. However, as the AI matures, if FoF is a genuine
> possibility, the AI should agree with the programmers about the need for
> safeguards, thus eliminating the need for "anti-tampering" devices. If
> an unFAI tried to modify the flight recorder for unFriendly purposes, it
> must already have an unFriendly parent goal to give desirability to the
> subgoal of "modifying the flight recorder", and this parent goal will
> have been noticed by the "flight recorder" before it is actually
> modified. A sufficiently advanced "flight recorder" would be intelligent
> enough to alert the humans whenever there is an unFriendly goal in the
> AI, or, at least, an unFriendly goal that could cause damage. Thus, the
> only data that would need to be protected is the data that would store
> the memory of an unFriendly goal, and even that "guard the data" concept
> could be worked around, given a year or so to figure it out.

I think you missed the point of a flight recorder. The idea is that even
if you don't notice a failure *instantly*, there's at least the
*potential* to notice the failure five years later - so long as the AI
hasn't overwritten the evidence meanwhile. When the goal of hiding a
catastrophic failure first forms, and before it is implemented, the goal
itself should be noticeable. It may only be noticeable for 500
microseconds before it's obscured, but even so, forming the desire
and implementing it should not be simultaneous. If you have the ability
to run an exact frame-by-frame reenactment of the AI's history, you can
use advanced tools built offsite, or additional programmers, to notice
that 500-microsecond unobscured failure. Five years later, if necessary.
There would at least be a chance, so long as the evidence is not erased
in the meanwhile. That's why the AI can't have access to the flight recorder.

Trust, but verify. Having the AI agree (at time T) with the programmers
about the necessity of safeguards because of the possibility of
catastrophic failure (at time T+1) does not mean that anti-tampering
devices would not be necessary in the case of catastrophic failure (at
time T+1).

The flight recorder is not a magical device that automatically notices any
humanly conceivable failure of Friendliness before it can do any harm.
The function of the flight recorder is not to solve that problem but to
smear that problem across time, so that even if you, as a human, fail to
notice the failure at time T+1, with the monitoring technology of time
T+1, you can potentially still notice that failure at time T+10 with the
monitoring technology of time T+10. Rather than the young UFAI needing to
conceal its failure only from inexperienced programmers, and the mature
UFAI being old enough to conceal its failure from experienced programmers,
the young UFAI must immediately outwit all experienced future programmers
who will ever go over that particular moment of its youth.

-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

