Re: Friendliness not an Add-on

From: Philip Goetz
Date: Sat Mar 04 2006 - 13:40:24 MST

On 3/4/06, Michael Roy Ames wrote:
> Philip,
> IRT "It seems to me everyone is making a mistake in thinking that checking
> for friendliness of a program is like checking whether the program
> halts."
> --- Actually, I don't think that this is a mistake. We are attempting to
> define Friendliness as a thing-that-can-be-verified - or at least exploring
> the idea as a possible way of maintaining goal stability through recursive
> self improvement.

Yes, and I am saying you should try to verify something about the
outputs of the AI, not something about its entire internal operations.

Consider program verification, which is a similar problem. Any
attempt to formally verify that everything a program does matches the
programmer's intentions ends up amounting to re-writing the program in
predicate logic. (If you really want to verify your program, just
write it in Prolog to begin with.) What people actually do is to
isolate blocks of code, and tag them with statements that say, "If
conditions X are met on entering this block, then conditions Y are met
on exiting this block." This is like what I was saying (isolate the
actions/outputs, and then test them according to your rules), and not
at all like proving whether a program halts. Any post (other than
this one :) that references the halting problem or Rice's theorem is
off the mark.
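
A minimal sketch of that block-tagging idea, in Python. It checks the
conditions at run time rather than proving them, so it only illustrates
the shape of an entry/exit contract and is not formal verification. The
names `contract`, `ContractError`, and `integer_sqrt` are hypothetical,
chosen for this example.

```python
from functools import wraps


class ContractError(Exception):
    """Raised when an entry or exit condition does not hold."""


def contract(pre, post):
    """Tag a block with an entry condition (X) and an exit condition (Y)."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args):
            # Conditions X must be met on entering the block.
            if not pre(*args):
                raise ContractError("precondition failed")
            result = fn(*args)
            # Conditions Y must be met on exiting the block.
            if not post(result, *args):
                raise ContractError("postcondition failed")
            return result
        return wrapper
    return decorate


@contract(pre=lambda n: n >= 0,
          post=lambda r, n: r * r <= n < (r + 1) * (r + 1))
def integer_sqrt(n):
    """Largest integer r with r*r <= n, found by simple counting."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r
```

Calling `integer_sqrt(10)` passes both checks, while `integer_sqrt(-1)`
is rejected on entry. Nothing here inspects how the block computes its
answer; only the conditions at its boundaries are tested.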

In layman's terms, I'm talking about something more like Asimov's Three
Laws of Robotics. These are a post-processing system which takes
whatever it is the robot proposed to do, runs it by the Three Laws,
and rejects it if it fails them.
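
The post-processing filter described above might look roughly like this
in Python. It is only a sketch under a large assumption: each proposed
action already arrives labeled with flags such as `harms_human`, and
producing those labels is the hard part that this code does not attempt.
The `Action` type, the flag names, the rule list, and the flat
(unprioritized) rule order are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A proposed action plus assumed, pre-computed labels about it."""
    description: str
    harms_human: bool = False
    disobeys_order: bool = False
    endangers_self: bool = False


# Each rule is a name and a predicate that returns True if the action passes.
RULES = [
    ("no harm to humans", lambda a: not a.harms_human),
    ("obey orders", lambda a: not a.disobeys_order),
    ("protect self", lambda a: not a.endangers_self),
]


def review(action):
    """Return the name of the first rule the action fails, or None if it passes."""
    for name, passes in RULES:
        if not passes(action):
            return name
    return None


def filter_actions(proposed):
    """Keep only the proposed actions that pass every rule."""
    return [a for a in proposed if review(a) is None]
```

The filter never looks at how an action was generated. It sees only the
proposal itself, which is the point of testing outputs rather than
internal operations.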

(I will laugh and laugh and then cry if nobody can come up with any
system for Friendliness better than the Three Laws. No challengers
so far.)

> IRT "I think there is absolutely no hope of being able to formally verify
> anything about the results of a proposed course of action in the world."
> --- As a bare statement, I would have to agree with you. However, it is not
> the verification of the results of a proposed course of action that we
> intend to verify. Rather it is whether the *intended* results of the
> action are Friendly, and provably so based on the definition of Friendliness
> as reified in the AI system and its goals. Where the intended and actual
> results differ the AI has failed to accurately predict outcomes, and this is
> salient for learning.

Why ask about the *intended* results? The intended results are
1) of less significance than the *actual* results, and
2) orders of magnitude more difficult to establish than the *actual* results.

- Phil
