RE: ITSSIM (was Some new ideas on Friendly AI)

From: Ben Goertzel
Date: Tue Feb 22 2005 - 16:53:11 MST


ITSSIM would cause an AI to avert only existential risks that either

-- it could avert without compromising its principle of safe action, or

-- it judged would be likely to mangle ITSELF, therefore potentially
threatening its ability to act safely
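
The two exceptions above can be written as a single decision predicate. This is a minimal Python sketch, not part of ITSSIM as specified. The Risk record and its two boolean fields are hypothetical stand-ins for judgments the AI would have to make:

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """Hypothetical summary of one existential risk, as judged by the AI."""
    name: str
    # True if every available way of averting the risk would compromise
    # the AI's principle of safe action.
    averting_compromises_safety: bool
    # True if the AI judges the risk likely to mangle the AI itself.
    threatens_self: bool


def itssim_would_avert(risk: Risk) -> bool:
    """Avert only if this can be done safely, or if the risk threatens the
    AI itself (and so its ability to keep acting safely)."""
    return (not risk.averting_compromises_safety) or risk.threatens_self


risks = [
    Risk("safely avertable", averting_compromises_safety=False, threatens_self=False),
    Risk("threat to the AI", averting_compromises_safety=True, threatens_self=True),
    Risk("threat to everything else", averting_compromises_safety=True, threatens_self=False),
]

for r in risks:
    print(f"{r.name}: {itssim_would_avert(r)}")
```

The third case, a threat to everything except the AI that cannot be averted safely, is exactly the one ITSSIM leaves alone.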

I don't think that ITSSIM is an ideal approach by any means; in fact, it's
overly conservative for my taste. I'd rather have an AI be able to deviate
from its safe-self-modification rule in order to save the universe from
other threats; the question is how to allow this without opening the door to
all sorts of distortions and delusions...

-- Ben
  -----Original Message-----
  From: David Hart
  Sent: Tuesday, February 22, 2005 5:58 PM
  Subject: Re: ITSSIM (was Some new ideas on Friendly AI)

  Ben Goertzel wrote:
    Your last paragraph indicates an obvious philosophical (not logical)
weakness of the ITSSIM approach as presented.

    It is oriented toward protecting against danger from the AI itself,
rather than other dangers. Thus, suppose

    -- there's a threat that has a 90% chance of destroying ALL OF THE
UNIVERSE (replacing it with a different universe) except for the AI itself,
which it will almost certainly leave intact
    -- the AI could avert this attack but in doing so it would make itself
slightly less safe (slightly less likely to obey the ITSSIM safety rule)

    Then, following the ITSSIM rule, the AI will let the rest of the world
get destroyed, because there is no action it can take to avert the threat
without decreasing its amount of safety.

    Unfortunately, I can't think of any clean way to get around this
problem -- yet. Can you?
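
    The scenario can be made concrete with illustrative numbers. In this sketch, the names are hypothetical and the -0.01 safety cost is an arbitrary stand-in for "slightly less safe". The strict rule filters out every action that lowers safety and then picks the least destructive of what remains:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    name: str
    # Change in how likely the AI is to keep obeying the ITSSIM safety rule
    # (hypothetical scale; negative means "less safe").
    safety_delta: float
    # Probability that the rest of the world is destroyed if this is chosen.
    p_world_destroyed: float


# Illustrative numbers for the scenario: a 90% external threat, and an
# averting action that costs a little safety (-0.01 is an arbitrary choice).
actions: List[Action] = [
    Action("do nothing", safety_delta=0.0, p_world_destroyed=0.9),
    Action("avert the threat", safety_delta=-0.01, p_world_destroyed=0.0),
]


def choose_strict(options: List[Action]) -> Action:
    """Strict ITSSIM-style choice: discard any action that lowers safety,
    then pick the least destructive of the remaining actions."""
    permitted = [a for a in options if a.safety_delta >= 0.0]
    return min(permitted, key=lambda a: a.p_world_destroyed)


choice = choose_strict(actions)
print(choice.name, choice.p_world_destroyed)  # the world is left at 90% risk
```

However small the safety cost, the averting action never survives the filter.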

  For the sake of a somewhat simple and concrete example, let's assume that
ITSSIM is treated as a supergoal, and that the AGI system design in question
must [re]evaluate each potential action against ALL supergoals before
executing it (where a supergoal is defined arbitrarily by some heuristic as,
say, the largest N super-nodes in a scale-free graph of goals, where N might
start off as 3).
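
  One way to sketch this setup in Python assumes two things: the goal graph is stored as undirected adjacency sets, and "largest" means highest degree. The post leaves the heuristic open. The goal names and approval functions are hypothetical placeholders, not part of any existing AGI design:

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

GoalGraph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> GoalGraph:
    """Undirected goal graph stored as adjacency sets."""
    graph: GoalGraph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def supergoals(graph: GoalGraph, n: int = 3) -> List[str]:
    """The N best-connected goals (ties broken by name). Reading "largest"
    as "highest degree" is an assumption."""
    return sorted(graph, key=lambda g: (-len(graph[g]), g))[:n]


def may_execute(action: str, sgs: List[str],
                approves: Dict[str, Callable[[str], bool]]) -> bool:
    """Re-evaluate the action against ALL supergoals; any one can block it."""
    return all(approves[sg](action) for sg in sgs)


# Hypothetical goal graph with three hub goals.
graph = build_graph([
    ("ITSSIM", "CV"), ("ITSSIM", "EERE"), ("CV", "EERE"),
    ("ITSSIM", "learn"), ("ITSSIM", "explore"), ("CV", "learn"),
    ("CV", "talk"), ("EERE", "monitor"), ("EERE", "scan"),
    ("learn", "explore"),
])
sgs = supergoals(graph)  # ['CV', 'EERE', 'ITSSIM']

# Placeholder approval tests: only the ITSSIM stand-in ever objects.
approves = {
    "ITSSIM": lambda a: not a.startswith("self-modify"),
    "CV": lambda a: True,
    "EERE": lambda a: True,
}
print(may_execute("answer a question", sgs, approves))
print(may_execute("self-modify: rewrite goal weights", sgs, approves))
```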

  Other obvious pre-wired goals might be CV and an "external existential
risk evaluator" (EERE).

  Perhaps variations of EERE are needed for 'risk to self' and 'risk to
environment' (where 'environment' includes humans and all other natural and
artificial life), or perhaps EERE can be all-inclusive.

  Or perhaps ITSSIM would in fact serve the purpose of EERE if ~A also
means failing to take a proactive action that would avert an external risk,
so that A, although it might be very dangerous, could be deemed better than
~A. Such a design would need to actively seek out possible EERs and generate
potential actions designed to mitigate them.
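
  Here is a rough sketch of the ~A-as-inaction idea. It assumes each option gets two hypothetical EERE scores, 'risk to self' and 'risk to environment', which are simply summed into an all-inclusive score. The summation and all names here are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    name: str
    risk_to_self: float         # hypothetical 'risk to self' EERE score
    risk_to_environment: float  # hypothetical 'risk to environment' EERE score

    def total_risk(self) -> float:
        # All-inclusive EERE as a plain sum (an assumption; the post leaves
        # open whether and how the two should be combined).
        return self.risk_to_self + self.risk_to_environment


def prefer_a(a: Option, not_a: Option) -> bool:
    """Take A iff it is judged less risky than ~A, where ~A counts the
    consequences of failing to act against an external risk."""
    return a.total_risk() < not_a.total_risk()


def propose_mitigations(eers: List[str]) -> List[str]:
    """Placeholder for actively seeking out EERs: one candidate action each."""
    return [f"mitigate {eer}" for eer in eers]


# A is risky for the AI itself, but ~A leaves a 90% external risk in place.
a = Option("intervene", risk_to_self=0.05, risk_to_environment=0.0)
not_a = Option("stand by", risk_to_self=0.0, risk_to_environment=0.9)
print(prefer_a(a, not_a))
print(propose_mitigations(["EER-1", "EER-2"]))
```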

  The "multiple-SG as action generator AND governor" architecture raises the
question of whether an 'arbiter' SG is needed, or whether SGs would naturally
feedback-harmonize given, e.g., that one SG may be churning out potential
actions while another SG may be vetoing them.
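
  The no-arbiter case can be sketched as a simple loop in which one SG proposes and another vetoes. This is a hypothetical illustration: the SG roles, action names, and veto test are placeholders. It shows where an arbiter might be needed, since nothing happens when every proposal is vetoed:

```python
from typing import Callable, Iterable, Optional


def generate_and_govern(proposals: Iterable[str],
                        veto: Callable[[str], bool]) -> Optional[str]:
    """No-arbiter loop: a generator SG offers actions in priority order and a
    governor SG vetoes them; the first surviving proposal is executed.
    Returns None if every proposal is vetoed."""
    for action in proposals:
        if not veto(action):
            return action
    return None


# Hypothetical governor: vetoes anything it classifies as an intervention.
def governor_veto(action: str) -> bool:
    return "intervention" in action


print(generate_and_govern(
    ["drastic intervention", "moderate intervention", "issue warning"],
    governor_veto))                      # issue warning
print(generate_and_govern(["drastic intervention"], governor_veto))  # None
```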

  An interesting design/tuning question also follows: should an SG lose
long-term importance for casting a veto, for generating a potential
action that is vetoed, or for both? We wouldn't want an AGI 'going blind'
because potential actions generated to avert a very real and perhaps
overwhelming EER are constantly vetoed as 'too unsafe', nor would we
want an AGI that's overly trigger-happy.
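
  The tuning question can be explored with a toy model. It assumes long-term importance (LTI) is a number that shrinks multiplicatively each time a veto occurs, and it takes the worst case in which every proposal is vetoed. The penalty parameters and SG names are hypothetical:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Supergoal:
    name: str
    lti: float = 1.0  # long-term importance, hypothetical 0..1 scale


def apply_veto(generator: Supergoal, governor: Supergoal,
               generator_penalty: float, governor_penalty: float) -> None:
    """Shrink each SG's LTI multiplicatively after one veto.
    A penalty of 0.0 leaves that SG unchanged."""
    generator.lti *= 1.0 - generator_penalty
    governor.lti *= 1.0 - governor_penalty


def simulate(rounds: int, generator_penalty: float,
             governor_penalty: float) -> Tuple[float, float]:
    """Worst case: every proposal made in every round is vetoed."""
    generator = Supergoal("EERE")   # proposes actions against an EER
    governor = Supergoal("ITSSIM")  # vetoes them as 'too unsafe'
    for _ in range(rounds):
        apply_veto(generator, governor, generator_penalty, governor_penalty)
    return generator.lti, governor.lti


print(simulate(20, 0.1, 0.0))  # generator fades: the AGI 'goes blind'
print(simulate(20, 0.0, 0.1))  # governor fades: drifts toward trigger-happy
print(simulate(20, 0.1, 0.1))  # both fade together; their ratio is unchanged
```

In this toy model, penalizing only the generator drives it toward irrelevance, and penalizing only the governor erodes the veto instead. Neither one-sided policy avoids both failure modes.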

  Designing a single SG like CV with long-term stability is difficult
enough; designing (or perhaps 'seeding' is a better term) a complex system
of supergoals with long-term stability is even more difficult. However, I
suspect that it's closer to the true nature of the problem.

