Re: ITSSIM (was Some new ideas on Friendly AI)

From: David Hart (
Date: Tue Feb 22 2005 - 15:57:40 MST

Ben Goertzel wrote:

> Your last paragraph indicates an obvious philosophical (not logical)
> weakness of the ITSSIM approach as presented.
> It is oriented toward protecting against danger from the AI itself,
> rather than other dangers. Thus, suppose
> -- there's a threat that has a 90% chance of destroying ALL OF THE
> UNIVERSE with a different universe, except for the AI itself; but will
> almost certainly leave the AI intact
> -- the AI could avert this attack but in doing so it would make itself
> slightly less safe (slightly less likely to obey the ITSSIM safety rule)
> Then following the ITSSIM rule, the AI will let the rest of the world
> get destroyed, because there is no action that it can take without
> decreasing its amount of safety.
> Unfortunately, I can't think of any clean way to get around this
> problem -- yet. Can you?

For the sake of a somewhat simple and concrete example, let's assume that
ITSSIM is treated as a supergoal, and that the AGI system design in
question must [re]evaluate each potential action against ALL supergoals
before executing it (where the supergoals are defined arbitrarily by some
heuristic as, say, the largest N super-nodes in a scale-free graph of
goals, where N might start off as 3).
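
As a rough illustration of the setup just described (a sketch, not a
worked-out design), the following assumes the goal graph is a plain
adjacency dict and that "largest" means "most links". Both are
assumptions, and select_supergoals, action_permitted and the evaluator
callables are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Set


def select_supergoals(goal_graph: Dict[str, Set[str]], n: int = 3) -> List[str]:
    """Pick the n goals with the most links (ties broken alphabetically).

    Assumption: node degree stands in for 'largest super-node'.
    """
    ranked = sorted(goal_graph, key=lambda g: (-len(goal_graph[g]), g))
    return ranked[:n]


def action_permitted(
    action: str,
    supergoals: List[str],
    evaluators: Dict[str, Callable[[str], bool]],
) -> bool:
    """An action may execute only if EVERY supergoal's evaluator accepts it."""
    return all(evaluators[sg](action) for sg in supergoals)


# Hypothetical goal graph: three hubs plus two ordinary goals.
goal_graph = {
    "ITSSIM": {"g1", "g2", "g3", "g4"},
    "CV": {"g1", "g2", "g3"},
    "EERE": {"g1", "g2"},
    "g1": {"ITSSIM"},
    "g2": {"CV"},
}
supergoals = select_supergoals(goal_graph, n=3)

# Hypothetical evaluators: ITSSIM rejects one action, the others accept all.
evaluators = {
    "ITSSIM": lambda a: a != "risky_action",
    "CV": lambda a: True,
    "EERE": lambda a: True,
}
```

Requiring unanimous approval is the strictest reading of "evaluate
against ALL supergoals"; a weighted vote would be an equally plausible
reading.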

Other obvious pre-wired goals might be CV, and an "external existential
risk evaluator" or EERE.

Perhaps variations of EERE are needed for 'risk to self' and 'risk to
environment' (where 'environment' includes humans and all other natural
and artificial life), or perhaps EERE can be all-inclusive.

Or, perhaps ITSSIM would in fact serve the purpose of EERE if ~A also
means failing to take a pro-active action that would avert an external
risk, meaning that A, although it might be very dangerous, is deemed
better than ~A. Such a design would need to actively seek out possible
EERs and generate potential actions designed to mitigate them.

The "multiple-SG as action generator AND governor" architecture raises the
question of whether an 'arbiter' SG is needed, or whether SGs would
naturally feedback-harmonize given, e.g., that one SG may be churning
out potential actions, and another SG may be vetoing them.
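
The arbiter-free variant can be sketched as a single propose/veto
cycle. This is only one possible reading: the SuperGoal class,
run_cycle, and the example actions are all hypothetical, and each SG is
assumed to expose both a generator (propose) and a governor (vetoes).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SuperGoal:
    name: str
    propose: Callable[[], List[str]]   # generator role
    vetoes: Callable[[str], bool]      # governor role


def run_cycle(
    supergoals: List[SuperGoal],
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Run one cycle with no arbiter.

    Every proposed action is shown to every OTHER supergoal; the first
    veto blocks it. Returns (approved, vetoed), where each vetoed entry
    is (action, proposer name, vetoer name).
    """
    approved, vetoed = [], []
    for proposer in supergoals:
        for action in proposer.propose():
            blocker = next(
                (sg for sg in supergoals
                 if sg is not proposer and sg.vetoes(action)),
                None,
            )
            if blocker is None:
                approved.append(action)
            else:
                vetoed.append((action, proposer.name, blocker.name))
    return approved, vetoed


# Hypothetical example: EERE generates, ITSSIM governs.
eere = SuperGoal(
    name="EERE",
    propose=lambda: ["deflect_threat", "monitor_threat"],
    vetoes=lambda action: False,
)
itssim = SuperGoal(
    name="ITSSIM",
    propose=lambda: [],
    vetoes=lambda action: action == "deflect_threat",
)
approved, vetoed = run_cycle([eere, itssim])
```

In this toy run the threat-averting action is blocked and only the
passive one survives, which is exactly the failure mode from Ben's
example; the sketch makes no claim about how harmonization would
emerge.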

An interesting design/tuning question also follows: should an SG lose
long-term-importance for casting a veto, or for generating a potential
action that is vetoed, or for both? We wouldn't want an AGI 'going
blind' because potential actions generated to avert a very real and
perhaps overwhelming EER are constantly vetoed as 'too unsafe', and
likewise we wouldn't want an AGI that's overly trigger-happy.
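
To make the tuning question concrete, here is a minimal sketch in which
each veto event multiplicatively decays the long-term importance of the
proposer, the vetoer, or both. The function name, the penalty
parameters and the decay rule itself are assumptions chosen for
illustration, not part of any existing design.

```python
from typing import Dict


def penalize_veto(
    importance: Dict[str, float],
    proposer: str,
    vetoer: str,
    proposer_penalty: float = 0.0,
    vetoer_penalty: float = 0.0,
) -> Dict[str, float]:
    """Return updated importances after one veto event.

    Each penalty is a fraction in [0, 1] removed from that SG's
    importance. Setting one penalty to 0 leaves that side untouched,
    so the three options (veto only, proposal only, both) are covered
    by the two knobs.
    """
    updated = dict(importance)  # do not mutate the caller's dict
    updated[proposer] *= (1.0 - proposer_penalty)
    updated[vetoer] *= (1.0 - vetoer_penalty)
    return updated


# Hypothetical run: ITSSIM vetoes two EERE proposals in a row and only
# the vetoer is penalized.
importance = {"EERE": 1.0, "ITSSIM": 1.0}
for _ in range(2):
    importance = penalize_veto(
        importance, proposer="EERE", vetoer="ITSSIM", vetoer_penalty=0.5
    )
```

Penalizing only the vetoer pushes toward the trigger-happy extreme,
while penalizing only the proposer pushes toward the 'going blind'
extreme, so the two knobs would presumably need to be tuned together.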

Designing a single SG like CV with long-term stability is difficult
enough; designing (or perhaps 'seeding' is a better term) a complex
system of supergoals with long-term stability is even more difficult.
However, I suspect that it's closer to the true nature of the problem.

