From: David Hart (email@example.com)
Date: Wed Feb 23 2005 - 01:08:43 MST
Ben Goertzel wrote:
> Maybe one could build in a "grandfather clause" stating that an AI can
> optionally violate the safety rule IF it can prove that, when given
> the same data about the world and a very long time to study it, its
> seed AI ancestor would have decided to violate the safety rule.
I like this idea; it seems it would need to run as a continual process
if the AI is to react to existential risks within an acceptable time.
A grandfather process could be part of a general set of conditions that
allows for [partial] override of safety rules in cases of extreme
existential danger, averting the case where ITSSIM allows the AI's
environment (our Universe) to be destroyed by a 'safe' failure to act
(isn't this similar to the classic Asimovian robot deadlock?).
E.g., in the context of an ensemble of supergoals, ITSSIM would be one
of many [weighted] supergoals, while an 'existential risk detector',
with a 'grandfather clause' as a special case, could be another supergoal.
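As a toy illustration of that ensemble idea (all names, weights, and numbers below are my own invention, not anything from ITSSIM or Novamente), a weighted-vote sketch might look like:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Supergoal:
    name: str
    weight: float
    score: Callable[[str], float]  # in [-1, 1]; negative opposes the action

def decide(action: str, ensemble: List[Supergoal]) -> bool:
    """Act iff the weighted vote of the whole supergoal ensemble is positive."""
    return sum(sg.weight * sg.score(action) for sg in ensemble) > 0

# ITSSIM as one weighted supergoal: here it vetoes everything (score -1),
# standing in for the safety rule.
itssim = Supergoal("ITSSIM", 1.0, lambda a: -1.0)
# The existential-risk detector (grandfather clause as a special case)
# strongly favours a mitigation action and is neutral otherwise.
risk = Supergoal("x-risk detector", 2.0,
                 lambda a: 1.0 if a == "mitigate-risk" else 0.0)

print(decide("mitigate-risk", [itssim, risk]))  # True: [partial] override
print(decide("self-modify", [itssim, risk]))    # False: safety rule holds
```

The point is only that the override need not be a hard-coded exception: with suitable weights, the risk-detector SG can outvote ITSSIM exactly when extreme danger is detected.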
In a Novamente context, such an ensemble would be a [dynamic scale-free]
network of goal nodes, and could therefore be tuned for acceptable
margins of safety (we live with evolution-tuned SFNs all around us,
inside of us, etc.), e.g. a goal graph could be tuned for the emergence
of 1 mega-SG, or 10 equal-sized SGs, or 3 large SGs each with 3 sub-SGs,
or 2 large SGs and one slightly smaller SG, etc.
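One crude way to quantify where a weight configuration sits among those regimes (a metric I'm making up here, not a Novamente mechanism) is the normalized entropy of the SG weights, 0 for a single mega-SG and 1 for N equal-sized SGs:

```python
import math

def sg_entropy(weights):
    """Normalized entropy of a supergoal weight vector: 0.0 means one
    mega-SG dominates completely, 1.0 means all SGs are equal-sized."""
    total = sum(weights)
    ps = [w / total for w in weights if w > 0]
    if len(ps) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in ps)
    return h / math.log(len(ps))

print(sg_entropy([1.0]))              # 0.0: the 1 mega-SG case
print(round(sg_entropy([1.0] * 10)))  # 1: 10 equal-sized SGs
```

A tuning process could then hold such a statistic inside an acceptable safety margin while the goal graph evolves.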
The ITSSIM SG would probably govern the tuning parameters, as any
change, even a necessarily small and 'incremental' change, would
constitute a significant self-modifying action.
Any SG architecture could be validated, invalidated and/or optimized
using experimentation with different parameters in simulation. Such
experimentation might find, e.g., that 1 dominant SG is dangerous
because of the >0 probability of the paperclip problem, or that 10
equal-sized SGs lead to system deadlock or always condense into fewer
SGs on their own, or that a system without a 'grandfather clause'
embodied in an SG would grow faster but might also fail to avert a
large existential risk.
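A toy harness for that kind of experiment (the simulator, configuration keys, and failure probabilities below are all made up purely for illustration) could look like:

```python
import random

def simulate(config, seed):
    """Stand-in for a real goal-network simulation; it just hard-codes
    the speculative failure modes described above."""
    rng = random.Random(seed)
    # Speculative failure mode: many equal-sized SGs sometimes deadlock.
    deadlock = config["n_equal_sgs"] >= 10 and rng.random() < 0.5
    # Speculative failure mode: without a grandfather-clause SG, the
    # system only sometimes manages to avert a large existential risk.
    averts_risk = config["has_grandfather_clause"] or rng.random() < 0.3
    return {"deadlock": deadlock, "averts_risk": averts_risk}

def evaluate(config, trials=100):
    """Estimate outcome frequencies for one SG architecture."""
    results = [simulate(config, seed) for seed in range(trials)]
    return {
        "p_deadlock": sum(r["deadlock"] for r in results) / trials,
        "p_averts_risk": sum(r["averts_risk"] for r in results) / trials,
    }

safe = evaluate({"n_equal_sgs": 3, "has_grandfather_clause": True})
print(safe)  # {'p_deadlock': 0.0, 'p_averts_risk': 1.0}
```

A real study would of course replace the hard-coded probabilities with an actual goal-network dynamics model; the harness shape (sweep configurations, estimate outcome frequencies) is the transferable part.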
Lessons learned from simulation could lead to an adaptive SG
architecture, e.g. allowing a new temporary SG to emerge for a specific
existential risk mitigation (only if/when a large existential risk is
discovered), later reverting to the 'safer' state (providing a
function similar to Elias's stack frame).
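That revert-to-safer-state behaviour could be sketched as a literal stack of goal frames (the class and goal names here are mine, purely illustrative):

```python
class GoalStack:
    """Baseline supergoals sit in the bottom frame; temporary
    risk-mitigation SGs are pushed on discovery of a large risk and
    popped once the risk is resolved."""
    def __init__(self, baseline):
        self.frames = [list(baseline)]

    def push_temporary(self, sg):
        # New frame = previous frame plus the temporary supergoal.
        self.frames.append(self.frames[-1] + [sg])

    def pop(self):
        # Revert to the 'safer' prior state; never drop the baseline frame.
        if len(self.frames) > 1:
            self.frames.pop()

    @property
    def active(self):
        return self.frames[-1]

stack = GoalStack(["Joy/Growth/Choice", "ITSSIM", "x-risk detector"])
stack.push_temporary("asteroid-mitigation SG")  # large risk discovered
print(stack.active[-1])   # asteroid-mitigation SG
stack.pop()               # risk resolved
print(len(stack.active))  # 3: back to the baseline supergoals
```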
One can imagine that perhaps SGs would condense into a normal stable
state something like:
1. Joy, Growth, Choice + CV (boils down to growth moderated by
2. ITSSIM (safety governor)
3. existential risk detector + mitigation action generator (including
To tie this back to friendliness, in my current understanding,
Friendliness is equivalent to a maximally-stable complex system of SGs
and subgoals seeded with the best human values. In other words,
Friendliness is bound, in that it can exist only within a known complex
system, "The Universe"; therefore, to be truly friendly and as
invariant as possible, it should be a set of [asymptotically stable]
attractors, if such a thing can exist at such a scale, or our best shot
at a set of non-asymptotically stable attractors.
This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:50 MDT