RE: Volitional Morality and Action Judgement

From: Michael Wilson (
Date: Tue May 25 2004 - 02:54:47 MDT

Ben Goertzel wrote:
> However, I do think it's possible that you have a theory of the
> "probabilistic attractor structures" that self-modifying cognitive
> systems are likely to fall into.

This is the wrong type of thinking. We're not trying to guess what will
happen in an unconstrained system given certain inputs, because this is
essentially impossible for anything even approaching the complexity of
seed AI. Computers are deterministic systems, and making probabilistic
guesses about things that have never been done usually indicates an
attempt to generalise without a sound basis for assessing the confidence
of that generalisation.

The correct mode of thinking is to constrain the behaviour of the system
so that it is theoretically impossible for it to leave the class of states
that you define as desirable. This is still hideously difficult, as
defining 'classes of states' is really hard and you need multiple
overlapping safeguards to compensate for likely implementation errors,
but it is at least not doomed from the word go.

> My statement is that, so far as I know, it's reasonably likely that
> building a decently-designed AGI and teaching it to be nice will
> lead to FAI.

Without a deep understanding of the cognitive architecture, you have no
way of knowing whether you are 'teaching' the system what you think you
are teaching it. If you /do/ have a deep understanding of the architecture,
then you don't teach, you specify (though if you don't understand what
you're trying to teach you might have to specify the consequences and get
the AGI to work out what the axioms are). From what I understand of your
design, you will be trying to form a distributed semi-causal goal system
via subgoaling of reinforcer-based supergoals. The result would be zero,
wrong and/or incomprehensible generalisation of the subgoals, followed by
an indeterminate self-modification trajectory possibly resulting in subgoal
stomp and arbitrary behaviour or possibly just tiling the solar system
with smiley faces. I would say we might get lucky and it would just wirehead,
but I get the impression you'd just hack the structure a bit and rerun the
takeoff until planetkill occurs.

Social training is an empathic process that works on other humans (and to
a lesser extent, other vertebrates) because of shared brainware. You can
select actions with probabilistic but mainly favourable consequences because
you can model the other person. Neither this nor experience of training
trivial connectionist networks or setting fitness functions in tiny GAs is
applicable to the problem of teaching an AGI morality.

> I wouldn't advocate proceeding to create a superhuman-level self-modifying
> AGI without a better understanding.

Commendable, but are you sure that you have enough understanding not to do
it by accident?

 * Michael Wilson


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:47 MDT