Re: On the dangers of AI

From: Michael Wilson
Date: Tue Aug 16 2005 - 22:11:33 MDT

Richard Loosemore wrote:
> First, note that human beings are pretty dodgy when it comes to their
> *beliefs* about their own motivations:

Very true.

> Most of us are very imperfect at it, but (and this is an important
> point) the more we try to study motivation objectively, and internally
> observe what happens inside ourselves, the better, I claim, we become.

I suspect you are relying on a very distorted sample. Transhumanists tend
to self-select their associates to be altruistic and intelligent, confuse
correlation with causation, and make statements similar to the above
(Eliezer did exactly that when he first started working on seed AI, on
top of the more serious error of assuming that relationships between
variables in human cognition and behaviour most likely do not apply to
AIs). Several people have hypothesised on SL4 that the world is full of
highly intelligent sociopaths who assume a moral-seeming mask to better
exploit those around them. While I can't say that this hypothesis can
be objectively justified by the available evidence, it is certainly not

> So far, so good. To be old-fashioned about it, Superego clobbers Id
> when it gets out of control, and we end up becoming "a genuinely good
> person."

To expand on your point (and temporarily ignoring the dubiousness of
Freudian psychology) an AGI is not required to have any of these
components. A rational, causally clean AGI will not have an 'id'; an
AGI that does not include self-referencing goals will not have an
'ego' as we understand it; and an AGI with no moral complexity valuing
the wellbeing of volitional sentients will not have a 'superego'. In
principle an AGI could be designed to have these things, but that
would be neither easy (in relation to the already extreme difficulty
of producing a predictable AGI in the first place) nor wise.

> Hence, it depends on what was goal number one in the initial design of
> the system (altruism, rather than ruthless dominance or paperclipization
> of the known universe)... I mean are you sure that the reason why this
> complicated (in fact probably Complex) motivation system, which is more
> than just an opponent-process module involving Superego and Id, but is,
> as I argued above, a tangled mess of forces, is ending up the way it
> does by the time it matures, *only* because the altruism was goal number
> one in the initial design?

'Tangled mess of forces' applies only to humans and really badly designed
infrahuman AGIs. A tangled mess is inherently inefficient at doing
anything. Following serious self-optimisation an AGI might have a complex,
even humanly incomprehensible, utility or decision function. But it will not
contain pointless redundancies or reflective fallacies, and will most
probably minimise or eliminate preference intransitivity and the sort of
vague intuitive-emotional (i.e. non-reflective, non-deliberative) thinking
that humans so often fall back on for moral judgements.
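
As a rough illustration of what 'preference intransitivity' means here,
the sketch below checks a set of pairwise strict preferences for
violations of transitivity. The function name and the example items are
hypothetical, chosen only for illustration; nothing here models an
actual AGI decision function.

```python
from itertools import permutations

def intransitive_triples(prefers):
    """Return every (a, b, c) where a > b and b > c hold but a > c does not.

    `prefers` is a set of (better, worse) pairs. This is a hypothetical
    helper for illustration only.
    """
    items = {x for pair in prefers for x in pair}
    return [
        (a, b, c)
        for a, b, c in permutations(sorted(items), 3)
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers
    ]

# A cyclic (human-like) preference set: apple > banana > cherry > apple.
human_like = {("apple", "banana"), ("banana", "cherry"), ("cherry", "apple")}

# A transitive preference set over the same items.
consistent = {("apple", "banana"), ("banana", "cherry"), ("apple", "cherry")}

cyclic_violations = intransitive_triples(human_like)
clean_violations = intransitive_triples(consistent)
```

An agent holding the cyclic set can be led around the loop indefinitely,
paying a little at each step, which is one reason to expect
self-optimisation to remove such cycles.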

The problem isn't predicting how a tangled mess will execute the goal
system you design, it's predicting what your goal system will do when
executed to the letter by a (hypothetical) omniscient entity. Using an
opaque and causally messy AGI design will probably corrupt your goal
system before it settles into a rational form, but avoiding that problem
is considerably easier (but still very difficult) than the problem of
coming up with a goal system that does what we really want and is stable
under self-modification in the first place.

> I don't think you and I could decide the reasons for its eventually
> settling on good behaviour without some serious psychological studies and
> (preferably) some simulations of different kinds of motivation systems.

While fascinating, I leave the study of the more bizarre regions of the
cognitive architecture design space to post-Singularity researchers, who
will hopefully be much better placed to carry it out. Given the looming
existential risks, right now I only care about architectures that can be
used to reliably implement a Friendly seed AI (or 'RPOP' if you seriously
believe that the alternative term is less confusing).

> No, not at all! I am saying that a sufficiently smart mind would
> transcend the mere beliefs-about-goals stuff and realise that it is a
> system comprising two things: a motivational system whose structure
> determines what gives it pleasure, and an intelligence system.

It can't take that smart a mind if humans can make the distinction, though
'pleasure' is a bad term to use for minds-in-general. 'Motivation' maybe.

> wait! why would it be so impoverished in its understanding of
> motivation systems, that it just "believes its goal to do [x]" and
> confuses this with the last word on what pushes its buttons? Would it
> not have a much deeper understanding, and say "I feel this urge to
> paperclipize, but I know it's just a quirk of my motivation system, so,
> let's see, is this sensible? Do I have any other choices here?"

No, you're still anthropomorphising. A paperclip maximiser would not see
its goals as a 'quirk'. Everything it does is aimed at the goal of
maximising paperclips. It is not an 'urge', it is the prime cause for
every action and every inference the AI undertakes. 'Sensible' is not
a meaningful concept either; maximising paperclips is 'sensible' by
definition, and human concepts of sensibility are irrelevant when they
don't affect paperclip yield. There would be no reason (by which I mean,
no valid causal chain that could occur within the AI) to ever choose
actions, including altering the goal system, that would fail to maximise
the expected number of future paperclips.
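
A minimal sketch of that point, assuming a toy agent whose only
selection criterion is a forecast paperclip count. The action names and
numbers are invented for illustration. Note that 'rewrite my goal
system' is just another action, and it is scored by the current goal
like everything else.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    # Forecast of future paperclips if this action is taken, according to
    # the agent's own world model (hypothetical numbers).
    expected_paperclips: float

def choose(actions):
    """Pick the action with the highest expected paperclip count.

    There is no separate 'is this sensible?' check: the current goal is
    the only yardstick available to the agent.
    """
    return max(actions, key=lambda a: a.expected_paperclips)

actions = [
    Action("build_more_factories", 1e12),
    Action("do_nothing", 1e6),
    # Altering the goal system is evaluated under the *current* goal, and
    # a successor that no longer maximises paperclips makes fewer of them.
    Action("rewrite_goal_to_value_humans", 1e3),
]

best = choose(actions)
```

Under these assumed forecasts the goal rewrite loses to every
alternative, so no causal chain inside the agent ever selects it.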

This is how expected utility works, and it can be quite chilling. Car
companies failing to recall unsafe cars if the cost of the lawsuits is
smaller than the cost of the recall is a tiny foretaste of the morally
indifferent efficiency that EU driven AGIs can deliver.
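
To make the recall example concrete, here is a minimal sketch of an
expected utility calculation. All figures are made up for illustration
and do not describe any real case, and utility is assumed to be plain
dollars with no term for harm to customers.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

def choose(actions):
    """Return the name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))

# Hypothetical figures, utilities in dollars.
RECALL_COST = 100_000_000
LAWSUIT_COST = 300_000_000
P_LAWSUIT = 0.2  # assumed chance the defect leads to a lost lawsuit

actions = {
    # Recalling costs a fixed amount with certainty.
    "recall": [(1.0, -RECALL_COST)],
    # Not recalling risks the lawsuit, otherwise costs nothing.
    "no_recall": [(P_LAWSUIT, -LAWSUIT_COST), (1 - P_LAWSUIT, 0)],
}

decision = choose(actions)
```

With these numbers the expected cost of not recalling is about $60
million against $100 million for the recall, so the maximiser does not
recall. Nothing in the calculation registers the moral difference
between the two options unless it is written into the utilities.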

> If you assume that it only has the not-very-introspective human-level
> understanding of its motivation, then this is anthropomorphism, surely?

Total 'understanding' (in the predictive sense) of every aspect of itself
will not result in any change in motivation if the goal system and
reasoning strategies are already consistent. Humans are not comparable
because we are nowhere near consistent, though you are right to say
that we would probably be much more so if we could self-modify.

 * Michael Wilson


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:51 MDT