Could dystopian simulated minds get control of a super AGI?

From: Philip Sutton (
Date: Wed Apr 07 2004 - 09:06:17 MDT

Hi Ben,

> there are now specific ideas regarding what kind of architecture is
> likely to make teaching of positive ethics more successful (the
> mind-simulator architecture).

I think the inclusion of mind-simulator architecture in an AGI is a really
valuable addition to the intended Novamente architecture. I think it's
critical if AGIs are to be social in disposition and I have an intuition that
a social disposition is vital if AGIs are to be friendly (to other AGIs,
humans, other life of any sort anywhere).

So how can this necessary mind-simulator architecture be made to
work, and in particular be made so that it won't go wrong in serious
ways?

In earlier discussions Eliezer challenged people by saying that he was
convinced that if an AGI reached a very high level of mental
competence, it could talk any minders into letting it out of the sandbox
where it might have been constrained for the safety of the wider
community.

I think this problem also needs to be tackled in a slightly different
context. AGIs based on self-modifiable architecture in a networked
world are highly fluid. It is hard to say where the entity begins and ends
precisely - I guess because it doesn't begin and end precisely.

So let's see where the following scenario leads...

A super AGI emerges and it is social and friendly to life (e.g. all sorts,
everywhere). Basically it's an all-round nice super-sentient.

But for one reason or another the super-AGI starts to think about other
minds that are nasty: either because it's driven by curiosity, or because
the super-AGI needs to deal with some nasty sentients in real life, or
because someone asks the super-AGI to simulate some nasty sentients
so that strategy 'games' can be run to test benevolent moves for
overcoming malevolent moves that other sentients could make.

So the super-AGI builds one or more malevolent mind simulations. If
the super-AGI is a truly giant mind, then the simulated mind that it
generates could itself be quite a powerhouse. I can see two scenarios
branching off at this point.

One strand (the most probable??) might be that the simulated
malevolent mind becomes a 'temptation meme' within the super-AGI
mind and that it actually sets up dynamics that undermine the good-guy
character of the super-AGI, i.e. the simulated malevolent mind might
be so good at sophistry that it convinces the overall host mind complex
that it can meet its prime goals in ways that subtly violate friendliness -
allowing a goal-modification drift that finally overturns friendliness of
the host super-AGI in serious ways.

The other, slightly bizarre, sub-scenario is that the simulated malevolent
mind actually becomes aware that it is contained within the
mind-complex of the super-AGI and works out a way to hack its way
out of the host mind and into the Internet, where it can take over
computing power that is managed by less powerful intelligences and
become an independent mental entity in its own right.

I can imagine the boundary between the host mind and the simulated
mind breaking down if the host mind succumbs to the temptation to talk
directly to the simulated mind. (I know a couple of people whose minds
seem to work a bit like this!)

How can AGI architecture and training be developed to prevent such
outcomes?

What prompted me to think of this is that many people have been
thinking that friendly AGIs might be helpful, or even essential, in protecting
other sentients from malevolent sentients (crazed humans or
whatever). But for AGIs to play this role they surely have to be able to
simulate the malevolent sentients they are trying to deal with.

It's a bit like playing Age of Empires - even if you see yourself as on the
side of the angels - the computer is given the job of simulating nasty
hordes that you have to deal with. But what if the nasty hordes can
change the mind of the computer or can escape into the network?

It reminds me of the old (not old for some) superstition that it's
dangerous to think 'bad thoughts' because these thoughts might be set
loose in the world. Hence the prohibition against speaking the name of
evil powers/concepts.

(( As an aside, both sub-scenarios illustrate what might be the
fastest route to human 'upload', i.e. create a super AGI and then get it
to think about you a lot, thereby creating a simulation of your
behavioural (and other) essence. If your physical body dies, much
of your essence is still extant. But if time is on your side, you can
wait for the science of human/AGI broadband interfaces to develop and
then reintegrate with your simulated self! ))

But putting the aside aside, I think there's a huge issue of how the
'bad thoughts' of AGIs can be contained so that they remain mere
simulations for thinking purposes, and how they can be stopped from
taking on a life of their own that changes the host AGI for the worse
or escapes into the network.

Cheers, Philip

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:46 MDT