From: Richard Loosemore (rpwl@lightlink.com)
Date: Tue Aug 16 2005 - 20:54:43 MDT
Peter,
This is very tricky territory, I believe, so I am going to try to go 
through what you say very carefully....
Peter de Blanc wrote:
> On Tue, 2005-08-16 at 16:57 -0400, Richard Loosemore wrote:
> 
>>Here is the strange thing:  I would suggest that in every case we know 
>>of, where a human being is the victim of a brain disorder that makes the 
>>person undergo spasms of violence or aggression, but with peaceful 
>>episodes in between, and where that human being is smart enough to 
>>understand its own mind to a modest degree, they wish for a chance to 
>>switch off the violence and become peaceful all the time.  Given the 
>>choice, a violent creature that had enough episodes of passivity to be 
>>able to understand its own mind structure would simply choose to turn 
>>off the violence
> 
> 
> There's an important distinction which you're missing, between a mind's
> behaviors and (its beliefs about) its goal content. As human beings, we
> have evolved to believe that we are altruists, and when our evolved
> instincts and behaviors contradict this, we can sometimes alter these
> behaviors.
> 
> In other words, it is a reproductive advantage to have selfish
> behaviors, so you have them, but it is also a reproductive advantage to
> think of yourself as an altruist, so you do. Fortunately, your
> generally-intelligent mind is more powerful than these dumb instincts,
> so you have the ability to overcome them, and become a genuinely good
> person. But you can only do this because you _started out_ wanting to be
> a good person!
I want to argue first that I am not missing the distinction between a 
mind's behaviors and (its beliefs about) its goal content.
First, note that human beings are pretty dodgy when it comes to their 
*beliefs* about their own motivations:  self-knowledge of motivations in 
individual humans ranges from close to zero (my 97-year-old grandmother 
with dementia) through adeptly contortionist (my delightful but 
sometimes exasperating 6-year-old son) to grossly distorted (Hitler, who 
probably thought of himself as doing wonderful things for the world) and 
on through sublimely subtle (T.E. Lawrence?  Bertrand Russell?). 
Truth is, we have evolved to play all kinds of tricks on ourselves, and 
to have many levels of depth of understanding, depending on who we are 
and how hard we try.
Most of us are very imperfect at it, but (and this is an important 
point) the more we try to study motivation objectively, and to observe 
internally what happens inside ourselves, the better we become at it, I 
claim.
So, yes, part of the story is that we have evolved to think of ourselves 
as altruists - or rather, as altruists with respect to our kinsfolk and 
relatives, but often not global altruists.  And when our instincts 
contradict our perceptions of who we think we *should* be, we can 
sometimes modify the instincts.  The full picture involves quite a 
tangled web of interacting forces, but yes, this central conflict is 
part of the story, as you point out.
So far, so good.  To be old-fashioned about it, Superego clobbers Id 
when it gets out of control, and we end up becoming "a genuinely good 
person."
But now, if I read you aright, what you are saying is that the reason 
Superego gets the upper hand in the end is that the system was designed 
with fundamental altruism as goal number one ("you can only do this 
because you _started out_ wanting to be a good person!"), and that 
because this goal was designed in from the beginning, it eventually (at 
least in the case of nice people like you and me) triumphed over the 
baser instincts.
Hence, it depends on what was goal number one in the initial design of 
the system (altruism, rather than ruthless dominance or paperclipization 
of the known universe).  Whatever was in there first, wins?
I have two serious disputes with this.
1) Are you sure?  I mean, are you sure that this complicated (in fact 
probably Complex) motivation system - which is more than just an 
opponent-process module involving Superego and Id, but is, as I argued 
above, a tangled mess of forces - ends up the way it does by the time it 
matures *only* because altruism was goal number one in the initial 
design?  I am really not so sure, myself, and either way, this is 
something that we should be answering empirically - I don't think you 
and I could decide the reasons for its eventually settling on good 
behavior without some serious psychological studies and (preferably) 
some simulations of different kinds of motivation systems.
2) Quite apart from that last question, though, I believe that you have 
introduced something of a red herring, because all of the above 
discussion is about ordinary people and their motivational systems, and 
about their introspective awareness of those systems, and the 
interaction betwixt motivation and introspection.
In my original essay, though, I was talking not about ordinary humans, 
but about creatures who, ex hypothesi, have quite a deep understanding 
of motivation systems in minds ... and, on top of that understanding, 
they have the ability to flip switches that can turn parts of their own 
motivation systems on or off.  My point is that we rarely talk about the 
kind of human that has a profoundly deep and subtle understanding of how 
their own motivation system is structured (there just aren't that many 
of them), but this is the population of most interest in the essay.  So 
when you correctly point out that all sorts of strange forces come 
together to determine the overall niceness of a typical human, you are 
tempting me off topic!
Having said all this, I can now meet your last point:
> You are anthropomorphizing by assuming that these beliefs about goal
> content are held by minds-in-general, and the only variation is in the
> instinctual behaviors built in to different minds. A Seed AI which
> believes its goal to be paper clip maximization will not find
> Friendliness seductive! It will think about Friendliness and say "Uh oh!
> Being Friendly would prevent me from turning the universe into paper
> clips! I'd better not be Friendly."
Wait!  Anthropomorphizing is when we incorrectly assume that a thing is 
like a human being.
What you are saying in this paragraph is that (1) my original argument 
was that niceness tends to triumph in humans, (2) I misunderstood the 
fact that this actually occurs because of our particular beliefs about 
our goal content (the altruism stuff, above), and (3) continuing this 
misunderstanding, I falsely generalized and assumed that all minds would 
have the same beliefs about their goal content (?... I am a little 
unclear about your argument here...).
No, not at all!  I am saying that a sufficiently smart mind would 
transcend the mere beliefs-about-goals stuff and realise that it is a 
system comprising two things:  a motivational system whose structure 
determines what gives it pleasure, and an intelligence system.
So I think that what you yourself have done is to hit up against the 
anthropomorphization problem, thus:
 > A Seed AI which
 > believes its goal to be paper clip maximization will
Wait!  Why would it be so impoverished in its understanding of 
motivation systems, that it just "believes its goal to do [x]" and 
confuses this with the last word on what pushes its buttons?  Would it 
not have a much deeper understanding, and say "I feel this urge to 
paperclipize, but I know it's just a quirk of my motivation system, so, 
let's see, is this sensible?  Do I have any other choices here?"
If you assume that it only has the not-very-introspective human-level 
understanding of its motivation, then this is anthropomorphism, surely? 
(It's a bit of a turnabout, for sure, since anthropomorphism usually 
means accidentally assuming too much intelligence in an inanimate 
object, whereas here we got caught assuming too little in a 
superintelligence!)
To illustrate:  I don't "believe my goal is to have wild sex."  I just 
jolly well *like* doing it!  Moreover, I'm sophisticated enough to know 
that I have a quirky little motivation system down there in my brain, 
and it is modifiable (though not by me, not yet).
Bottom Line:
It is all about there being a threshold level of understanding of 
motivation systems, coupled with the ability to flip switches in one's 
own system, above which the mind will behave very, very differently from 
your standard-model human.
Hope I didn't beat you about the head too much with this reply!  These 
arguments are damn difficult to squeeze into email-sized chunks.  Entire 
chapters, or entire books, would be better.
Richard Loosemore.