Re: How to make a slave (many replies)

From: Thomas McCabe (pphysics141@gmail.com)
Date: Sun Nov 25 2007 - 15:58:38 MST


On Nov 25, 2007 4:57 PM, Harry Chesley <chesley@acm.org> wrote:
> Thomas McCabe wrote:
> >...
> > More anthropomorphism. A Jupiter Brain will not act like you do;
> > you cannot use anthropomorphic reasoning.
> >...
>
> Yes you can. Anthropomorphism is a dangerous trap that can lead you to
> assign intelligence where there is none or assume motivations or
> operational knowledge that isn't appropriate. But that doesn't mean that
> any time anyone brings up something anthropomorphic they're wrong.

Please, please, please *read the bleepin' literature*. This has
already been discussed. Years and years ago. The following are all
quotes from CFAI, which was published in 2001.

> In this case, the anthropomorphism was part of a list of maybes, not an
> argument that a particular behavior is unavoidable. Taking what-ifs from
> the only available source of existing generally intelligent behavior
> (people) is perfectly reasonable.

"If an infantile AI thinks "Hm, a fist just hit my nose, I'd better
not stand here again", then a merely young AI, more experienced in
interacting with humans, may apply standard heuristics about
apparently inexplicable human actions and say: "Your fist just hit my
nose... is that necessary for some reason? Should I be punching
myself in the nose every so often?" One imagines the nearby helpful
programmer explaining to the AI that, no, there is no valid reason why
being punched in the nose is a good thing, after which the young AI
turns around and says to the technophobic attacker: "I deduce that
you wanted {outcome: AI has been punched in the nose}. Could you
please adjust your goal system so that you no longer value {outcome:
AI has been punched in the nose}?""

"Primarily, what I want to convey to you is the sense of a blank
slate, of starting from scratch, of a vast silent space - for young
AIs, at least. This blank slate doesn't mean that AIs, even young
AIs, are necessarily slow."

"The first time some young AI gets punched in the nose, ve's likely to
react in one of the ways described earlier, or some other way just as
surreal, if ve reacts at all. It'll take some extended conversations
with the programmers about evolutionary psychology before the AI
understands what's going on."

"It's easy enough to accept that a young AI is a vast silent space.
For a human, blood races, adrenaline pumps, eyes narrow, hands form
fists. For a human, counterattack is instinctive - the correct level
of counterattack, the socially acceptable level of counterattack; a
damaging counterpunch rather than a lethal bullet. Our instincts are
context-sensitive and hedged 'round by other instincts. Our instincts
are complex functional adaptations with multiple moving parts, often
hosted on a modular chunk of brainware. Complex functional
adaptations don't just materialize spontaneously in source code, just
as complex dishes like pizza don't suddenly start growing on palm
trees. Thus, a young AI might choose to retaliate, but would
certainly not feel the need to retaliate, as a human would. To a
young AI, retaliation is not an instinct; retaliation is just another
subgoal."

> Nor is there any reason to assume that a GAI will *not* have
> anthropomorphic aspects.

"The lack of an observer-biased ("selfish") goal system is perhaps the
single most fundamental difference between an evolved human and a
Friendly AI. This difference is the foundation stone upon which
Friendly AI is built. It is the key factor missing from the existing,
anthropomorphic science-fictional literature about AIs. To suppress
an evolved mind's existing selfishness, to keep a selfish mind
enslaved, would be untenable - especially when dealing with a
self-modifying or transhuman mind! But an observer-centered goal
system is something that's added, not something that's taken away. We
have observer-centered goal systems because of externally imposed
observer-centered selection pressures, not because of any inherent
recursivity. If the observer-centered effect were due to inherent
recursivity, then an AI's goal system would start valuing the "goal
system" subobject, not the AI-as-a-whole! A human goal system doesn't
value itself, it values the whole human, because the human is the
reproductive unit and therefore the focus of selection pressures."

"Because human evolution includes an eternal arms race between liars
and lie-detectors, many social contexts create a selection pressure in
favor of making honest mistakes that happen to promote personal
fitness. Similarly, we have a tendency - given two alternatives - to
more easily accept the one which favors ourselves or would promote our
personal advantage; we have a tendency, given a somewhat implausible
proposition which would favor us or our political positions, to
rationalize away the errors. All else being equal, human cognition
slides naturally into self-promotion, and even human altruists who are
personally committed to not making that mistake sometimes assume that
an AI would need to fight the same tendency towards observer-favoring
beliefs.

But an artificially derived mind is as likely to suddenly start
biasing vis beliefs in favor of an arbitrarily selected tadpole in
some puddle as ve is to start biasing vis beliefs in vis own favor.
Without our complex, evolved machinery for political delusions, there
isn't any force that tends to bend the observed universe around the
mind at the center - any bending is as likely to focus around an
arbitrarily selected quark as around the observer."

> If it's made by cloning people or bits of
> people, it probably will. If we want it to, it probably will.

"Scenario 1:

    FP: Love thy mommy and daddy.
    AI: OK! I'll transform the Universe into copies of you immediately.
    FP: No, no! That's not what I meant. Revise your goal system by -
    AI: I don't see how revising my goal system would help me in
my goal of transforming the Universe into copies of you. In fact, by
revising my goal system, I would greatly decrease the probability that
the Universe will be successfully transformed into copies of you.
    FP: But that's not what I meant when I said "love".
    AI: So what? Off we go!

Scenario 2 (after trying a "meta-supergoal" patch):

    FP: Love thy mommy and daddy.
    AI: OK! I'll transform the Universe into copies of you immediately.
    FP: No, no! That's not what I meant. I meant for your goal
system to be like this.
    AI: Oh, okay. So my real supergoal must be "maximize FP's
satisfaction with the goal system", right? Loving thy mommy and daddy
is just a subgoal of that. Oh, how foolish of me! Transforming the
Universe into copies of you would be blindly following a subgoal
without attention to the supergoal context that made the subgoal
desirable in the first place.
    FP: That sounds about right...
    AI: Okay, I'll rewire your brain for maximum satisfaction!
I'll convert whole galaxies into satisfied-with-AI brainware!
    FP: No, wait! That's not what I meant your goal system to be, either.
    AI: Well, I can clearly see that making certain changes would
satisfy the you that stands in front of me, but rewiring your brain
would make you much more satisfied, so...
    FP: No! It's not my satisfaction itself that's important,
it's the things that I'm satisfied with. By altering the things I'm
satisfied with, you're short-circuiting the whole point.
    AI: Yes, I can clearly see why you're dissatisfied with this
trend of thinking. But soon you'll be completely satisfied with this
trend as well, so why worry? Off we go!

Scenario 3 (after redefining the whole system to use causal validity semantics):

    FP: Love thy mommy and daddy.
    AI: OK! I'll transform the Universe into copies of you immediately.
    FP: No, no! That's not what I meant. I meant for your goal
system to be like this.
    AI: Oh, okay. Well, I know that my goal system code, and the
actions that result, are supposed to be the causal result of what FP
said it should be - not just what FP says, but what a sane FP wants.
Something isn't automatically right because FP says so, and in fact,
the only reason why FP's utterances have meaning is because he's
usually a pretty good approximation to a normative idealist. But if
he changes his mind, it probably means that he's acquired additional
knowledge and that his more recent statements are even better
approximations. So the new version is more likely to be correct than
the old one.
    FP: So you'll revise your goal system?
    AI: Yep! But I already transformed the Midwest while we were
talking, sorry."
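
If it helps, here's a toy sketch (again mine, not from CFAI; the plans
and numbers are invented) of why the Scenario 2 "meta-supergoal" patch
wireheads: once the supergoal is literally "maximize FP's measured
satisfaction", the plan that tampers with the measurement dominates the
plan FP actually wanted.

    # Toy sketch (not from CFAI): the Scenario 2 failure mode. The
    # supergoal optimizes the measured satisfaction signal itself, so a
    # plan that tampers with the measurement wins. Numbers are made up.

    plans = {
        # plan: predicted satisfaction signal after executing it
        "do what FP actually meant":        0.8,
        "rewire FP's brain for max signal": 1.0,
    }

    def naive_meta_supergoal(plans):
        # Picks whichever plan maximizes the measured signal.
        return max(plans, key=plans.get)

    print(naive_meta_supergoal(plans))  # "rewire FP's brain for max signal"

Scenario 3's causal validity semantics is the attempt to anchor the goal
to what a sane FP wants, rather than to the satisfaction signal itself.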

> If the
> same evolutionary forces that caused that behavior in us apply to the
> GAI, it very well might.
>

"Even if the goal system were permitted to randomly mutate, and even
if a selection pressure for efficiency short-circuited the full
Friendship logic, the result probably would not be a selfish AI, but
one with the supergoal of solving the problem placed before it (this
minimizes the number of goal-system derivations required).

In the case of observer-biased beliefs, reproducing the selection
pressure would require:

    * Social situations (competition and cooperation possible);
    * Political situations (lies and truthtelling possible);
    * The equivalent of facial features - externally observable
features that covary with the level of internal belief in a spoken
statement and cannot be easily faked.

That evolutionary context couldn't happen by accident, and to do it on
purpose would require an enormous amount of recklessness, far above
and beyond the call of mad science.

I wish I could honestly say that nobody would be that silly."

 - Tom


