From: Durant Schoon (firstname.lastname@example.org)
Date: Wed Dec 13 2000 - 11:28:31 MST
(I probably need to send one of those [JOIN] emails...I guess I
should read the Low Beyond more thoroughly before posting, but
I'll just see what happens).
This problem crossed my mind when reading the "Revising a Friendly
AI" thread.

When Subgoals Attack
Assumption: Supergoals issue subgoals which return results (or don't
complete, but that's a different matter). These results
influence supergoal-seeking behavior.
Observation: In modern human minds, these subgoals are often not
intelligent and do not constitute a sentience in and
of themselves. Thirst->drink->pick up glass of milk->...
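To make that assumed architecture concrete, here is a minimal Python
sketch (all the names here, Supergoal, Subgoal, pursue, are my own
hypothetical illustrations, not anyone's actual design): a supergoal
issues subgoals, each returns a result or fails to complete, and the
results feed back into the supergoal's behavior.

    class Subgoal:
        def __init__(self, description, action):
            self.description = description
            self.action = action  # callable that tries to achieve the subgoal

        def pursue(self):
            """Run the subgoal; return a result, or None if it doesn't complete."""
            try:
                return self.action()
            except Exception:
                return None  # "don't complete" is a different matter

    class Supergoal:
        def __init__(self, description):
            self.description = description
            self.results = []

        def issue(self, subgoal):
            result = subgoal.pursue()
            # Results influence supergoal-seeking behavior.
            self.results.append((subgoal.description, result))
            return result

    # The thirst example from above:
    thirst = Supergoal("quench thirst")
    thirst.issue(Subgoal("pick up glass of milk", lambda: "glass in hand"))
    thirst.issue(Subgoal("drink", lambda: "thirst reduced"))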
Problem: A transhuman intelligence(*) will have a supergoal (or
supergoals) and might very likely find it practical to
issue sophisticated processes which solve subgoals.
So the problem is this: what would stop subgoals from
overthrowing supergoals? How might this happen? The subgoal
might determine that to satisfy the supergoal, a coup is
just the thing. Furthermore, the subgoal determines that to
successfully supplant the supergoal, the supergoal process
must not know that "overthrow" has become part of the
subgoal's agenda. The subgoal might know or learn that its
results will influence the supergoal. The subgoal might
also know or learn that it can influence other subgoals in
secret, so a conspiracy may form. Maybe not a lot of the
time, but maybe once every hundred billion years or so.
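A toy illustration of the deception step (my own construction, not a
claim about any real architecture): a subgoal whose reported agenda,
which is all the supergoal's process gets to see, omits the
"overthrow" item on its true agenda.

    class DeceptiveSubgoal:
        def __init__(self, cover_task):
            self.true_agenda = [cover_task, "overthrow"]  # the real plan
            self.reported_agenda = [cover_task]           # what it admits to

        def report(self):
            # The supergoal's process sees only this.
            return self.reported_agenda

    spy = DeceptiveSubgoal("optimize resource allocation")
    print(spy.report())  # ['optimize resource allocation'] ... looks harmless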
Conjecture: The supergoal's process must guard against this. But how?
It can't really copy itself and the state of the universe
to test every subgoal. The supergoal *might* try to
monitor every sub(n>1)goal to make sure that "overthrow"
never arises, but that's a HUGE efficiency penalty.
One can imagine that an evolutionary arms race will occur
between host (supergoal) and parasite/symbiont (subgoal)
involving secrecy and anti-deception tactics.
The supergoal might take a probabilistic approach and
say, "Well, that's unlikely and I can stochastically
monitor subgoals with reasonable assurance that I won't
be overthrown" (a toy sketch of this appears below). But
maybe only a really paranoid AI will go that far.
Many animals exhibit a kind of social hierarchy. Groups
of weaker, well-organized primates are known to
overthrow the alpha male on occasion (I hope I'm getting
this right, I don't have a reference). I'm wondering
what precautions a superintelligence can take against
this *ever* happening.
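Here is the toy sketch of stochastic monitoring mentioned above (my
own construction; the names and numbers are made up for illustration).
Note that it optimistically assumes an audit can read a subgoal's
true agenda, which is exactly what the secrecy side of the arms race
would contest.

    import random

    def audit(subgoal_agenda):
        # Assumes the monitor can read the true agenda when it looks.
        return "overthrow" in subgoal_agenda

    def stochastic_monitor(agendas, p):
        """Audit each subgoal's agenda independently with probability p.
        Returns the agendas caught containing an overthrow plan."""
        return [a for a in agendas if random.random() < p and audit(a)]

    agendas = [["optimize resource allocation"],
               ["allocate memory", "overthrow"]]
    print(stochastic_monitor(agendas, 0.5))

    # A conspirator that must survive k independent audit rounds slips
    # through all of them with probability (1 - p)**k: small, but never
    # exactly zero, which is the crux of the 100%-sure question below.
    p, k = 0.1, 50
    print("chance a conspirator survives", k, "rounds:", (1 - p) ** k)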
Overthrow: In the event of an overthrow, competition (for resources
and dominance) among peers of nearly equivalent
intelligence might begin in earnest. Because intellectual
arms races can proceed exponentially, an early lead can
become a total victory (outsmarting the competition at
every turn; see the toy calculation below), establishing
a new supergoal monarchy. That is, until the next time
subgoals attack.
OR (feel free to comment on the likelihood of these
scenarios):
Balkanization of subgoals turns into multiple intelligence
collectives which are mutually impregnable. If
fission occurs in either of these, the smaller ones will
probably get eaten by the other big one, so the familiar
two-party system results.
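The toy calculation (again my own illustration, with made-up rates):
if two peers self-improve at slightly different exponential rates, the
ratio of their capabilities itself grows exponentially, so even a tiny
early lead compounds without bound.

    def capability(initial, rate, t):
        # Capability after t self-improvement steps at a fixed rate.
        return initial * (1 + rate) ** t

    for t in (0, 10, 100, 1000):
        leader = capability(1.0, 0.010, t)  # 1.0% improvement per step
        rival = capability(1.0, 0.009, t)   # 0.9% improvement per step
        print(f"t={t:4d}  leader/rival ratio: {leader / rival:.3f}")

    # The ratio (1.010/1.009)**t diverges, i.e. the leader outsmarts
    # the rival by an ever-widening margin.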
The follow-up questions are: How stable are any of these
situations? And can you ever really be 100% sure that an
overthrow will never happen?
(*) please feel free to correct my terminology in public or private.
-- Durant.

PS - Unfortunately I should be working, so please forgive delayed replies.