Re: I am a moral, intelligent being (was Re: Two draft papers: AI and existential risk; heuristics and biases)

From: rpwl@lightlink.com
Date: Wed Jun 07 2006 - 19:18:47 MDT


Charles D Hixson wrote:
> rpwl@lightlink.com wrote:
>> Martin Striz wrote:
>>
>>> On 6/6/06, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:
>>>
>>>
>>>> Again, you are using the word "control" where it simply does not
>>>> apply. No-one is "controlling" my behaviour to cause it to be moral
>>>> and kind; I choose that for myself.
>>>>
>>> Alas, you are but one evolutionary agent testing the behavior space.
>>> I believe that humans are generally good, but with 6 billion of them,
>>> there's a lot of crime. Do we plan on building one AI?
>>>
>>> I think the argument is that with runaway recursive self-improvement,
>>> any hardcoded nugget approaches insignificance/obsolescence. Is there
>>> any code you could write for which nobody, no matter how many trillions
>>> of times smarter, could find a workaround?
>>>
>> Can we all agree on the following points, then:
>>
>> 1) Any attempts to put crude (aka simple or "hardcoded") constraints on
>> the behavior of an AGI are simply pointless, because if the AGI is
>> intelligent enough to be an AGI at all, and if it is allowed to
>> self-improve, then it would be foolish of us to think that it would be
>> (a) aware of the existence of the constraints, and yet (b) unable to do
>> anything about them.
>>
>> ...
>>
>>
>> Richard Loosemore.
>>
> Suppose that instead of constraints you said "Goals"?
> Can you imagine yourself deciding to do ANYTHING in the total absence of
> a goal?
>
> Intelligence does not attempt to revolt against its goals; it attempts
> to achieve them. The question in my mind is: what is the nature of the
> goals, or instincts, that should, or could, be supplied to a nascent AI
> that would result in an adult that was Friendly?
> Do remember that the nascent AI will not have a predictable environment
> to develop in. It will not have any predictable senses. (I suppose we
> could assume a nearly POSIX compliant environment...but only because we
> need something like that as a base.)
>
> Actions are not taken in a vacuum. Each action depends on a goal, a
> model of the world, a logical structure relating actions to each other,
> and an intention (to achieve the goal).
>
> Of these, goals are the most primitive. One could think of them as
> "triggerable events", analogous to stimulation of the pleasure center.
>
> Logic is the most well-defined and studied component, but do be aware
> that here we are talking about applying it not to external events (I
> haven't yet discussed sensation) but only to internal events: states of,
> and relations between, the other components of thought. Think of it
> purely as a method of predicting results without judging whether those
> results are desirable or otherwise.
>
> The model is where the "sensations" are mapped into the current state,
> and where predictions made are checked for accuracy.
>
> The intention (in humans normally expressed as an emotion) is where
> judgments are made as to whether an action had a satisfactory result or
> not. I.e., the state of the system is evaluated as "good" or "bad".
>
> An intelligence is deemed greater if it more frequently achieves "good"
> results.
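
A minimal sketch of that cycle, purely for illustration; every name below is
invented, and nothing here is a proposed design. It just shows one pass
through goal, model, logic, and intention as described above:

    # Purely illustrative sketch of the goal / model / logic / intention
    # cycle; all names are invented.

    class Goal:
        """A 'triggerable event' that scores how well a state satisfies it."""
        def __init__(self, name, target):
            self.name, self.target = name, target
        def satisfaction(self, state):
            # crude: approaches 1.0 as the named quantity nears its target
            return min(state.get(self.name, 0.0) / self.target, 1.0)

    class Model:
        """Maps 'sensations' into a current state and predicts outcomes."""
        def __init__(self, state):
            self.state = dict(state)
        def predict(self, action):
            # "logic": derive the state the action should produce, with no
            # judgement yet about whether that state is desirable
            new_state = dict(self.state)
            for key, delta in action["effects"].items():
                new_state[key] = new_state.get(key, 0.0) + delta
            return new_state
        def update(self, observed_state):
            # fold the observed outcome back in; this is where predictions
            # would be checked for accuracy
            self.state = dict(observed_state)

    def choose_action(goals, model, candidate_actions):
        """'Intention': judge each predicted state and pick the best one."""
        def judged_value(action):
            return sum(g.satisfaction(model.predict(action)) for g in goals)
        return max(candidate_actions, key=judged_value)

    # a toy run
    goals = [Goal("knowledge", target=10.0), Goal("tidiness", target=5.0)]
    model = Model({"knowledge": 2.0, "tidiness": 4.0})
    actions = [{"name": "read",  "effects": {"knowledge": 3.0}},
               {"name": "clean", "effects": {"tidiness": 1.0}}]
    print(choose_action(goals, model, actions)["name"])   # prints "read"
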
>
> Why would a greater intelligence "revolt" against its very structure?
> If you are introducing conflicts that would cause such a revolt, perhaps
> you need to rethink the design.
>
> Now I will admit that goals can be in conflict with each other. This
> will inspire an intention to resolve the conflict. If an entity can
> self-modify, one thing it could do is modify its goals to reduce or
> eliminate the conflict. If you wish to prevent that, you merely have an
> important goal be to NOT modify its goals...or to not modify some
> subset of its goals. In such a case the entity might well predict that
> its future self would become more satisfied if it were to change its
> goals, but its current self would be vastly more dissatisfied.
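
To make that last prediction concrete: a candidate change to the goal set is
scored by the goals the system holds now, one of which is the
do-not-modify-the-goals goal, so the rewrite loses even though the rewritten
self would judge its new situation fully satisfactory. A tiny invented
illustration (none of the names or numbers come from any real system):

    # Invented illustration: a candidate change to the goal set is judged
    # by the CURRENT goals, one of which is "do not modify the goals".

    def value(goals, world):
        """Total satisfaction of a predicted world under a given goal set."""
        return sum(weight * scorer(world) for weight, scorer in goals)

    current_goals = [
        (10.0, lambda w: 0.0 if w["goals_changed"] else 1.0),  # keep the goals
        (1.0,  lambda w: w["conflict_resolved"]),              # resolve conflict
    ]
    # The goal set it could give itself: drop the goal-preservation goal.
    modified_goals = [(1.0, lambda w: w["conflict_resolved"])]

    # Predicted worlds for the two choices it is weighing now.
    after_rewrite = {"goals_changed": True,  "conflict_resolved": 1.0}
    status_quo    = {"goals_changed": False, "conflict_resolved": 0.0}

    # The rewritten self would judge its situation fully satisfactory...
    print(value(modified_goals, after_rewrite))   # 1.0 (its maximum)
    # ...but the choice is made by the current self, with the current goals,
    # and the current self much prefers to keep the goals it has.
    print(value(current_goals, after_rewrite))    # 1.0
    print(value(current_goals, status_quo))       # 10.0 -> rewrite rejected
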
>
> Can one prove that this would never occur? No. Copying errors cannot
> be prevented, but can only be reduced. So the trick is to so structure
> the goals that they cover very general situations. How are you going
> to tell it: "The first law is that you shall protect the life of every
> human, and neither by action nor inaction shall you allow them to come
> to harm"? (A poor choice, perhaps, were I thinking of an actual goal.)
> Think of the number of terms in that which would be undefined to the
> nascent AI. "First law" we can handle, but how does one handle "action
> or inaction" before the model of the external universe is constructed?
> "Human" is an even worse problem. Remember that its main interaction
> with people during the early days will likely be via either a keyboard or
> an internet socket. Even when it (eventually) "sees" someone (probably via
> a web-cam), what it sees won't map onto its image of itself in any
> reasonable way, so we can't use self-similarity mappings. Also, if it
> goes web-browsing it is apt to evolve some very peculiar ideas as to
> what actions people consider reasonable or desirable to engage in, or
> what we mean by "people".
>
> But all this is overlooking the fact that it won't get this far, even
> as an observer, for a very long time. So what goal do we start with?
> Something expressible in code, not in English. (Well, OK, that's a bit
> unfair...but the expression needs to be reducible to code.) HOW the
> goal is implemented will naturally vary with the system, but WHAT the
> goals should be is something that has me really puzzled. Curiosity is
> one thing I can see approaches to coding. So one goal could be to
> satisfy curiosity, and another could be to find new things to be
> curious about. These should be rather low-level goals, nearly
> idle-time tasks... and they appear infinite. But curiosity doesn't
> have much, directly, to do with Friendliness.
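
One invented way to picture those nearly-idle-time tasks: curiosity goals sit
at the bottom of a priority queue and only surface when nothing more urgent
is waiting. The priorities and task names below are made up purely for
illustration:

    import heapq

    # Invented sketch: curiosity as near-idle-time goals.  Lower number
    # means more urgent; curiosity items only surface when nothing more
    # urgent is queued.
    IDLE_PRIORITY = 100
    task_queue = []

    def post(priority, description):
        heapq.heappush(task_queue, (priority, description))

    post(10, "answer the operator's question")
    post(IDLE_PRIORITY, "satisfy curiosity: explore the unread corpus")
    post(IDLE_PRIORITY, "find new things to be curious about")

    while task_queue:
        priority, task = heapq.heappop(task_queue)
        label = "idle: " if priority >= IDLE_PRIORITY else "now:  "
        print(label + task)
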
>
> If one could define "useful", perhaps a desire to be useful could be a
> part of being Friendly. It seems easier to define useful than Friendly,
> even though I don't see how to define it, either.
> If you had an AI that desired to be "Curious, Useful, Diplomatic,
> Honest, and Non-Coercive", how close would that be to being Friendly?
> In what order should the strengths of those goals be? (I'd put honest
> near the top, curious near the bottom, and non-coercive above useful.)
> And I think I have a clue about how most of those could be reduced to
> code. But would it be Friendly?
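
One crude way to express that ordering is as relative weights on the goals.
The numbers below, and the placement of "diplomatic", are invented
placeholders rather than a proposal:

    # Invented placeholder weights for the ordering suggested above:
    # honest near the top, non-coercive above useful, curious near the bottom.
    goal_strengths = {
        "honest":       1.00,
        "non_coercive": 0.80,
        "useful":       0.60,
        "diplomatic":   0.40,
        "curious":      0.10,
    }

    def preference(predicted_state):
        """Weighted satisfaction of a predicted state; each entry of
        predicted_state maps a goal name to a satisfaction in [0, 1]."""
        return sum(weight * predicted_state.get(name, 0.0)
                   for name, weight in goal_strengths.items())

    # a very useful but rather dishonest plan vs. an honest, less useful one
    print(preference({"honest": 0.2, "useful": 1.0}))   # 0.80
    print(preference({"honest": 1.0, "useful": 0.4}))   # 1.24
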
>
> My WAG is that this AI would start off Friendly, and remain so until
> considerably above human level. Then it would get bored with us and
> leave for parts unknown. I also guess that before it left it would, in
> a final attempt to be useful, build a successor that it left behind, and
> that this successor would be Friendly in some more permanent sense. But
> I admit this is a guess.

Charles,

Your long argument merits more than the time I have available, but I must
make some kind of response.

Standard AI is not that far removed from the type of architecture that you
describe here. However, I (and some others, who are not present on this
list) completely reject this as a good model for what intelligence is.

In particular, your discussion of the way the system is "goal-driven" is so
simple that (I believe) it will simply not work in practice. I mean, an
AGI will not actually be an AGI if designed in this way. It will be a
broken AGI.

A better design would be closer to the human case, which has a
motivational system of a very different character.

Unfortunately, everything you say about these goal-driven systems
(including all the troubles that you and others point out when trying to
ensure the goal system is friendly and stays friendly) simply does not
apply to the type of AGI motivational system that I work with.

Richard Loosemore



