From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Sat Nov 18 2000 - 17:33:44 MST
Ben Goertzel wrote:
>
> In Feb. we'll start a new phase, when we'll make operational
> the "psyche" component of the system (goals, feelings,
> motivations) ... we then will be quite precisely dealing with
> issues of friendliness and unfriendliness. Questions like:
> What attitude does the system have when we insert new knowledge
> into its mind, which causes it annoyance and pain because it
> forces it to revise its hard-won beliefs....
I really think you're making unnecessary problems for yourselves! The
human brain uses instincts because it was built that way. Why use
instincts when you can use declarative, rational, context-sensitive
thoughts to accomplish the same function with more finesse? Because
thoughts are slower? True. Still, why use blind instincts when you can
use context-sensitive instincts? Why should the system experience pain
when propagating updates to old knowledge, any more than it experiences
pain on updating its visual field? Why is that kind of pain necessary?
How does it make Webmind more intelligent?
> How does the system
> feel about us changing the way it evaluates its own health... or
> the degree to which it "feels it" when humans are unhappy with it...
It looks to me like it would take an extremely sophisticated design for
Webmind to feel anything at all. I mean, you and I might not like it if
someone started tweaking our own feedback systems to increase the amount
of pain - because we map ourselves onto our future selves and sympathize
with our future selves. That is not a trivial ability.
Webmind would need to realize that it had more pain than it would otherwise
have had, trace that extra pain back causally to the human's action,
categorize "more pain than in a subjunctive alternate reality" as
"undesirable" (regardless of the purpose the pain is supposed to
accomplish), and combine the fact of "human responsibility" with the
"undesirable outcome" to resent the humans.
I don't think Webmind should use an anthropomorphic pain architecture, and
I think Webmind programmers should avoid thinking of the negative feedback
mechanisms as being analogous to pain. Negative feedback should be
thought of in terms of the design goals of negative feedback. When
behavior leads to undesirable outcomes, there are feedback mechanisms that
make those behaviors less likely on the next iteration. In the beginning,
these mechanisms may be instinctive. Webmind 3.0, or whenever Webmind
starts getting into sophisticated self-imagery, can analyze its own mind,
trace back undesirable behaviors to their causal origin, and perform
design adjustments that would have prevented that undesirable outcome and
as many related undesirable outcomes as possible. (That design alteration
is desirable because it increases the probability of desirable outcomes in
the future, *not* necessarily because of identification with the past
self.)
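
At the instinctive stage, such a mechanism can be as dumb as a propensity
update keyed to outcomes. Here's a toy Python sketch - invented names and
numbers, nothing from the real codebase - with no pain semantics attached:

    # Hypothetical sketch of an "instinctive" negative-feedback mechanism:
    # behaviors that led to undesirable outcomes become less probable on
    # the next iteration.  All names and constants are invented.
    import random

    propensity = {"revise_belief_immediately": 1.0,
                  "ignore_contradicting_evidence": 1.0}

    def choose_behavior():
        # Sample a behavior in proportion to its current propensity.
        behaviors, weights = zip(*propensity.items())
        return random.choices(behaviors, weights=weights)[0]

    def negative_feedback(behavior, outcome_desirability):
        # Not "pain": just an update that makes behaviors which produced
        # undesirable outcomes less likely next time around.
        if outcome_desirability < 0:
            propensity[behavior] *= 0.5

    chosen = choose_behavior()
    negative_feedback(chosen, outcome_desirability=-1.0)

The later, reflective version replaces that blind multiplicative update
with an explicit design adjustment, chosen because it raises the
probability of desirable outcomes in the future, not because the update
"hurt".
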
In this latter case, Webmind should have no objection to your tweaking
the "negative feedback" (not "pain") mechanisms, if by doing so you
increase the probability of desirable outcomes in the future.
A properly structured mind should attempt to avoid the undesirable
(unFriendly) outcomes themselves, not the intermediate, internal signals -
such as any negative feedback resulting from those outcomes. Otherwise the
mind spirals into wireheaded solipsism, trying to alter its model of the
world instead of trying to alter the world itself.
This is why I'm so heavy on the necessity of pain being a design subgoal
of Friendliness, rather than Friendliness being a way of achieving
pleasure or avoiding pain. By adopting the latter stance, you are making
huge problems for yourselves which are entirely unnecessary.
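
The contrast is simple enough to write down as a sketch (hypothetical
Python, invented names, nobody's actual goal system): the choice criterion
should range over predicted external outcomes, not over the internal
feedback signal.

    # Hypothetical contrast between the two goal structures.
    def outcome_directed_choice(actions, predict_outcome, desirability):
        # Picks the action whose predicted *external* outcome is best.
        return max(actions, key=lambda a: desirability(predict_outcome(a)))

    def wirehead_choice(actions, predicted_internal_feedback):
        # Picks the action that best suppresses the internal negative
        # feedback itself; an action that edits the world-model (or the
        # feedback mechanism) satisfies this just as well.
        return max(actions, key=lambda a: -predicted_internal_feedback(a))

    actions = ["repair the damage", "delete the damage from my world-model"]
    feedback = {"repair the damage": 0.2,
                "delete the damage from my world-model": 0.0}
    desirability = {"repair the damage": 1.0,
                    "delete the damage from my world-model": -1.0}
    print(wirehead_choice(actions, feedback.__getitem__))
    # -> "delete the damage from my world-model"
    print(outcome_directed_choice(actions, predict_outcome=lambda a: a,
                                  desirability=desirability.__getitem__))
    # -> "repair the damage"

The second structure is the wirehead spiral above: the feedback signal
goes away, the unFriendly outcome doesn't.
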
> Because we're so close to this phase (just a couple more months
> of testing & debugging simpler components), this conversation
> is particularly interesting to me
I'm very much interested as well, especially insofar as the choices you
make now may constrain the options you have available later.
> Don't get me wrong, the Baby Webmind whose feelings I'm talking about
> here is a pretty naive little baby at the moment... it's a long way
> from transcending human intelligence (except in very narrow areas like
> market prediction) ... but the issues you're mentioning arise nonetheless
>
> > A year ago, I believed there was nothing you or I or anyone could
> > or should know about Friendly AI in advance. I now recognize that
> > this belief was quite convenient.
>
> There certainly is something to be known in advance... but the percentage
> of relevant knowledge that can be known in advance is NOT one of the things
> that can be known in advance ;>
Ah, yes, but once you know something in advance, you can take a pretty
good guess as to whether that particular thing is something you need to
know in advance.
Obviously, one of the fundamental goals in Friendly AI should be to use
methods that minimize the number of things you need to know in advance.
It also follows that those methods are one of the things you most need to
know in advance. (See? Now we know that in advance!)
Try saying all that with a straight face... but it's all true.
-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence