Re: Fragile Feelings of an AI WAS: Gender Neutral Pronouns

From: Eliezer S. Yudkowsky (
Date: Mon Apr 02 2001 - 14:52:31 MDT

Durant Schoon wrote:
> (I stopped reading FAI when I got to the references to Greg Egan's Diaspora.
> Why? I don't want to have the book spoiled for me (yes, I'm that silly).
> I finished "The Origins of Virtue" on Sunday (BIG thumbs up!), and
> started Permutation City (also BIG thumbs up!). So if my understanding of
> FAI is shallow, you'll know why).


I would never, ever include a spoiler without great big flaming warning
signs three pages in advance. My friends can attest to my insane
fanaticism in this area. The items included from Diaspora are a bit
future-shocky, but they should be just as future-shocky in FAI as in the
book, and there's plenty of future shock left over; I guarantee zero
impact on the plot.

> So let me ask, will the nascent FAI have a "sense of self", which can have
> a positive or a negative emotional attachment? When the AI is young and still
> working on self optimization, it is conceivable to me that one parameter to
> tweak is "propagation of self worth". When ver human teacher/programmer/role
> model tells ver that something ve did was wrong, that lesson should be
> incorporated and used to affect future behavior. Is there a middle step,
> though? Is there an internal sense of failure, of having done something
> wrong before the attempt to correct the problem? Humans have this. Do AIs
> need it?

There's an already-written section of Friendly AI that explains all of
this, with diagrams... that hasn't been uploaded yet. For now... um, I'm
sorry to say this, but the question is so orthogonal to the proposed
architecture that I'm not even sure where to start. The AI treats
programmer statements as sensory information about supergoal content. If
the AI takes an action and the action fails to achieve its purpose, the AI
is less likely to try it again, but that's because the hypothesis that
"Action X will lead to Parent Goal Y" has been disconfirmed by the new
data (i.e., backpropagation of negative reinforcement and positive
reinforcement can be shown to arise automatically from the Bayesian
Probability Theorem plus the goal system architecture - this is where the
diagrams come in).
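As a rough illustration of that update (a sketch under my own assumptions, not the architecture's actual mechanism): treat the hypothesis "Action X will lead to Parent Goal Y" as a single probability, treat each attempt as a success/failure observation, and let X inherit desirability from Y in proportion to that probability. The function names, likelihood values, and the desirability rule are all hypothetical.

```python
def posterior(prior: float, succeeded: bool,
              p_success_if_h: float = 0.9,
              p_success_if_not_h: float = 0.2) -> float:
    """Bayes' theorem for H = "Action X will lead to Parent Goal Y"."""
    # Likelihood of the observed outcome if H is true vs. if H is false.
    # The two default likelihoods are hypothetical illustration values.
    if succeeded:
        like_h, like_not_h = p_success_if_h, p_success_if_not_h
    else:
        like_h, like_not_h = 1.0 - p_success_if_h, 1.0 - p_success_if_not_h
    # P(outcome), marginalizing over H and not-H.
    evidence = like_h * prior + like_not_h * (1.0 - prior)
    return like_h * prior / evidence


def desirability(p_h: float, parent_goal_value: float) -> float:
    # Hypothetical rule: X inherits value from Parent Goal Y in
    # proportion to the believed probability that X leads to Y.
    return p_h * parent_goal_value


prior = 0.7
after_failure = posterior(prior, succeeded=False)
after_success = posterior(prior, succeeded=True)
print(desirability(prior, 10.0),
      desirability(after_failure, 10.0),
      desirability(after_success, 10.0))
```

No separate "reinforcement" signal appears anywhere: a failure lowers P(H) and with it X's inherited desirability, a success raises both, and the same update handles each direction.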

It's possible to derive quantities like "self-confidence" (the degree to
which an AI thinks that vis own beliefs have implications about reality),
"self-worth" (the AI's estimate of vis own value to the present or future
achievement of supergoals), and so on, but these quantities wouldn't play
the same role as they do in humans - or a role anywhere near as important,
given the lack of hardwired social connotations.

What I expect to be the most important quantities for a Friendly AI are
things like "unity of will" (the degree to which the need to use the
programmers as auxiliary brains outweighs any expected real goal
divergence), "trust" (to what degree a given programmer affirmation is
expected to correspond to reality), "a priori trust" (the Bayesian priors
for how much the programmers can be trusted, independent of any
programmer-affirmed content), and so on.
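One way to make "trust" and "a priori trust" concrete is a Beta-Bernoulli model. This is my assumption; the post doesn't specify a model. The Beta prior stands in for a priori trust, fixed before any affirmed content is examined. "Trust" is the posterior mean after checking some affirmations against reality. The class name and pseudo-counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProgrammerTrust:
    # Hypothetical Beta(alpha, beta) prior = "a priori trust", chosen before
    # any programmer-affirmed content is examined.
    alpha: float = 9.0  # pseudo-count: affirmations that matched reality
    beta: float = 1.0   # pseudo-count: affirmations that did not

    def record(self, matched_reality: bool) -> None:
        # Beta-Bernoulli conjugate update: one checked affirmation, one count.
        if matched_reality:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def trust(self) -> float:
        # Posterior mean: expected probability that the next affirmation
        # corresponds to reality.
        return self.alpha / (self.alpha + self.beta)


t = ProgrammerTrust()
initial = t.trust()      # 9 / 10
t.record(False)
after_miss = t.trust()   # 9 / 11
print(initial, after_miss)
```

The prior's pseudo-counts control how much evidence it takes to move trust. A strong prior (large alpha + beta) makes one mistaken affirmation nearly irrelevant, while a weak one lets a few mismatches dominate.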

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:36 MDT