Re: Fragile Feelings of an AI WAS: Gender Neutral Pronouns

From: Durant Schoon
Date: Mon Apr 02 2001 - 17:09:57 MDT

> From: "Eliezer S. Yudkowsky" <>
> I would never, ever include a spoiler without great big flaming warning
> signs three pages in advance.

Hmm, I just skimmed something about Carter-Zimmerman [a polis] and some
characters named Paolo and Elena...If the book is good, I'd rather avoid
everything, even character relationships. (there's also a whole thread in
the sl4 archive waiting for me when I finish the book).

Ok, I'm not listening any more: "La-la-la-la".

> There's an already-written section of Friendly AI that explains all of
> this, with diagrams... that hasn't been uploaded yet. For now... um, I'm
> sorry to say this, but the question is so orthogonal to the proposed
> architecture that I'm not even sure where to start. The AI treats
> programmer statements as sensory information about supergoal content. If
> the AI takes an action and the action fails to achieve its purpose, the AI
> is less likely to try it again, but that's because the hypothesis that
> "Action X will lead to Parent Goal Y" has been disconfirmed by the new
> data (i.e., backpropagation of negative reinforcement and positive
> reinforcement can be shown to arise automatically from the Bayesian
> Probability Theorem plus the goal system architecture - this is where the
> diagrams come in).
> It's possible to derive quantities like "self-confidence" (the degree to
> which an AI thinks that vis own beliefs have implications about reality),
> "self-worth" (the AI's estimate of vis own value to the present or future
> achievement of supergoals), and so on, but these quantities wouldn't play
> the same role as they do in humans - or a role anywhere near as important,
> given the lack of hardware social connotations.
> What I expect to be the most important quantities for a Friendly AI are
> things like "unity of will" (the degree to which the need to use the
> programmers as auxiliary brains outweighs any expected real goal
> divergence), "trust" (to what degree a given programmer affirmation is
> expected to correspond to reality), "a priori trust" (the Bayesian priors
> for how much the programmers can be trusted, independent of any
> programmer-affirmed content), and so on.

Actually this precisely answers my question. There is no need to model
human emotions (or emotional responses: panic, urgency, fondness,
self-loathing). Using Bayes' theorem as a quick and dirty heuristic
gets around the Framing Problem(*), assuming the AI has seen similar
problems before.
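
To make the quoted point concrete, here is a rough sketch of how a failed
action lowers the probability of the hypothesis "Action X will lead to
Parent Goal Y" under Bayes' theorem. This is only an illustration, not the
Friendly AI architecture itself: the function name, the prior and the two
likelihoods are all made-up assumptions.

```python
def update_belief(prior, p_success_if_true, p_success_if_false, succeeded):
    """Posterior probability that 'Action X leads to Parent Goal Y'.

    prior              -- belief in the hypothesis before the trial (assumed)
    p_success_if_true  -- chance the action works if the hypothesis holds
    p_success_if_false -- chance the goal is reached anyway if it does not
    succeeded          -- what was actually observed
    Assumes 0 < prior < 1 and likelihoods strictly between 0 and 1.
    """
    if succeeded:
        like_true, like_false = p_success_if_true, p_success_if_false
    else:
        like_true, like_false = 1 - p_success_if_true, 1 - p_success_if_false
    # Bayes' theorem: P(H | data) = P(data | H) * P(H) / P(data)
    evidence = like_true * prior + like_false * (1 - prior)
    return like_true * prior / evidence


# Hypothetical numbers, chosen only for illustration.
belief = 0.7
after_failure = update_belief(belief, 0.9, 0.2, succeeded=False)
after_success = update_belief(belief, 0.9, 0.2, succeeded=True)
print(after_failure, belief, after_success)
```

With these numbers a failure pulls the belief below the prior and a
success pushes it above, which is the "negative and positive
reinforcement" behaviour falling out of the update rather than being
built in separately.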

(Well, there might be a need to model emotions in order to *understand*
humans better, but there is probably no need to *use* that model for
actual problem solving.)

(*) The Framing Problem (more commonly called the Frame Problem):
Correct me if I get this wrong, I'm not that much of an AI buff, but I
think the problem can be illustrated by an automaton trying to defuse a
bomb before it goes off. The problem is that there are so many solutions
to choose from: how does ve pick the right one in the allotted time
without thinking through and weighing all the solutions first? Or at
least that's a description of the Framing Problem that I read once.

On Human Emotions:

(Warning: spoilers about Steven Pinker's "How the Mind Works" and Matt
Ridley's "The Origins of Virtue" follow)

I found that Steven Pinker's "How the Mind Works" did a great job of
explaining human emotions in the context of the computational model of
mind. "The Origins of Virtue" touched upon the emergence of emotions as a
way out of the Kidnapper's Dilemma(**). Pinker goes further and explains
why people genuinely feel generosity, love and compassion (basically,
generous strategies evolve, and then cheaters, then cheater-detectors to
"out" the cheaters, then strong, *genuinely* generous feelings emerge from
the selection pressure to get past the cheater-detectors - or something
like that, it's been a couple of years since I read the book).

Pinker also describes why emotions can drive people to go berserk, and
gives all sorts of fascinating examples of food taboos. If you thought OoV
was too short, pick up HTMW. It's a great book too (though not a
replacement). I think Pinker wrote the book right after spending a year
with Cosmides and Tooby. I've also read "Descartes' Error" by Antonio
Damasio. It was good, but not great like OoV and HTMW were.

(**) The Kidnapper's Dilemma: You've kidnapped someone, but you've
changed your mind. You can release your captive on the condition that
she won't tell anyone. The dilemma is really your captive's, because
she has no way to prove to you that once she's free she won't tell
the police. One way out is for her to confess an equal crime, so that
neither of you has an incentive to defect (i.e. snitch). Since it's
unlikely that the victim has any such skeleton in her closet (or it's
hard to convince the kidnapper that she does), the kidnapper has no
choice but to kill her. Emotions are a way of persuading the kidnapper
when in fact there is no logical reason to let the victim go (see The
Origins of Virtue for a less-mangled description).
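
The reasoning in the footnote above can be sketched as a tiny backward
induction. Everything numeric here is a hypothetical assumption, and only
the ordering of the payoffs matters. In particular, folding both the
"confess an equal crime" route and the emotional-commitment route into a
single penalty_for_telling number is my own simplification, not something
taken from either book.

```python
# Hypothetical payoffs; only their relative order matters.
KIDNAPPER_WALKS_FREE = 0    # captive released and stays silent
KIDNAPPER_KILLS = -5        # worse than walking free
KIDNAPPER_CAUGHT = -10      # captive released and tells the police
GAIN_FROM_TELLING = 1       # what the freed captive gains by telling


def captive_choice(penalty_for_telling):
    """What a freed captive does, given what telling would cost her."""
    tell = GAIN_FROM_TELLING - penalty_for_telling
    silent = 0
    return "tell" if tell > silent else "stay silent"


def kidnapper_choice(penalty_for_telling):
    """The kidnapper looks ahead to the captive's reply, then decides."""
    reply = captive_choice(penalty_for_telling)
    release = KIDNAPPER_CAUGHT if reply == "tell" else KIDNAPPER_WALKS_FREE
    return "release" if release > KIDNAPPER_KILLS else "kill"


# No commitment: telling costs the captive nothing once she is free.
print(kidnapper_choice(0))
# Credible commitment (a shared secret, or a promise she would genuinely
# hate herself for breaking): telling now costs more than it gains.
print(kidnapper_choice(2))
```

The point of the sketch is that the captive is only saved when the
penalty is believable to the kidnapper before the release, which is
where hard-to-fake emotions come in.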

