Re: Deliver Us from Evil...?

From: Eliezer S. Yudkowsky
Date: Sat Mar 24 2001 - 10:06:52 MST

James Higgins wrote:
> At 03:42 PM 3/24/2001 +0000, Christian L. wrote:
> >Yes, the problems remain. While "death" can be clearly defined, "pain" and
> >"coercion" cannot. Have you got separate definitions for these too?
> Actually, my question lies in the involuntary stupidity. How do you define
> that?

First, I answer in my capacity as a wetware sim of a Friendly AI:

"The key word is INVOLUNTARY. This very easily lets me solve the problem
by punting it back to individual sentients. The individual defines
"pain". The individual defines "stupidity". In general, if there's not
an objective or very strongly convergent answer, I just punt the question
back to the citizen."

Next, I answer in my capacity as a human altruist:

"I get to define evil however I like. In fact, from my perspective, the
evil of stupidity has nothing to do with whether it's voluntary or
involuntary. For me, the attraction of the Sysop Scenario is my belief
that the vast majority of people will choose not to be stupid. Thus, by
constructing a Friendly AI that fulfills people's volitions, I will have
initiated a causal chain that ends in the elimination of what I define as
tremendous evil, and in the creation of what I define as tremendous good.
That's what makes the Singularity a good thing. It doesn't mean that my
definitions are contaminating supergoal content."

Next, I answer in my capacity as a Friendship programmer:

"Individual volition is only one view, philosophically, of the
be-all-and-end-all of morality. The general principle that Friendly AIs
should be able to handle anything a human altruist can handle means that a
Friendly AI may need to value life, truth, joy, freedom, et cetera, in
addition to valuing volition. If volition does come out on
top, which is what I expect, then the valuation of such subsidiary goals
would be expressed in interactions with any given individual to the extent
that that individual wanted the Sysop to express them. In other words,
each citizen would get to decide how much moral personality their facet of
the Sysop would appear to have."

> Exactly. The only thing that worries me about the future, is this point
> right here. That human individuals are going to try and influence rules
> post-SI so that it becomes their personal concept of utopia. I would hope
> that Powers are inherently friendly in virtually all cases by nature.

Yes, that was my working model from 1996 to 2000. During 2000 I started
switching from "objective morality" to "Friendly AI". Note that both
models are very strongly programmer-independent.

Compare (objective morality)
vs. (Friendliness)

> In
> any case I thought that the idea of us providing rules for the other side
> had been totally, and completely proven a terrible idea. For example,
> forcing Asimov Laws was said to be a horrible idea because it is impossible
> to determine exactly what they would become after thousands, or even
> millions of upgrades. Does that not equally apply to "friendliness"?

This, again, was exactly what I would have said back when I conceived of
Asimov Laws as being intrinsically opposed to objective morality - a set
of artificial rules that prevents the system from following vis own
course. What pried me loose from the no-Friendliness-needed model in the
first place was the possibility that the objective morality would be
built, rather than discovered - in which case, the AI would need to know
what to build.

Asimov Laws are inherently ugly and coercive. I've always believed that
and I probably always will. Coming up with a replacement theory for
objective morality that was *equally* *beautiful* was the challenge. I do
believe that, in _Friendly AI_, I have succeeded; there is still no
coercion involved. It's just that the ideal of "no sensitivity to initial
conditions" has been replaced by the still broader ideal that a Friendly
AI should be able to handle at least all moral and philosophical problems
that can be handled by humans, including either the presence or absence of
objective morality. Since humans themselves innately dislike sensitivity
to initial conditions in moral philosophies, the new (nonugly) theory is
actually a more powerful generalization of the previous (nonugly) theory.
Friendliness creates the path of an AI; it doesn't oppose an existing
path. If you are ever, at any point, *afraid* that the AI will see some
truth or gain some aspect of intelligence, then you are working against
the AI, rather than with ver, and you have departed the path of

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
