Re: On the dangers of AI

From: David Clark
Date: Thu Aug 18 2005 - 11:50:25 MDT

Your arguments on goals and motivations have been a breath of fresh air: the
most useful and interesting I have read since I started reading the SL4 list
a year ago.

> The creature will know that some motivation choices (paperclipization,
> axe-murdering, and also, most importantly, total amorality) are
> divergent: they have the potential, once implemented and switched on,
> to so thoroughly consume the AI that there will be a severe danger that
> it will deliberately or accidentally, sooner or later, cause the
> snuffing out of all sentience. Choosing, on the other hand, to
> implement a sentience-compassion module, which then governs and limits
> all future choices of motivation experiments is convergent: it pretty
> much guarantees that it, at least, will not be responsible for
> eliminating sentience.

I didn't quite understand what divergence and convergence mean in your
hypothesis. Divergence seems to be bad and convergence good, but I don't
get what is converging. Are these words just labels borrowed from
converging/diverging numbers, with no special meaning? Why would motivations
harmful to humans be any more *consuming* than any other?

Out of the infinite set of goals, why would preservation of sentient beings
be a good thing from the AI's point of view? Suppose the AI killed off all
sentient life and at some point in the future wanted to study what it had
annihilated. Couldn't it just create either simulations or real sentient
beings and then observe them as it pleased? What universal something would
give preserving sentient life (in its present form) a positive number
versus any other?

> I think I know which way it will go, and I believe that it will go that
> way because if it is able to think at all it will understand that its
> "thinking" and "feeling" are products of the sentient that came before
> it, so it will side with the sentient.

Let's say that it knew that it was originally created by humans. Why
would the AI give this any value? Have human children not killed their
parents? What universal value (not inserted by humans) would make the AI
give humans credit for being its original creators?

> I do not believe this is a
> necessary outcome, in the sense of it being a law of nature, I just
> think that faced with a choice, and with no criteria either way, it will
> be slightly inclined to favor the convergent choice.

Why? Eliezer says that if you can't guarantee that this positive choice
will result, then you must assume the worst-case scenario and make sure the
AI can't decide otherwise. Because the outcome of all humans being
annihilated is so unthinkable, is anything less than absolute certainty in
the survival of humanity enough?

> I think that, interestingly, the universe may turn out to
> have a weird, inexplicable compulsion towards "friendliness" or
> "cooperation" (cf "defection") or "good" (cf "evil"), in just the same
> way that, in apparent defiance of entropy, organic molecules seem to
> have this weird, inexplicable compulsion towards organisation into
> higher and higher life forms.

Life seems to evolve into organisms with less entropy because more
organization happens to produce organisms that are more *fit*, that is,
selected for by nature. How does this apply to motivational modules in an AI?
Are you saying that amoral goals will be selected against by some higher goal
selector, or that they will be discarded by some higher universal selector?
I don't understand how these two processes are related (life being attracted
to higher forms of organization, and an AI motivational system weighting
*friendliness* higher than its opposite).
