RE: guaranteeing friendliness

From: Herb Martin (HerbM@LearnQuick.Com)
Date: Sat Dec 03 2005 - 23:06:56 MST

From: Eliezer S. Yudkowsky
> Herb,
> There is a fallacy oft-committed in discussion of Artificial
> Intelligence, especially AI of superhuman capability. Someone says:
> "When technology advances far enough we'll be able to build minds far

This reply thoroughly snips and chops up
the (Eliezer's) original -- I am assuming
that anyone interested has read that full and
insightful post:

> surpassing human intelligence. A superintelligence could build enormous
> cheesecakes - cheesecakes the size of cities - by golly, the future will
> be full of giant cheesecakes!" The question is whether the
> superintelligence wants to build giant cheesecakes. The vision leaps
> directly from capability to actuality, without considering the necessary
> intermediate of motive.
> People often immediately declare that Friendly AI is an impossibility,
> because any sufficiently powerful AI will be able to modify its own
> source code to break any constraints placed upon it.

Note please: I never claimed this. I claimed that
guaranteeing a Friendly AI is not provably possible
and insisted that we should in fact try in any case
to do so.

The problem is not with cheesecakes but with the real
metagoals we give such an AI, and possibly with any
evolution of those goals ("evolution" gives the right
flavor better than "development").

We certainly would not want to be the person who claimed
that "man would never fly" but we can agree with the
thousands of engineers who accept that it is a practical
impossibility to guarantee the safety of any particular
aircraft (so far).

With AI, we may not get the multitude of second chances
to learn from our mistakes -- it is highly improbable
that our first successes will be guaranteed to be friendly.

This doesn't mean we won't succeed in FAI, but rather that
it is highly likely to take multiple tries, and we may not
get those additional attempts.

> The first flaw you should notice is a Giant Cheesecake Fallacy. Any AI
> with free access to its own source would, in principle, possess the
> ability to modify its own source code in a way that changed the AI's
> optimization target (the region into which the AI tries to steer
> possible futures). This does not imply the AI has the motive to change
> its own motives. I would not knowingly swallow a pill that made me
> enjoy killing babies, because currently I prefer that babies not die.

And if the AI is told "don't kill any babies" or "do not allow
babies to die" what possible actions might it take to do that?

(And ignore movie-stupid misinterpretations where it just
destroys everyone so there are never any babies which MIGHT
die in the future.)

Even the range of actions that could follow from such metagoals
is very difficult to predict.

> pretty nice and wish they were nicer. At this point there are any
> number of vaguely plausible reasons why Friendly AI might be humanly
> impossible, and it is still more likely that the problem is solvable but
> no one will get around to solving it in time. But one should not so
> quickly write off the challenge, especially considering the stakes.

I do not believe that it is humanly impossible but rather
that it is impossible to PROVE (or guarantee) it, a priori.

And yes, we should NOT write off attempts to do it correctly,
especially considering the stakes.

Herb Martin
