Re: friendly ai

From: Samantha Atkins (samantha@objectent.com)
Date: Mon Jan 29 2001 - 20:36:46 MST


"Eliezer S. Yudkowsky" wrote:
>

> If, under the True Ultimate Laws of Physics and the Final Ultimate
> Technology, offensive technology overpowers defensive technology, then a
> community of sentients killing and eating each other is definitely a bad
> thing. All the entities that started out as human will die off sooner or
> later, even the transhuman AIs will be preoccupied with survival, and the
> future will be a fairly ugly place.

This is a strawman and presents a false dichotomy. Offense and defense
are only operable in an environment of continued competition, one in
which the entities involved have not found it more productive to compete
on grounds other than physical combat, with its attendant high cost for
all involved.

> Actually, the main point is still
> pretty horrifying even if the humans wind up barricaded behind a handful
> of Friendly AIs, or if the uploads are forever limited to whatever chunks
> of matter they grabbed before the Cambrian explosion, or if some
> transhuman who started as a bondage fetishist grabs a handful of
> unfortunate human slaves on the way to ascension, or, for that matter, if
> everyone who stays behind on Earth gets wiped out by some rogue. So, yes,
> it's horrifying.
>

So is the idea of a super-SI that rules everything else on the basis of
a more or less hardwired and presumably immutable Friendliness
supergoal. Said SI will be most unfriendly to those entities that do
not agree with its notion of what friendliness entails. AFAIK it will
not necessarily be "friendly" to those entities that simply wish to
strike out on their own outside of its influence. There are
implications here that can look pretty scary.

> Alternatively, you can have the entire solar system wiped out at one blow
> if an unFriendly AI undergoes a hard takeoff before there are any human
> uploads around. For that matter, you can have the entire solar system
> wiped out if an unFriendly uploaded human undergoes a hard takeoff. That
> would be pretty horrifying too.
>

You can have the entire Solar System wiped out if the SI decides that
maximal friendliness is to scan everyone destructively and convert the
Solar System into an SI within which these troublesome beings can work
out their grievances and problems without ultimately harming themselves.

>
> It can happen gradually, but it can't *creep up*. The community would
> *notice*. In fact, they would read the SL4 archives and anticipate all of
> it in advance. As soon as the changes became noticeable, and *before*
> they became critical - *pow*.

*Pow*? *Pow* does not sound friendly. :-( Exactly what community is
this if the SI is the cosmic sysop and its goals go into creep mode
after it reaches that status? There will be nothing, by your design,
that can circumvent or gainsay it.

>
> > > AIs who are dreadfully panicked
> > > about the prospect of drifting away from Friendship because Friendship is
> > > the only important thing in the world to them...
> >
> > aha! Caught you!
> >
> > Now you're proposing to make AIs neurotic and mentally unhealthy... to make
> > them fear becoming unfriendly
>
> Okay, fine, I shouldn't have used the words "dreadfully panicked" or "only
> important thing in the world". It's sickeningly anthropomorphic and I was
> only doing it to convey the picture of an all-out community effort.
>
> But I am *not* proposing to make AIs neurotic. "Friendliness is the only
> important thing in the world", or rather, "Friendliness is my goal
> system", is a perfectly healthy state of mind for an AI. And taking
> massive preventative action if Friendliness is threatened doesn't require
> a programmer assist; it's a natural consequence of the enormous unFriendly
> consequences of an AI community drifting away from Friendliness. I would
> *never* "make" an AI fear anything; at most, I would ask politely.
>

I don't see this as a healthy state. It is an unexamined primary
belief. No, much stronger than that: it is the wired-in basis of
everything else. Either it is forbidden and/or impossible to examine
(by definition unhealthy), or it can be examined and found wanting.
That friendliness is desirable does not mean that holding it as an
absolute is either tenable or healthy.

- samantha


