Re: friendly ai

From: Eliezer S. Yudkowsky
Date: Sun Jan 28 2001 - 14:03:05 MST

Ben Goertzel wrote:
> > I can't visualize an AI incapable of learning making it out of the lab or
> > even walking across the room, much less doing one darn thing towards
> > bringing citizenship rights to the Solar System.
> No, I was unclear. Honest Annie was an AI that got so smart that it just
> shut up and stopped communicating with people altogether.... Radio silence.

I was just WHAT-ing your previous sentence, to the effect that you could
visualize a Friendly AI with no learning subgoal being as effective as a
Friendly AI with a learning subgoal.

> > Remember, the hypothesis is that Friendliness is the top layer of a good
> > design, with discovery and creation as the subgoals; if you postulate an AI
> > that violates this rule and see horrifying consequences, it should
> > probably be taken as an argument in favor of _Friendly AI_. <grin>
> I don't think I'm envisioning horrifying consequences at all. AI's getting
> bored with humans isn't all that horrifying, is it? Especially if humans are
> all uploading themselves... then, most humans are going to be bored with
> old-style flesher humans too...

Is it horrifying? That depends on two things: first, the balance between
defensive and offensive technology; second, the hard takeoff scenario.

If, under the True Ultimate Laws of Physics and the Final Ultimate
Technology, offensive technology overpowers defensive technology, then a
community of sentients killing and eating each other is definitely a bad
thing. All the entities that started out as human will die off sooner or
later, even the transhuman AIs will be preoccupied with survival, and the
future will be a fairly ugly place. Actually, the main point is still
pretty horrifying even if the humans wind up barricaded behind a handful
of Friendly AIs, or if the uploads are forever limited to whatever chunks
of matter they grabbed before the Cambrian explosion, or if some
transhuman who started as a bondage fetishist grabs a handful of
unfortunate human slaves on the way to ascension, or, for that matter, if
everyone who stays behind on Earth gets wiped out by some rogue. So, yes,
it's horrifying.

Alternatively, you can have the entire solar system wiped out at one blow
if an unFriendly AI undergoes a hard takeoff before there are any human
uploads around. For that matter, you can have the entire solar system
wiped out if an unFriendly uploaded human undergoes a hard takeoff. That
would be pretty horrifying too.

> > Speaking as one of the six billion people who gets toasted if you make one
> > little mistake, would you *please* consider adding Friendliness to that
> > list? I really don't think it will cost you anything.
> Friendliness is indeed one of webmind's goals ;>

Webmind or Webmind Inc.?

> > I've evolved from subgoal-driven to supergoal-driven over time. I can see
> > this as possible, but I really can't see it as inevitable, not if the AI
> > is on guard and doesn't want it to happen. Evolution has to happen in
> > steps, and steps can be observed, detected, and counteracted. A failure
> > of Friendliness in a seed AI vanishes as soon as the AI realizes it's a
> > failure; it takes a catastrophic failure of Friendliness, something that
> > makes the AI stop *wanting* to be Friendly, before errors can build up in
> > the system.
> I don't know. Can't "stopping wanting to be friendly" creep up gradually
> too?

It can happen gradually - it can't *creep up*. The community would
*notice*. In fact, they would read the SL4 archives and anticipate all of
it in advance. As soon as the changes became noticeable, and *before*
they became critical - *pow*.

> > AIs who are dreadfully panicked
> > about the prospect of drifting away from Friendship because Friendship is
> > the only important thing in the world to them...
> aha! Caught you!
> Now you're proposing to make AI's neurotic and mentally unhealthy... to make
> them fear becoming unfriendly

Okay, fine, I shouldn't have used the words "dreadfully panicked" or "only
important thing in the world". It's sickeningly anthropomorphic and I was
only doing it to convey the picture of an all-out community effort.

But I am *not* proposing to make AIs neurotic. "Friendliness is the only
important thing in the world", or rather, "Friendliness is my goal
system", is a perfectly healthy state of mind for an AI. And taking
massive preventative action if Friendliness is threatened doesn't require
a programmer assist; it's a natural consequence of the enormous unFriendly
consequences of an AI community drifting away from Friendliness. I would
*never* "make" an AI fear anything; at most, I would ask politely.

> But isn't this a recipe for backlash of some sort?? Fear breeds aggression,
> no?

No, that's just humans.

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
