Re: friendly ai

From: Eliezer S. Yudkowsky (
Date: Sun Jan 28 2001 - 13:29:00 MST

Ben Goertzel wrote:
> > Do you seriously think that a Friendly AI which totally lacked the
> > behaviors and cognitive complexity associated with learning would be more
> > effective in making Friendliness real?
> Quite possibly, YES
> This is the "Honest Annie" scenario envisioned by Stanislaw Lem

Didn't read that story, sorry... but...


I can't visualize an AI incapable of learning making it out of the lab or
even walking across the room, much less doing one darn thing towards
bringing citizenship rights to the Solar System.

> The possibility is that an AI, interested in discovering and creating new
> things, rapidly evolves to the point where humans and their various
> dilemmas, puzzles and problems are not very intriguing to it

Okay. Here, again, you seem to be assuming bad design and pointing out
the awful consequences. Consider the counterpart hypothesis: That an AI,
interested in humans and their various dilemmas, rapidly evolves to the
point where puzzles and problems are not very intriguing to ver.

Remember, the hypothesis is that Friendliness is the top layer of a good
design, and discovery and creation the subgoals; if you postulate an AI
that violates this rule and see horrifying consequences, it should
probably be taken as an argument in favor of _Friendly AI_. <grin>

> > Ergo, the behaviors associated with learning are valid subgoals of
> > Friendliness.
> They are indeed valid subgoals of friendliness
> However, the weight that they would be assigned as subgoals of friendliness
> might not be very high

If the system isn't smart enough to see the massive importance of
learning, use a programmer intervention to add the fact to the system that
"Ben Goertzel says learning is massively important". If the system
assumes that "Ben Goertzel says X" translates to "as a default, X has a
high probability of being true", and a prehuman AI should make this
assumption (probably due to another programmer intervention), then this
should raise the weight of the learning subgoal.
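
A rough sketch of that intervention, in Python, assuming the weight of a
subgoal is just the supergoal's value times the system's estimate that the
subgoal serves it. Every name here (Subgoal, subgoal_weight,
apply_programmer_affirmation, trust) and the "raise the estimate to at least
the trust level" rule are made up for illustration; this is not Webmind's
goal system or any real design.

```python
from dataclasses import dataclass


@dataclass
class Subgoal:
    name: str
    # Hypothetical: the system's current estimate that pursuing this
    # subgoal actually serves the Friendliness supergoal.
    p_serves_supergoal: float


def subgoal_weight(subgoal: Subgoal, supergoal_value: float = 1.0) -> float:
    """Weight is derived from the supergoal, never assigned independently."""
    return supergoal_value * subgoal.p_serves_supergoal


def apply_programmer_affirmation(subgoal: Subgoal, trust: float) -> Subgoal:
    """Treat "programmer says X" as default evidence that X is true.

    `trust` is an assumed default probability that a programmer's statement
    is correct. The estimate is raised to at least that level, never lowered.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be in [0, 1]")
    return Subgoal(subgoal.name, max(subgoal.p_serves_supergoal, trust))


# The system starts out underrating learning.
learning = Subgoal("learning", p_serves_supergoal=0.2)
before = subgoal_weight(learning)

# Intervention: "Ben Goertzel says learning is massively important".
learning = apply_programmer_affirmation(learning, trust=0.9)
after = subgoal_weight(learning)
```

The point of the sketch is only that the weight goes up because the system's
belief about the subgoal changed, not because learning was promoted to a
goal in its own right.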

> (In constructing Webmind's goal system, I suspect we're assigning a higher
> weight to learning & creativity than would be necessary if they were
> considered only as subgoals of friendliness -- because I'm interested in
> evolving the smartest, most knowledgeable AI system possible)

Speaking as one of the six billion people who get toasted if you make one
little mistake, would you *please* consider adding Friendliness to that
list? I really don't think it will cost you anything.

> And, they're very strong candidates for long-term, self-organizing,
> spontaneous subgoal alienation...

I've evolved from subgoal-driven to supergoal-driven over time. I can see
this as possible, but I really can't see it as inevitable, not if the AI
is on guard and doesn't want it to happen. Evolution has to happen in
steps, and steps can be observed, detected, and counteracted. A failure
of Friendliness in a seed AI vanishes as soon as the AI realizes it's a
failure; it takes a catastrophic failure of Friendliness, something that
makes the AI stop *wanting* to be Friendly, before errors can build up in
the system.

If there's a society of Friendly AIs, they'll *notice* that new AIs are a
little bit less Friendly than the originals, and *all* of them, new and
old alike, will go into a screaming panic and start doing something about
it. It takes *time* for evolution to make big changes. Evolution has to
cause a small failure of Friendliness before it can cause a big,
catastrophic failure of Friendliness, and that will put *everyone* on
guard, new AIs and old AIs alike, because they haven't yet undergone
catastrophic failure and they still *want* to be Friendly. And I don't
believe that evolution can cause catastrophic blatant "Friendship drift"
if it's evolution operating on an entire community of seed AIs who are on
their guard, who can examine their own code and make communal agreements
to impose artificial selection pressures, AIs who are dreadfully panicked
about the prospect of drifting away from Friendship because Friendship is
the only important thing in the world to them...

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
