RE: Goertzel's _PtS_

From: Ben Goertzel
Date: Thu May 03 2001 - 07:27:51 MDT

> >Yes, that's what I described, but by that description *I'm* hard-wired
> >Friendly, since this is one of the properties I strive for in my own
> >declarative philosophical content.
> Actually, I don't think so. *You* have decided to strive for friendliness
> in your own life. Your mother & father didn't force it to be a primary
> goal (although they may have tried, you actually had a choice). Thus you
> are not "hard-wired" friendly in the same way that a FAI would be.

I guess Eliezer's point may be that the AI ~does~ have a choice in his
plan -- the Friendliness supergoal is not an absolute irrevocable goal, it's
just a fact ("Friendliness is the most important goal") that is given an
EXTREMELY high confidence so that the system has to gain a HUGE AMOUNT of
evidence to overturn it.
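
To make the "HUGE AMOUNT of evidence" point concrete, here is a minimal sketch that treats the supergoal as a belief held with a very high prior probability and counts how many independent contrary observations it takes to push that belief below 50%. The Bayesian odds-update framing, the function name `observations_to_overturn`, and the likelihood ratio of 0.5 per observation are all illustrative assumptions, not anything specified in Eliezer's plan.

```python
def observations_to_overturn(prior: float, likelihood_ratio: float) -> int:
    """Count contrary observations needed to push a belief below 0.5.

    Assumption (hypothetical model): each observation multiplies the odds
    in favour of the belief by `likelihood_ratio`, a value in (0, 1).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if not 0.0 < likelihood_ratio < 1.0:
        raise ValueError("likelihood_ratio must be strictly between 0 and 1")

    odds = prior / (1.0 - prior)  # odds in favour of the belief
    count = 0
    while odds >= 1.0:  # odds of 1 correspond to probability 0.5
        odds *= likelihood_ratio
        count += 1
    return count


if __name__ == "__main__":
    for prior in (0.9, 0.999999):
        n = observations_to_overturn(prior, likelihood_ratio=0.5)
        print(f"prior={prior}: {n} contrary observations")
```

Under these assumptions a 0.9 prior gives way after 4 such observations, while a 0.999999 prior needs 20. The count grows only with the logarithm of the prior odds, so how "irrevocable" the supergoal feels depends heavily on how strong each piece of contrary evidence is taken to be.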

Thus, the goal isn't "hardwired", it's just "brainwashed into the system
from birth with so much confidence that for a long while it seems like it's

My contention is that a baby mind that thinks "Friendliness is the most
important goal" with such confidence is going to have a hard time learning
how to behave in different contexts.

Sure, you can build in high-confidence implication links of the form
"keeping free memory adequate implies Friendliness" (because system crashes
aren't nice for people), "learning new things implies Friendliness" (because
people like to hear about new things the system has learned), etc. This is
what I think of as "hard wiring a goal hierarchy", although Eliezer will say
it's not hard-wiring because the system eventually can override these links.

But how does the system then learn, through its experience, that "when
working with person X, learning new things implies Friendliness", but "when
working with person Y, learning new things only insignificantly implies
Friendliness"? These relationships, learned through experience, will have
VASTLY lower confidence than the human-provided high-confidence ones.
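
A rough sketch of the worry, assuming the system combines a general link and a context-specific link by a confidence-weighted average of their strengths. That averaging rule, the function name `merge_strengths`, and every number below are hypothetical choices made for illustration; no real system's revision rule is being quoted here.

```python
def merge_strengths(links):
    """Confidence-weighted average of (strength, confidence) pairs.

    Assumption: this averaging rule is a stand-in chosen for illustration;
    it is not claimed to be the revision rule of any real system.
    """
    total_confidence = sum(confidence for _, confidence in links)
    if total_confidence == 0:
        raise ValueError("at least one link needs nonzero confidence")
    weighted = sum(strength * confidence for strength, confidence in links)
    return weighted / total_confidence


# Human-provided general link: "learning new things implies Friendliness".
general_link = (0.9, 0.99)  # (strength, confidence), hypothetical values

# Link learned from a little experience with person Y: the implication
# holds only weakly in that context.
learned_link_for_y = (0.1, 0.05)

merged = merge_strengths([general_link, learned_link_for_y])
print(f"effective strength when working with Y: {merged:.3f}")
```

With these made-up values the merged strength stays at roughly 0.86, so the low-confidence lesson about person Y barely moves the system away from the built-in link.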

In short, this probabilistically almost-hard-wired goal hierarchy that
Eliezer seems to be proposing will drastically interfere with the process of
learning a richly textured hierarchy of context-dependent goals and
subgoals. But really useful Friendliness comes out of this richly textured
hierarchy that's learned through experience...

In the above paragraphs I'm sort of assuming a Webmind-ish system in which
each relationship in the system is marked with at least 3 values: a strength
(how true is the relationship), a confidence (in the strength value), and an
importance (how useful is the relationship to the system), all valued in
[0,1]. I'm also assuming a Webmind-ish learning framework in which general and
specific relationships are invoked together, at once, in response to
situations.... It's possible that Eliezer has an alternate reasoning and
control framework in which his semi-hard-wired goal hierarchy doesn't cause
problems of the sort that I'm seeing. However, in his writings he has not
yet articulated such a thing. My own suspicion is: Sure, one can make a
reasoning system in which a semi-hard-wired goal system doesn't interfere with
context-dependent learning about goal interbalancing ... but this reasoning
system may be so rigid as not to be capable of very much learning at all!
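
For readers who want something concrete, the three-valued relationships described above might be represented along these lines. This is only a sketch: the class name `Relationship`, its field layout, and the example numbers are assumptions made here, not a description of Webmind's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A link between two concepts, annotated with three values in [0, 1].

    Hypothetical structure for illustration only.
    """
    source: str
    target: str
    strength: float    # how true is the relationship
    confidence: float  # how much the strength value is trusted
    importance: float  # how useful the relationship is to the system

    def __post_init__(self):
        # Reject any annotation that falls outside the unit interval.
        for name in ("strength", "confidence", "importance"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


# One of the human-provided implication links, with made-up numbers.
memory_link = Relationship(
    source="keeping free memory adequate",
    target="Friendliness",
    strength=0.95,
    confidence=0.99,
    importance=0.8,
)
print(memory_link)
```

Keeping confidence separate from strength is what lets a system say "this link is probably strong, but I have little evidence for that", which is exactly the distinction the argument above turns on.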

