From: Ben Goertzel (email@example.com)
Date: Fri Mar 08 2002 - 23:57:28 MST
Eliezer asked me to write something on Novamente's goal system. I still
haven't done that, because it seems so hard to write about this part of the
system in isolation, in a comprehensible way.
However, while editing the "Psyche" chapter in our in-process Novamente book
(and doing exciting stuff like making the notation consistent with other
chapters ;-p), I indulged myself and added a brief section on Friendly AI.
Here is the first draft of this section (not at all polished; I just typed
it in moments ago...).
There is a little Novamente terminology here that isn't defined, but the
basic gist should be evident.
In this section we will briefly explore some of the more futuristic aspects
of Novamente’s Feelings and Goals component. These aspects of Novamente are
always important, but in the future when a Novamente system intellectually
outpaces its human mentors, they will become yet more critical.
The MaximizeFriendliness GoalNode is very close to the concept of
“Friendliness,” which Eliezer Yudkowsky has discussed extensively in his
treatise Creating a Friendly AI (Yudkowsky, 2001:
http://singinst.org/CFAI/). Yudkowsky believes that an AI should be
designed with a hierarchical goal system that has Friendliness at the top.
In this scheme, the AI pursues other goals only to the extent that it
believes (through experience or instruction) that these other goals are
helpful for achieving its over-goal of Friendliness.
Yudkowsky’s motivation for this proposed design is long-term thinking about
the possible properties of a progressively self-modifying AI with superhuman
intelligence. His worry (a very reasonable one, from a big-picture
perspective) is that one day an AI will transform itself to be so
intelligent that it cannot be controlled by humans – and at this point, it
will be important that the AI values Friendliness. Of course, if an AI is
modifying itself into greater and greater levels of intelligence, there’s
no guarantee that Friendliness will be preserved through these successive
self-modifications. His argument, however, is that if Friendliness is the
chief goal, then self-modifications will be done with the goal of increasing
Friendliness, and hence will be highly likely to be Friendly.
Unlike the hypothetical Friendly AI systems that Yudkowsky has discussed,
Novamente does not have an intrinsically hierarchical goal system. However,
the basic effect that Yudkowsky describes (MaximizeFriendliness supervening
over other goals) can be achieved within Novamente’s goal system through
appropriate parameter settings. Basically, all one has to do is:
* constantly pump activation to the MaximizeFriendliness GoalNode
* encourage the formation of links of the form "InheritanceLink G
MaximizeFriendliness", where G is another GoalNode
This will cause the system to seek Friendliness maximization avidly, and will
also cause it to build an approximation of Yudkowsky’s posited hierarchical
goal system, by continually seeking to represent other goals as subgoals of
(that is, goals inheriting from) MaximizeFriendliness.
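
As a rough illustration of the two parameter settings above, here is a minimal toy sketch in Python. It is not Novamente code: the ToyGoalSystem class, the decay and spread parameters, the update rule, and the example goal names are all assumptions made up for this example. Only the ideas of pumping activation into MaximizeFriendliness and linking other goals to it by inheritance come from the text. The links are added by hand here, whereas the text speaks of encouraging the system to form them itself.

```python
from dataclasses import dataclass, field


@dataclass
class GoalNode:
    name: str
    activation: float = 0.0
    # Names of goals this goal inherits from ("InheritanceLink self parent").
    parents: set = field(default_factory=set)


class ToyGoalSystem:
    """Hypothetical stand-in for a goal system; not Novamente's actual dynamics."""

    def __init__(self, top_goal="MaximizeFriendliness",
                 pump=1.0, decay=0.5, spread=0.5):
        self.top_goal = top_goal
        self.pump = pump      # activation added to the top goal every step
        self.decay = decay    # fraction of its own activation a node keeps
        self.spread = spread  # fraction of a parent's activation a subgoal receives
        self.nodes = {top_goal: GoalNode(top_goal)}

    def add_goal(self, name):
        self.nodes.setdefault(name, GoalNode(name))

    def add_inheritance(self, child, parent):
        """Record 'InheritanceLink child parent', creating nodes as needed."""
        self.add_goal(child)
        self.add_goal(parent)
        self.nodes[child].parents.add(parent)

    def step(self):
        # Snapshot activations so every node is updated from the same state.
        previous = {name: node.activation for name, node in self.nodes.items()}
        for name, node in self.nodes.items():
            inherited = sum(previous[p] for p in node.parents)
            node.activation = self.decay * previous[name] + self.spread * inherited
        # Constantly pump activation to the top goal.
        self.nodes[self.top_goal].activation += self.pump


if __name__ == "__main__":
    system = ToyGoalSystem()
    system.add_inheritance("LearnLanguage", "MaximizeFriendliness")  # hypothetical subgoal
    system.add_goal("CollectStamps")                                 # hypothetical unlinked goal
    for _ in range(20):
        system.step()
    for node in system.nodes.values():
        print(node.name, round(node.activation, 3))
```

In this toy model, a goal that inherits from MaximizeFriendliness keeps receiving activation, while a goal with no such link receives none. That is the sense in which the settings approximate a Friendliness-topped hierarchy without the goal system being intrinsically hierarchical.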
However, even if one enforces a Friendliness-centric goal system in this
way, it is not clear that the Friendliness-preserving evolutionary path that
Yudkowsky envisions will actually take place. There is a major weak point
in this argument, which has to do with the stability of the Friendliness
goal under self-modification.
Suppose our AI modifies itself with the goal of maintaining Friendliness.
But suppose it makes a small error, and in its self-modificatory activity,
it actually makes itself a little less able to judge what is Friendly and
what isn’t. It’s almost inevitable that this kind of error will occur at
some point. The system will then modify itself again, the next time around,
with this less accurate assessment of the nature of Friendliness as its
goal. The question is: what is the chance that this kind of dynamic leads
to a decreasing amount of Friendliness, due to an increasingly erroneous
notion of Friendliness?
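
To make the worry concrete, here is a deliberately crude sketch in Python. It assumes that the system's notion of Friendliness can be summarized as a single number, that each self-modification takes the current notion as its target, and that each one adds a small independent Gaussian error. None of these assumptions comes from Novamente or from Yudkowsky's treatise, and the function names and parameters are hypothetical. The sketch shows only that, under these assumptions, small errors compound into a random walk away from the starting point.

```python
import random


def simulate_drift(generations, error_scale, seed=0):
    """Return the notion of Friendliness after each self-modification.

    0.0 stands for the original programmers' notion. Each generation
    starts from the previous (possibly already erroneous) notion and
    adds a small independent error; nothing pulls it back toward 0.0.
    """
    rng = random.Random(seed)
    notion = 0.0
    history = [notion]
    for _ in range(generations):
        notion += rng.gauss(0.0, error_scale)
        history.append(notion)
    return history


def mean_abs_drift(generations, error_scale, trials=200):
    """Average distance from the original notion over many simulated runs."""
    total = 0.0
    for seed in range(trials):
        total += abs(simulate_drift(generations, error_scale, seed)[-1])
    return total / trials


if __name__ == "__main__":
    for generations in (25, 100, 400):
        print(generations, round(mean_abs_drift(generations, 0.05), 3))
```

In a model like this, the typical distance from the original notion grows with the number of self-modifications, roughly as the square root of that number. Whether a real self-modifying system behaves this way is exactly the open question: the model has no mechanism for detecting or correcting errors.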
One may also put this argument slightly differently: without speaking of
error, what if the AI’s notion of Friendliness slowly drifts through
successful self-modifications? Yudkowsky’s intuition seems to be that when
an AI has become intelligent enough to self-modify in a sophisticated
goal-directed way, it will be sufficiently free of inference errors that its
notion of Friendliness won’t drift or degenerate. Our intuition is not so
clear on this point.
It might seem that one strategy to make Yudkowsky’s idea workable would be
to give the system another specific goal, beyond simple Friendliness: the goal
of not ever letting its concept of Friendliness change substantially.
However, this would be very, very difficult to ensure, because every concept
in the mind is defined implicitly in terms of all the other concepts in the
mind. The pragmatic significance of a Friendliness FeelingNode is defined
in terms of a huge number of other nodes and links, and when a Novamente
system significantly self-modifies it will change many of its nodes and links.
Even if the Friendliness FeelingNode always looks the same, its meaning
consists in its relations to other things in the mind, and these other
things may change. Keeping the full semantics of Friendliness invariant
through substantial self-modifications is probably not going to be possible,
even by a hypothetical superhumanly intelligent Novamente. Of course, this
cannot be known for sure since such a system may possess AI techniques
beyond our current imagination. But it’s also possible that, even if such
techniques are arrived at by an AI eventually, they may be arrived at well
after the AI’s notion of Friendliness has drifted from the initial
programmers’ notions of Friendliness.
The resolution of such issues requires a subtle understanding of Novamente
dynamics, which we are very far from having right now. However, based on
our current state of relative ignorance, it seems to us quite possible that
the only way to cause an evolving Novamente to maintain a humanly-desirable
notion of Friendliness maximization is for it to continually be involved
with Friendliness-reinforcing human interactions. Human minds tend to
maintain the same definitions of concepts as the other human minds with
which they frequently interact: this is a key aspect of culture. To the
extent that an advanced Novamente system is part of a community of Friendly
humans, it is more likely to maintain a human-like notion of Friendliness.
But of course, this is not a demonstrable panacea for Friendly AI either.