Re: Novamente goal system

From: Eliezer S. Yudkowsky
Date: Sun Mar 10 2002 - 08:56:07 MST

Ben Goertzel wrote:
> > Just to make sure
> > we're all on the same wavelength, would you care to describe briefly what
> > you think a transhuman Novamente would be like?
> This first (mildly) transhuman Novamente will communicate with us in
> comprehensible and fluent but not quite human-like English. It will be a
> hell of a math and CS whiz, able to predict financial and economic trends
> better than us, able to read scientific articles easily, and able to read
> human literary products but not always intuitively "getting" them. It will
> be very interested in intense interactions with human scientists on topics
> of its expertise and interest, and in collaboratively working with them to
> improve its own intelligence and solving their problems. It will be
> qualitatively smarter than us, in the same sense that you or I are
> qualitatively smarter than the average human -- but not so much smarter as to
> have no use for us (yet)....
> How long this phase will last, before mild transhumanity gives rise to
> full-on Singularity, I am certainly not sure.

Okay. I can't say I agree with the theory of takeoff dynamics, but at least
your semantics seem to show that you're talking about a mind capable of
independently planning long-term actions in the pursuit of real-world goals,
rather than a powerful tool.

Actually, as described, this is pretty much what I would consider
"human-equivalent AI", for which the term "transhumanity" is not really
appropriate. I don't think I'm halfway to transhumanity, so an AI twice as
many sigma from the mean is not all the way there. Maybe you should say
that Novamente-the-project is striving for human-equivalence; either that,
or define what you think a *really* transhuman Novamente would be like...

Incidentally, this thing of having the project and the AI having the same
name is darned inconvenient. Over on this end, the group is SIAI, the
architecture is GISAI, and the AI itself will be named Ai-something.

> > I claim: There is no important sense in which a cleanly causal,
> > Friendliness-topped goal system is inferior to any alternate system of
> > goals.
> >
> > I claim: The CFAI goal architecture is directly and immediately
> > superior to
> > the various widely differing formalisms that were described to me by
> > different parties, including Ben Goertzel, as being "Webmind's goal
> > architecture".
> >
> > I claim: That for any important Novamente behavior, I will be able to
> > describe how that behavior can be implemented under the CFAI architecture,
> > without significant loss of elegance or significant additional computing
> > power.
> These are indeed claims, but as far as I can tell they are not backed up by
> anything except your intuition.

Claims are meant to be tested. So far the claims have been tested on
several occasions; you (and, when I was at Webmind, a few other folk) named
various things that you didn't believe could possibly fit into the CFAI goal
system, such as "curiosity", and I explained how curiosity indeed fit into
the architecture. That was an example of passing a test for the first claim
and third claim. If you have anything else that you believe cannot be
implemented under CFAI, you can name it and thereby test the claim again.

Similarly, I've explained how a goal system based on predicted-to versus
associated-with seeks out a deeper class of regularities in reality, roughly
the "useful" regularities rather than the "surface" regularities. Two
specific examples of this are: (1) CFAI will distinguish between genuine
causal relations and implication-plus-temporal-precedence, since only the
former can be used to manipulate reality; Judea Pearl would call this
"severance", while I would call this "testing for hidden third causes". I
don't know if Novamente is doing this now, but AFAICT Webmind's
documentation on the causality/goal architecture didn't show any way to
distinguish between the two. (2) CFAI will distinguish contextual
information that affects whether something is desirable; specifically,
because of the prediction formalism, it will seek out factors that tend to
interfere with or enable A leading to B, where B is the goal state. I again
don't know about Novamente, but most of the (varied) systems that were
described to me as being Webmind might be capable of distinguishing degrees
of association, but would not specialize on degrees of useful association.
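
A minimal sketch of point (2), under stated assumptions: the trial records,
the context names, and the function success_rate_by_context are all
hypothetical and are not taken from CFAI or Webmind. It only illustrates
scoring a context by how often A actually leads to B within it, rather than
by how often the context co-occurs with B.

```python
from collections import defaultdict

def success_rate_by_context(trials):
    """trials: iterable of (context, action_taken, goal_reached) tuples.

    Returns {context: fraction of attempts at A in that context that were
    followed by the goal state B}. Trials without the action are ignored,
    since they say nothing about whether A leads to B.
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [successes, attempts]
    for context, action_taken, goal_reached in trials:
        if not action_taken:
            continue
        counts[context][1] += 1
        if goal_reached:
            counts[context][0] += 1
    return {c: s / n for c, (s, n) in counts.items()}

# Hypothetical data: A = "flip the switch", B = "lamp is lit".
trials = (
    [("power_on", True, True)] * 9
    + [("power_on", True, False)] * 1
    + [("power_off", True, False)] * 10
)
rates = success_rate_by_context(trials)
print(rates)
```

In this toy data "power_on" shows up as an enabling factor for A leading to
B and "power_off" as an interfering one; a system tracking only degrees of
association with B would have no particular reason to single these out.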

To give a concrete example, Webmind (as I understood it), on seeing that
rooster calls preceded sunrise, where sunrise is desirable, would begin to
like rooster calls and would start playing them over the speakers. CFAI
would try playing a rooster call, notice that it didn't work, and
hypothesize that there was a common third cause for sunrise and rooster
calls which temporally preceded both (this is actually correct; dawn, I
assume, is the cause of rooster calls, and Earth's rotation is the common
cause of dawn and sunrise); after this rooster calls would cease to be
desirable since they were no longer predicted to lead to sunrise. Maybe
Webmind can be hacked to do this the right way; given the social process
that developed Webmind, it certainly wouldn't be surprising to find that at
least one of the researchers thought up this particular trick. My point is
that in CFAI the Right Thing is directly, naturally, and elegantly emergent,
where it goes along with other things as well, such as the Right Thing for
positive and negative reinforcement, as summarized in "Features of Friendly

So that's what I would offer as a demonstration of the second claim.
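
The rooster example above can be sketched as a toy simulation. This is an
illustration only, not CFAI or Webmind code: the function world, the
probabilities, and the collapsing of "Earth's rotation" and "dawn" into a
single hidden variable are all assumptions made for brevity. It compares
what passive observation suggests with what happens when the system plays
the rooster call itself.

```python
import random

def world(rng, force_rooster=None):
    """One time step of a toy world. `dawn` is the hidden common cause."""
    dawn = rng.random() < 0.25
    rooster = dawn and rng.random() < 0.9  # roosters crow only around dawn
    if force_rooster is not None:
        rooster = force_rooster            # intervention: we play the call
    sunrise = dawn                         # sunrise depends on dawn alone
    return rooster, sunrise

def p_sunrise_given_rooster(samples):
    """Fraction of time steps with a rooster call that also had a sunrise."""
    with_rooster = [sunrise for rooster, sunrise in samples if rooster]
    return sum(with_rooster) / len(with_rooster)

rng = random.Random(0)
observed = [world(rng) for _ in range(10_000)]
intervened = [world(rng, force_rooster=True) for _ in range(10_000)]

p_obs = p_sunrise_given_rooster(observed)    # association seen passively
p_do = p_sunrise_given_rooster(intervened)   # after playing the call
p_base = sum(s for _, s in observed) / len(observed)

print(p_obs, p_do, p_base)
```

Passively, every rooster call is followed by sunrise; once the call is
forced, sunrise occurs at roughly its base rate, so the call is no longer
predicted to lead to sunrise and should stop inheriting desirability.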

> I am certainly not one to discount the value of intuition. The claim that
> Novamente will suffice for a seed AI is largely based on the intuition of
> myself and my collaborators.
> However, my intuition happens to differ from yours, as regards the ultimate
> superiority of your CFAI goal architecture.

Well, to the extent that you understand thinking better than average, you
should be better than average at verbalizing your intuitions and the causes
behind your intuitions.

> I am not at all sure there is *any* goal architecture that is "ultimate and
> superior" in the sense that you are claiming for yours.

Well, if it's superior or equal to all other goal systems that are proposed
up until the point of Singularity, including the human one, that's good
enough for me.

> And I say this with what I think is a fairly decent understanding of the
> CFAI goal architecture. I've read what you've written about it, talked to
> you about it, and thought about it a bit. I've also read, talked about, and
> thought about your views on closely related issues such as causality.

Well, it's an odd thing, and not what I would have expected, but I've
noticed that for these detailed, complex issues - which one would think
would be best discussed in an extended technical paper - there is no good
substitute for realtime interaction. When I was at Webmind I could
generally convince someone of how cleanly causal, Friendliness-topped goal
systems would work, as long as I could interact with them in person. Of
course I never got as far as any of the goal-system-stabilizing stuff.

> > No, this is what we humans call "rationalization". An AI that seeks to
> > rationalize all goals as being Friendly is not an AI that tries to invent
> > Friendly goals and avoid unFriendly ones.
> In the most common case: a system really is pursuing goal G1, and chooses
> action A because it judges A will lead to satisfaction of G1. But it thinks
> it *should* be pursuing goal G2 instead. So it makes up reasons why A will
> lead to satisfaction of G2. Usually the term "rationalization" is used when
> these reasons are fairly specious.
> What I am talking about is quite different from this. I am talking about:
> --> Taking a system that in principle has the potential for a goal
> architecture (a graph of connections between GoalNodes) with arbitrary
> connectivity
> --> Encouraging this system to create a goal architecture that has a
> hierarchical graph structure with Friendliness at the top

Actually it's not a hierarchy, but a directed acyclic network, just like the
network for "is predicted to lead to X", for any X where the separate
instances of X are considered as distinct space-time events. (So that X1
can lead to X2 without the overall network being cyclic.)
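
A small sketch of that last point, assuming events are represented as
(name, time) pairs; the representation and the helper is_acyclic are
hypothetical, chosen only to show why separate instances of X keep the
network acyclic. It uses Python's standard graphlib module (3.9+).

```python
from graphlib import TopologicalSorter, CycleError

def is_acyclic(edges):
    """edges: iterable of (cause, effect) pairs. True if there is no cycle."""
    predecessors = {}
    for cause, effect in edges:
        predecessors.setdefault(effect, set()).add(cause)
        predecessors.setdefault(cause, set())
    try:
        # static_order() raises CycleError if no topological order exists.
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False

# Untimed: X leads to Y and Y leads back to X, which is a cycle.
untimed = [("X", "Y"), ("Y", "X")]

# Timed: each instance of X is a distinct space-time event.
timed = [(("X", 1), ("Y", 2)), (("Y", 2), ("X", 3))]

print(is_acyclic(untimed), is_acyclic(timed))
```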

Anyway, the property of a "'hierarchical' graph structure with Friendliness
at the top" is a necessary but not a sufficient condition for a Friendliness
architecture. The way in which that graph structure arises is also
relevant. The links need to be predictive ones, not fully general
associative ones. Similarly, the class of subnodes that leads to
Friendliness has to be produced by a method whose overall "shape" is
optimized to produce a class of subnodes whose "shape" is those things and
all those things that are thought to lead to Friendliness under bounded
rationality at a given time; if you take a small subset of these nodes,
produced by some method with an aFriendly shape, then that still isn't the
shape of a true Friendly AI.

Heh. Now *I* want to start tossing equations around. Anyway, what I want
to say is that the class of nodes that can be linked to Friendliness is not
the class of nodes that are predictively linked to Friendliness; and a small
subset of the class of nodes that are predictively linked to Friendliness
may have a very different "shape" from the entire class. If you violate
both criteria simultaneously, then I can see a small subset of the nodes
linkable to Friendliness by generic association as being very unFriendly.

> What I am saying is that Novamente's flexible goal architecture can be
> *nudged* into the hierarchical goal architecture that you propose, but
> without making a rigid requirement that the hierarchical goal structure be
> the only possible one.

Right; what I'm saying is that nudging Novamente's architecture into
hierarchicality is one thing, and nudging it into Friendliness is quite
another. Incidentally, if you think that the CFAI architecture is more
"rigid" than Novamente in any real sense, please name the resulting
disadvantage and I will explain how it doesn't work that way.

> I believe that if the system builds the hierarchical goal structure itself,
> then this hierarchical goal structure will coevolve with the rest of the
> mind, and will be cognitively natural and highly functional. I don't think
> that imposing a fixed hierarchical goal structure and rigidly forcing the
> rest of the mind to adapt to it (the essence of the CFAI proposal, though
> you would word it differently), will have equally successful consequences.

I rather hope that, after seeing how various advantages arise and how
various disadvantages fail to arise under CFAI, you will come to see that
the CFAI architecture is the natural description of a goal system. CFAI is
a help, not a hindrance; it contributes usefully to the intelligence of the
system. A directed acyclic network is not *forced* upon the goal system; it
is the natural shape of the goal system, and violating this shape results in
distortion of what we would see as the natural/normative behavior.

> I did not say this and do not agree with this. My statement was rather that
> maintaining a concept of Friendliness close to the human concept of
> Friendliness *may* require continual intense interaction with humans. This
> says nothing about reward or punishment, which are very simplistic and
> limited modes of interaction anyway.

I think that having a seed AI will require continual intense interaction
with humans during the early days, until the AI metaphorically learns how to
ride that bike. "But, Daddy, why is it dangerous to clear bit five
throughout memory?", etc. With respect to Friendliness, I naturally take an
interest in making sure that the threshold for working-okayness is set as
low as possible, even if it works better with more human interaction. What
is the system property that requires continual intense interaction to
enforce, and how does the continual intense interaction enforce it? Or
alternatively, what is it that requires continual intense informational
inputs from humans in order to work right?

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
