RE: Military Friendly AI

From: Ben Goertzel
Date: Thu Jun 27 2002 - 21:50:59 MDT


> I'll have to disagree, and say that many of the ideas in CFAI are AFAIK
> novel. Give him a little credit for inventing more than just the term. And
> as for all the thinking you and others did, I don't see that it produced
> much in the way of results during the period before CFAI was published.
> Where's all the prior art, and why aren't we debating that stuff here, and
> comparing it to CFAI to see which is better?

The "prior art" (if you can apply that term to speculative philosophy; it's
normally used for concrete technology) is scattered all over the place, in
writings that were focused mainly on other things and mentioned Friendly AI
topics only peripherally. Digging it all up would be a major task.
Personally, I can recall some books where such things were discussed, but
not the page numbers etc.

One pretty old reference where such ideas were discussed was Val Turchin's
book The Phenomenon of Science, published in Russia in the late 60's, and in
the US in the early 70's. He coined the nice term "human plankton" to refer
to how insignificant we would appear from the future perspective of
superintelligent machines with synergetic group minds. His philosophy of
what we call the Singularity centered around the notion of the "Metasystem
transition," wherein entities that used to be autonomous become components
of a higher-level system, and the nexus of control passes to a higher level...

> > I think we will be able to develop a real theory of Friendly AI only
> > after some experience playing around with infrahuman AGI's that have a
> > lot more general intelligence than any program now existing.
>
> Which tends to strike me as a dangerous approach.
>
> > I believe my attitude toward Friendliness is typical of AGI researchers.
>
> Unfortunately, yes, not many people seem to be as careful as we all would
> like when playing around with existential technologies.

There are dangers involved of course.

However, it may be even MORE dangerous to fool oneself into believing one
has adequately grappled with the Friendliness issue prior to creating an
infrahuman AGI.

Perhaps it is less dangerous to be honest with oneself about the fact that,
prior to having some serious infrahuman AGI's to experiment with, it's just
not possible to make a serious theory of Friendly AI...

> So can your position be summarized as: we'll build our AI, get it working
> at some subhuman level, and then when we guess it needs it we'll stop
> running it for a while until we figure out how to ensure "Friendliness"?
> I think your protocol needs to be fleshed out for us further so we can
> feel more comfortable with your plans.

Whether it will be necessary to stop running it for a while, or not, will
depend on the situation.

If the intelligence in the system is growing very fast, then yes, this will
be necessary. If the intelligence is growing slowly, then to stop it may
not be necessary.

I should add that in Novamente we *do* have a Friendly goal system; it's
just a different kind from Eli's. Novamente has a number of goals -- a goal
heterarchy rather than a goal hierarchy. Friendliness is an important one
of its goals, but it's not wired in as the supergoal. Friendly behavior is
then intended to be taught to the system via interaction in a shared
perceptual environment, just as CFAI suggests.

So, the big difference between Novamente's Friendly goal system and the one
Eliezer proposes is that he suggests a hierarchical goal system with
Friendliness at the top is best, whereas I suggest that a heterarchical
goal system with Friendliness as one element of the heterarchy is the only
thing compatible with the natural dynamics of intelligence.

However, I am not *sure* this is right, and I'd like to experiment with both
heterarchical and hierarchical goal systems at an appropriate point in time.

It is possible to implement an Eliezer-style Friendly goal hierarchy with
Friendliness at the top, inside Novamente. My conjecture is that this goal
system will be less stable over time than a heterarchical goal system with
Friendliness as one element among many. However, this conjecture could be
wrong -- I don't have a good enough theory of Novamente dynamics to tell
right now. This is the sort of question I think has to be discovered
through experiment, not through proud theoretical declamations....
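To make the contrast concrete, here is a minimal sketch of the two goal-system shapes being debated. Everything here is an illustrative invention -- the class names, goal names, and weights are not Novamente code or CFAI's actual design; the point is only the structural difference between a wired-in supergoal and a set of coequal, renormalizing goals.

```python
from dataclasses import dataclass, field

@dataclass
class HierarchicalGoals:
    """CFAI-style: one supergoal; other goals matter only as its subgoals."""
    supergoal: str = "Friendliness"
    subgoals: list = field(default_factory=list)

    def importance(self, goal: str) -> float:
        # The supergoal is supreme; subgoals derive their value from it.
        return 1.0 if goal == self.supergoal else 0.5

@dataclass
class HeterarchicalGoals:
    """Novamente-style: several coequal goals with shifting weights."""
    weights: dict = field(default_factory=lambda: {
        "Friendliness": 0.4,   # important, but not a wired-in supergoal
        "Learning": 0.3,
        "Self-preservation": 0.3,
    })

    def importance(self, goal: str) -> float:
        return self.weights.get(goal, 0.0)

    def rebalance(self, goal: str, delta: float) -> None:
        # Goal dynamics: a weight drifts with experience, then the whole
        # heterarchy renormalizes -- no single goal is structurally supreme.
        self.weights[goal] = max(0.0, self.weights[goal] + delta)
        total = sum(self.weights.values())
        self.weights = {g: w / total for g, w in self.weights.items()}
```

The stability question above is then: under repeated `rebalance`-style dynamics, does a system of the first shape keep its supergoal on top, or does it erode toward something like the second shape anyway?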

> It sounds dangerous to me (and I guess others here) to build the AI first,
> and let it run for some time without any special F features built in. How
> will your protocol ensure that it does not take off, and if it does how
> are we ensured it will turn out ok?

Here's the thing... as clarified in the previous paragraphs I just typed, we
*do* have a Friendliness goal built in; we're just not sure yet what the
best way is to do this. And we're not willing to fool ourselves that we
*are* sure what the best way is....

Compared to e.g. Peter Voss's A2I2 system, our approach is far closer to
Eli's, because Peter's system is neural-nettish and is not the sort of
system that one *can* explicitly supply with a Friendliness goal. But Peter
holds far more strongly than me to the opinion that it's "too early" to
seriously consider the Friendliness issue. He's just less argumentative
than me, so he's keeping relatively quiet about this view on this list,
although he's a member ;-)

As you and Eli and I discussed in a private e-mail, we do plan to put a
"failsafe" mechanism into Novamente to halt a potential unsupervised hard
takeoff -- eventually, when we consider there to be a significantly > 0 risk
of this happening.
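The shape of such a failsafe can be sketched in a few lines, assuming (as the following paragraphs stress is the genuinely hard part) that some intelligence measure exists at all. The threshold value and the measure itself are made-up placeholders, not anything from the Novamente design:

```python
def takeoff_failsafe(history, max_growth_per_step=0.10):
    """Return True (halt for human review) if the latest jump in a
    hypothetical intelligence measure exceeds max_growth_per_step,
    expressed as a fraction of the previous measurement."""
    if len(history) < 2:
        return False  # not enough data to estimate a growth rate
    prev, curr = history[-2], history[-1]
    if prev <= 0:
        return False  # degenerate baseline; can't compute a ratio
    growth = (curr - prev) / prev
    return growth > max_growth_per_step
```

Under this sketch, slow growth (say 4% per measurement interval) lets the system keep running, while a sudden 25% jump trips the halt. All the real difficulty is hidden inside `history` -- producing those numbers honestly is the research problem discussed next.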

The task of detecting a rapid intelligence increase within Novamente is not
an easy one, and creating and tuning useful measures of intelligence
increase is going to be a big research topic, which we will have to explore
through systematic experimentation *once we have a fully implemented system
with a moderate degree of intelligence*. In the Novamente manuscript (the
new version) there is a whole section on intelligence measurement, and it's
not an easy topic. Measuring the intelligence of an AGI effectively
requires a measurement system of some intelligence in itself, because
intelligence is a fairly multidimensional concept when you really delve
into it.

So, I think we don't even know how to build a good failsafe mechanism for
Novamente or any other AI yet. We will only know that when we know how to
measure the intelligence of an AGI effectively, and we will only know *this*
based on experimentation with AGI's smarter than the ones we have now.

Yeah, you can create a standard test of puzzles for the system to solve, but
I have little faith that these things measure intelligence well. There are
so many different kinds of intelligence... and then how do you measure how
hard the system is actually trying to solve the puzzle, as opposed to
thinking about other things of interest to it? These are not impossible
problems, but they're not trivial, and it's hard to solve them definitively
in the absence of a working AGI.
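The two objections above -- many kinds of intelligence, unknown effort -- can be made concrete in a toy sketch. The dimension names, scores, and the idea of dividing by an "effort" estimate are all invented for illustration:

```python
def battery_scores(raw, effort):
    """Adjust raw puzzle scores (each 0-1) by an estimated effort level
    (0-1), returning per-dimension estimates rather than one scalar.
    A low effort estimate inflates the ability estimate, capped at 1.0."""
    return {dim: min(1.0, score / max(effort, 1e-6))
            for dim, score in raw.items()}

# Hypothetical battery results: if the system was only half-trying,
# every ability estimate roughly doubles, and the spread across
# dimensions tells you more than any single averaged score would.
raw = {"spatial": 0.30, "verbal": 0.45, "planning": 0.15}
est = battery_scores(raw, effort=0.5)
```

Even this toy version shows the problem: the output depends entirely on an effort parameter we have no reliable way to measure, which is why a working AGI to experiment on seems necessary before the measures can be tuned.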

> So at exactly what stages of the development do you plan to implement
> which F features? You have some sort of protocol, right?

Once the entire system is implemented, and we have begun to teach it by
chatting with it through our Experiential Interactive Learning interface,
and exposing it to external data in a "free access" manner rather than
simply by feeding it test datasets.

> If your design is incapable of supporting such features, and you have been
> unable to come up with your own seemingly impregnable way to keep your AI
> "Friendly" throughout its development into superintelligence, then maybe
> we should be getting worried?

Actually, Novamente is capable of supporting a CFAI-style Friendly goal
system *initially*; I just doubt that such a goal system will be stable
through long-term Novamente dynamics.... I think that a heterarchical
Friendly goal system will be more stable within Novamente... but this is
just conjecture right now...

> I assume that if you get your working infrahuman AI, and are unable to
> come up with a bulletproof way of keeping it "Friendly", you will turn it
> off?

Not necessarily; this will be a hard decision if it comes to that.

It may be that what we learn is that there is NO bulletproof way to make an
AGI Friendly... just like there is no bulletproof way to make a human
Friendly.... It is possible that the wisest course is to go ahead and let
an AGI evolve even though one knows one is not 100% guaranteed of
Friendliness. This would be a tough decision to come to, but not an
impossible one, in my view.

> Well do you think it's worth our trouble to read it? If so I'd like to
> see some discussion about it (perhaps Eliezer will allow you to repost
> the flaws he saw in it) since I don't recall any threads regarding it
> (if I've forgotten, someone please give me a URL to the archives, thanks).

I think it's worth your while to read it, sure. And there was a brief
thread on it a while back.

> I just started reading your AI Morality paper, I'm sure I'll have more
> comments later, but this part is a bit scary I guess to everyone here who
> is afraid of the initial AI programmers having too much control over the
> AI's final state:
> "But intuitively, I feel that an AGI with these values is going to be a
> positive force in the universe – where by “positive” I mean “in
> accordance with Ben Goertzel’s value system”."

There is no escaping the subjectivity of morality, Brian.

The reason I put that phrase in there is, I know that to *some* people,
anything that may lead to the obsolescence of humanity is intrinsically
negative. (Because they believe, e.g., that humans are God's chosen
creatures... that uploads will not have souls... etc.). To these people
even a Friendly AGI would be a negative force in the universe.

Eliezer's approach to Friendliness relies on his own personal morals as
well. His are pretty similar to mine; for instance, he thinks that
preserving lives forever is a good thing. My wife, who believes in
reincarnation, disagrees with me and Eli on this -- according to her moral
standards, ending death goes against the natural cycle of karma and is thus
probably not a good thing....

Perhaps I should refer to some standard "transhumanist moral code" rather
than my own personal code, but I didn't want to make up a moral code and
just declare it to be the standard "transhumanist moral code."

In fact, though, I strongly suspect that an AGI with the values I suggest
in that essay would be a positive force according to the moral standards of
nearly all transhumanists...

-- Ben G

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:39 MDT