Re: [agi] Two draft papers: AI and existential risk; heuristics and biases

From: Ben Goertzel
Date: Wed Jun 07 2006 - 01:24:03 MDT

Hi Eli,

> First, as discussed in the chapter, there's a major context change
> between the AI's prehuman stage and the AI's posthuman stage. I can
> think of *many* failure modes such that the AI appears to behave well as
> a prehuman, then wipes out humanity as a posthuman. What I fear from
> this scientific-sounding business of "experimental investigation" is
> that the results of your investigation will be observed good behavior,
> and you will conclude that the AI "is good" and will stay good under
> extreme context changes. This is not, in fact, a licensable conclusion.

You are making incorrect assumptions about the goals of the
experimentation that I want to do.

Of course, the fact that a young, prehuman or young-child-human AI
system acts like a good puppy in its simulation world doesn't mean it
will continue to act like a good puppy once it becomes vastly smarter
and vastly different...

However, right now I have very little understanding of the dynamics of
AI goal systems under conditions where the AI can modify its own goal
hierarchy based on its experience (not revising its supergoal, but
certainly revising its *understanding* of its supergoal as its own
general understanding advances). I feel that this kind of
understanding is necessary in order to do a detailed design of a
Friendly AI or of any likely-to-be-successful strongly-self-modifying
AI. I could attempt to achieve this kind of understanding through
pure theory right now, but I have not yet succeeded at doing so, and I
have a feeling that this kind of theory will be easier to come by once
there are more empirical examples of this kind of dynamics to look at.

You may conjecture that this kind of dynamics will be totally
different in pre-adult-human-level than in adult-human-level AGIs (to
use a very crude classification system). I admit this is possible,
but I am not at all sure that you're right. If you are right then I
will discover this via observing that the goal system revision
dynamics in a pre-adult-human-level Novamente are too simplistic to be
useful in guiding advanced theory on this topic.

Anyway, "goal system dynamics" is just one example (albeit a very
important one) of an AGI-theoretic topic that I believe will become
more tractable to explore once we have gained more understanding of it
by experimenting with pre-adult-human-level AGIs.

So, I don't want to be misinterpreted.... It is not the case that I
think we can understand general AGI ethics by playing with ethics in
toddler-level AGIs in a simulation world. I agree that we should have
a reasonably solid theoretical understanding of AGI before launching
an AGI-driven Singularity -- unless the world is in a really dire
situation at that point and all the other alternatives seem even
riskier. Rather, it is the case that I think we can better formulate
theories about AGI after having some experience with toddler and
young-child level AIs and studying their internal dynamics under
various conditions. These theories will then guide us on the path to
creating more advanced AIs in a responsible and effective way.

You want to come up with the theory before building even the toddlers.
I think coming up with the theory will be both easier and more
effective in the context of playing with the AI toddlers. Of course
there will be a bunch of nontrivial extrapolation in using
observations of AI toddlers to formulate theories about AI adults --
but still, this is not as hard as extrapolating from the current level
of knowledge (empirical knowledge about humans plus pure math, with no
empirical knowledge about AGIs at all) to formulate theories about AI
adults.

> Second, there's an *enormous* amount of experimentation and observation
> that's already been done in the cognitive sciences. I feed off this
> body of pre-existing work in a dozen fields, and it gives me more
> concentrated evidence than I could assemble for myself in a hundred
> lifetimes. And all that I have studied is not the thousandth part of
> the whole. But where the processor is inefficient, no amount of
> evidence may suffice. If there's already a thousand times as much
> evidence as you could review in your lifetime, what makes you think that
> what's needed is one more experiment - rather than an insight that we
> already have more than enough evidence to see, but we aren't looking at
> the right way?

The data we now have from cognitive science is only rather indirectly
pertinent to the detailed design of nonhumanlike AI systems.

The kind of evidence I want is the kind that can only be gotten by
studying the *internal dynamics and structures* inside an AI system as
it carries out various sorts of tasks and behaviors.

For example: Under what conditions will an AGI implicitly reinterpret
its supergoal so dramatically that in effect it is no longer the same
supergoal as it started out with? What kinds of goal systems are in
practice "attractors" so that nearby goal systems will tend to drift
into them via natural cognitive dynamics upon interaction with the
world ... how big are the basins of attraction?

I can't explore questions like this with human minds because I can't
launch an ensemble of humans with slightly different goal systems and
let them learn and adapt and then see where each one ends up. And I
can't explore questions like this using current mathematics because I
just wind up with a bunch of equations that no one has any clue how to
solve. Yet in my view this is the kind of question that needs to be
investigated if we are to really understand AGI or FAI.
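
To make the ensemble idea concrete, here is a toy sketch of what
"launch many slightly different goal systems and see where each one
ends up" could look like. It is purely illustrative and is not
Novamente code: the one-dimensional "goal" value, the double-well
drift rule, the noise level, and every function name are assumptions
invented for the example.

```python
import random


def drift(goal, rate=0.1, noise=0.01, rng=random):
    """One step of a made-up goal-drift rule on a 1-D 'goal' value.

    The deterministic part is gradient descent on the double-well
    potential V(g) = (g**2 - 1)**2 / 4, which has two attractors,
    g = -1 and g = +1. The Gaussian term stands in for the
    perturbations that come from interacting with the world.
    """
    return goal - rate * goal * (goal ** 2 - 1) + rng.gauss(0.0, noise)


def final_attractor(initial_goal, steps=500, rng=random):
    """Let a single agent drift, then report which attractor it is near."""
    goal = initial_goal
    for _ in range(steps):
        goal = drift(goal, rng=rng)
    return 1 if goal > 0 else -1


def basin_estimate(centre, spread, n_agents=200, seed=0):
    """Fraction of an ensemble, started within `spread` of `centre`,
    that ends up at the +1 attractor."""
    rng = random.Random(seed)
    outcomes = [
        final_attractor(centre + rng.uniform(-spread, spread), rng=rng)
        for _ in range(n_agents)
    ]
    return outcomes.count(1) / n_agents


if __name__ == "__main__":
    # Sweep the starting point to see where the basin boundary lies.
    for centre in (-0.5, 0.0, 0.5):
        print(centre, basin_estimate(centre, spread=0.2))
```

In this toy the basin boundary sits at g = 0, so ensembles started
well to one side all end up at the same attractor, while an ensemble
straddling the boundary splits between the two. The open question is
what the analogous picture looks like for a real goal system, where
neither the state space nor the drift rule is known in advance.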

> Of course this objection has a special poignancy for me because, as far
> as I can tell, yes, we already have all the evidence we need, far more
> than enough, and the only problem is understanding the implications of
> what we already know. Pity that humans aren't logically omniscient.

This might be true, but even if so, it's irrelevant. We have to work
around the limitations of our cognitive systems, and sometimes
gathering more information than is logically necessary is the best way
to do that.

> But just which experiments do you propose to perform, and what do you
> expect them to tell you?

There is a long list of experiments I would like to perform, and
describing them all in a comprehensible way would take many dozens of
pages and would require the reader to have detailed background
knowledge about Novamente. The goal-system experiments loosely
described above have been thought through in much more detail than I
gave here, and they constitute only one small example of the types of
experiments worth performing....

> Now in practice, I admit that there have been cases where the
> experimental observations told us which hypotheses we needed to test;
> nearly all revolutionary science, as opposed to routine science, happens
> this way. But it is also true in practice that you have to know what
> you're seeing. When it comes to interpreting what the behavior of an
> AGI can tell us about its internal workings, I think you may need to
> solve most of the problem in order to know what you're seeing.

I disagree with the latter sentence. I believe that in the case of a
Novamente system, it will be quite possible for us to draw connections
between the system's behaviors and its internal workings. Remember
that Novamente is designed with this in mind. One of the design
principles of the system has been that the system's internal
structures and dynamics should be as transparent to us as possible.
Drawing this sort of connection with regard to human brains' behaviors
and internal workings may be much more difficult, as brains were not
designed with this kind of transparency in mind.

> Third, last time I checked, you were still attempting to come up with
> reasons why the AI you planned to experiment with could not possibly
> undergo hard takeoff, rather than building in controlled ascent /
> emergency shutdown features in at every step as a simple matter of
> course.

No, we are not attempting to come up with reasons why the AI we plan
to experiment with cannot possibly undergo hard takeoff. We have
engineered our system in such a way that we know that a toddler-level
version of Novamente will not be able to undergo hard takeoff. No
"attempt to come up with reasons" is necessary or has been undertaken.

> I recall that you once said that the chance of the current
> version of Novamente undergoing an unexpected hard takeoff was "a
> million to one". If you've read the heuristics and biases chapter, you
> now know why a statement like this is sufficient to make me say to
> myself, "This is the way the world ends."

I read your chapter and some of the references in it, and I like your
chapter very much ... but I find your argument in the above paragraph
quite specious. You are IMO abusing the "heuristics and biases"
results in the above paragraph.

The chance of the current version of Novamente undergoing an
unexpected hard takeoff is EFFECTIVELY ZERO. It is not just a million
to one. It is effectively the same as the chance of my cellphone or
the wart on my uncle's nose or the instance of Firefox on my laptop
undergoing a hard takeoff.

In spite of the heuristics and biases results, I am still going to
maintain that the odds of my daughter transmogrifying into a stockpile
of radioactive waste tomorrow morning are effectively zero. I am not
going to refuse to sleep in the same house as her just because of a
fear that I am underestimating the risk that she will transmogrify
into a stockpile of radioactive waste....

> It's not thought, but action, that counts. I'd have a very different
> opinion of this verbal advice to "devise an experimental strategy for
> FAI" if you posted a webpage containing a list of which FAI-related
> experiments you wanted to do, what you thought you might learn from them
> that you couldn't read off of existing science, and which observations
> you felt would license you to make which conclusions about the rules of
> Friendly AI.

That webpage will not be posted, not because the experiments are
unknown, but because describing them in any detail would require that
the reader have detailed knowledge of the Novamente architecture, the
details of which are proprietary.

> But in terms of how you spend your work-hours, which code you write,
> your development plans, how you allocate your limited reading time to
> particular fields, then this business of "First experiment with AGI" has
> the fascinating and not-very-coincidental-looking property of having
> given rise to a plan that looks exactly like the plan one would pursue
> if Friendly AI were not, in fact, an issue.

At this stage, we are doing basically the same work as we would do if
Friendliness were not an issue.

However, once we reach the toddler-level stage, then the types of
experiments we do with the system will be affected by the fact that
Friendliness is an issue. And, the work we do to bring the system
beyond the toddler-level stage will be strongly affected by
Friendliness concerns.

> There are deeper theoretical
> reasons (I'm working on a paper about this) why you could not possibly
> expect an AI to be Friendly unless you had enough evidence to *know* it
> was Friendly; roughly, you could not expect *any* complex behavior that
> was a small point in the space of possibilities, unless you had enough
> evidence to single out that small point in the large space.

You do not know what percentage of AI systems comprehensible and
engineerable by humans, and taught by humans to be Friendly, and
engineered to be internally transparent and judged by knowledgeable
humans to have internal structures/dynamics consistent with
Friendliness, are going to be Friendly.

So, you don't know how small the odds of Friendliness are within the
relevant subspace of the set of all possible AI systems. The odds of
Friendliness within the space of all AI systems are irrelevant.

-- Ben G
