Re: Knowability of Friendly AI (was: ethics of argument)

From: Eliezer S. Yudkowsky
Date: Mon Nov 11 2002 - 08:50:06 MST

Ben Goertzel wrote:
>>I'd also ask you to consider narrowing your focus from the extremely
>>general issue of "the stability of self-modifying goal systems" to
>>statements of the order found in CFAI, such as "A goal system with
>>external reference semantics and probabilistic supergoals exhibits
>>certain behaviors that are morally relevant to Friendly AI and
>>necessary to Friendly AI construction, and is therefore a superior
>>design choice by comparison with more commonly proposed goal system
>>structures under which supergoals are treated as correct by
>>definition." Why do you believe that, e.g., this specific question
>>cannot be considered in advance?
> Although I've spent much of my life creating heuristic conceptual arguments
> about topics of interest, the three forms of knowledge I trust most are:
> -- mathematical
> -- empirical
> -- experiential [when referring to subjective domains]

It looks to me like you're missing out on the entire domain of
nonmathematical abstract reasoning, e.g. by combining generalizations from
previous empirical experience, or by nonmathematical reasoning about the
properties of abstract systems that are simple enough to be modeled

> My own arguments as to why Novamente will work as an AGI are also in the
> category of "moderately convincing, suggestive, heuristic arguments" at this
> stage. But we're working hard to turn them into empirical demonstrations ;)
> One characteristic of "moderately convincing, suggestive, heuristic
> arguments" is that intelligent, informed, fair-minded people can rationally
> disagree on their validity....

Another interesting characteristic of such arguments is that they can be
communicated between individuals. Of course I can't stop you from saying
"I, Ben Goertzel, choose to trust my experiential arguments over your
heuristic ones". However, you are not a single individual in isolation.
You have, on this mailing list and others, argued that funding Novamente
is a Singularitarian endeavor. So it may well be, but only if its theory
of Friendly AI improves, and I am willing to offer specific arguments in
support of the latter statement. The domain of "moderately convincing,
suggestive, heuristic arguments" is exactly that which governs Singularity
strategy, including questions such as "Should we try to build AI, at all,
in the first place?" and "Is this particular AI project likely to destroy
the world?" It's also, as I believe you point out, the way that both of
us intend to use to build AI in the first place. So if the kind of
reasoning I'm using to think about Friendly AI can't really be any more
useful than the kind of thinking that will let you, say, save the world or
build an intelligent being, then I think that's probably useful enough for
me. If anything it shows that the right caliber of reasoning is being
used, since these are all tasks in the same domain.

You are an individual, you can always go off and build AI regardless of
what the heuristic arguments say, but right now the heuristic arguments
say that Novamente as it stands, if it works, will destroy the world, and
all else being equal I'd rather not rely on the pleasant but basically
ungrounded possibility that Ben Goertzel or someone else on the Novamente
project will learn at such a rate as to arrive at a workable theory of
Friendly AI before it's much, much too late. This isn't to say it can't
happen. It's just that with the entire world at stake, I'd rather not be
reduced to relying on pure ungrounded hope, all else being equal.

Saying "I distrust all heuristic arguments" doesn't really cut it here.
One, you don't distrust your *own* heuristic arguments, so it seems rather
solipsistic to apply different standards of evidence to arguments that you
like versus arguments you don't. And two, I don't see even a good
heuristic argument - let alone a mathematical one - that developing AI in
the total absence of even heuristic knowledge as to whether it will be
Friendly is really all that good a strategy for humanity. It seems to me
that your claim that nothing can be known about Friendly AI in advance
would be, if it were true, a strong (though not knockdown) argument
against developing AI in the first place.

Really, I'm stuck in the same position with you that I am with the various
people who argue against, say, the proposition that AI can ever work in
the first place; no matter how much evidence I present, they can always
claim to feel uncertain about it. On the other hand, convincing the third
parties in the audience is an entirely separate issue. Basically, you've
placed me in a position where I can simultaneously see that Novamente's
current described structure would prima facie result in an SI existential
catastrophe if Novamente began recursive self-improvement, and where I
have no particular empirical reason - apart from pure hope - to suppose
that matters will improve over the next years. Now of course this could
simply be my private nightmare, but since I am *not* arguing from my own
uncertainty, it looks to me like my heuristic nightmare is communicable
between rational thinkers and your selective uncertainty is not.

Would it be fair to summarize your argument so far as: "Novamente is a
good Singularity project because nothing useful can be known about
Friendly AI in advance, which unknowability is itself knowable on the
grounds that Friendly AI is neither empirically demonstrated nor
mathematically proven knowledge. It is correct Singularity strategy to
invest in AI projects when nothing is known about Friendly AI, since the
only way to find out is to try it. The amount of concern I've shown for
Friendly AI so far is around the right proportional amount of concern
desired in the leader of a Singularity AI project."

One of the major problems I have with this is that I have a plan for
learning various pieces of empirical information about Friendly AI. I may
make observations that I had no way of anticipating, but there are also
plenty of specific things that I already want to find out about - that I
know enough to look for. I'm not confident in the Novamente project's
ability to learn about Friendly AI empirically without a model. Why would
you acquire empirical knowledge about what happens when a goal system has
the ability to reflectively model itself at a level where it can reason
about the abstract desirability of the presence or absence of a causal
system whose effect is to modify the supergoals? Why would you create
lots of models like this in the Friendliness Failure Lab and test them to
see if they work, and more importantly how they fail, if you don't know
that the question above is an important one? Are you even going to *have*
a Friendliness Failure Lab?

It looks to me like Novamente will start out with such a vague theory of
AI morality that adjusting the theory to fit evidence coincidentally
encountered while doing other things will produce an only slightly less
vague theory. Let's suppose you're right and CFAI turns out to be
completely wrong because it's not mathematics. I'd still expect that
empirically investigating everything that CFAI suggests should be
investigated would produce a hell of a lot of knowledge about AI morality.
Maybe enough that pure generalization from empirical evidence would be
sufficient to give birth to a new coherent theory that was knowably
adequate to produce Friendly AI, and if not, we could always just fold up
shop and stop working on AI. CFAI has a lot of hypotheses that can be
investigated early on, so we'd know long before reaching a danger point if
the theory was junk by virtue of not being mathematics. It looks to me
like your theory of AI morality is sufficiently vague that, if Novamente
became capable of recursive self-improvement at the point you say you
expect, it seems likely that - at that point in the development process -
you would have modified a vague theory of AI morality into a different
vague theory of AI morality, but not acquired the detailed knowledge
needed for Friendly AI. Now I could be wrong, of course - from your
perspective, the statement is suspect because I am basing it on a
CFAI-based visualization of what you're likely to encounter. But there is
still a strategic problem in bulling ahead when not only do you not know,
you also don't have a specific theory of what you want to know or when you
need to know it.

In Singularity strategy terms, what Novamente is doing seems to verge on
ignoring the Friendly AI question entirely - it currently makes sense to
invest in Novamente if and only if you believe that investing in a generic
recursively self-improving AI project is a good thing. Now there are a
number of reasons why this might be a good idea, such as that the project
doesn't actually succeed but produces critical knowledge or even tools of
Singularity relevance (such as an Earthweb), or that you expect it is
knowably the case that AI developers learn enough about Friendly AI
development to get by even if they aren't looking, or you think that you
*have* to rely on the previous chance because otherwise the world is gonna
blow up. But I don't think it's *necessary* to go down that route. I
think - as you seem to deny - that it's possible to take a *lot* of
territory on the Friendly AI part of Singularity strategy, over and above
that represented by a generic recursively self-improving AI project.

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
