Re: [agi] Two draft papers: AI and existential risk; heuristics and biases

From: Eliezer S. Yudkowsky
Date: Tue Jun 06 2006 - 23:20:43 MDT

Ben Goertzel wrote:
> CFAI tried, and ultimately didn't succeed, to articulate an approach
> to solving the problem of Friendly AI. Or at least, that is the
> impression it made on me....
> On the other hand, AIGR basically just outlines the problem of
> Friendly AI and explains why it's important and why it's hard.
> In this sense, it seems to be a retreat....
> I suppose the subtext is that your attempts to take the intuitions
> underlying CFAI and turn them into a more rigorous and defensible
> theory did not succeed.

The subtext is:

1) Do not propose any solutions before discussing the problem as
thoroughly as possible without proposing any. This is most important
when dealing with extremely difficult problems, as that is when people
are most apt to propose solutions immediately. See the associated book
chapter on heuristics and biases.

2) This is a book chapter for general academic readers interested in
how AI fits into the big picture of global catastrophic risks. It was
sharply constrained by space, as you may have noticed, and there simply
wasn't time to go into any AI-design details. That would have been a
book, not a chapter.

3) I am still working on a rigorous theory and have made what I count
as progress. Over the next year or so, I hope to work on this nearly
full-time, and am refusing to take on other commitments (such as book
chapters) to make sure my time stays free.

CFAI proposed a solution too quickly, and worse, claimed it was a
workable approach in itself. Before a complete solution necessarily
comes a partial solution, where you know how to solve M out of N
problems with M < N. This is where I am now, but at least I know it.
At the time of CFAI, I (Eliezer-2001) had difficulty admitting I didn't
have a workable solution in hand, because that would have meant that I'd
have to work more on FAI theory instead of doing what I wanted to do,
what I thought would make me look more respectable, and plunging
straight into AI as soon as I had the funding to hire more
programmers... Actually, it would be more accurate to say the reason I
wanted to believe I had a workable solution in hand, was because this
let me preserve all the existing plans for AGI development that I had
made before I realized that FAI was an issue. People try to preserve as
much of their existing plans as possible, when unexpected news arrives;
in this case, what I needed to do, and did not do, was rethink all my
plans from scratch. But that was a much younger Eliezer... Needless to
say, I think that you, Ben, are now making my old mistake.

> I also note that your Coherent Extrapolated Volition ideas were not
> focused on in AIGR, which I think is correct because I consider CEV a
> fascinating science-fictional speculation without much likelihood of
> ever being practically relevant.

That is because CEV is merely my proposed *solution*, and AIGR doesn't
even get far enough into discussing the problem; it is nowhere near the
point where it would become wise to propose a solution. Did you read
the chapter on heuristics and biases? If not, please stop here, and
read that chapter.

> I agree with you that taking a more rigorous mathematical approach is
> going to be the way -- if any -- to a theory of FAI. However, I am
> more optimistic that this approach will lead to a theory of FAI
> **assuming monstrously great computational resources** than to a
> theory of pragmatic FAI. This would be expected since thanks to
> Schmidhuber, Hutter and crew we now have the beginnings of a theory of
> AGI itself assuming monstrously great computational resources, but
> nothing approaching a theory of AGI assuming realistic computational
> resources...

As previously discussed on AGI, I think that Schmidhuber, Hutter et al.
left key dimensions out of their AI, such as its ability to conceive
of what happens when it drops an anvil on its own head. That is, what
happens when an environmental process penetrates the intrinsic Cartesian
boundary on which their formalism is based.

> It would seem to me that FIRST we should try to create a theoretical
> framework useful for analyzing and describing AGIs that operate with
> realistic computational resources.

This is more or less what I'm doing right now.

And lo, I only started making progress on the problem by holding it to
Friendly AI standards of determinism and knowability. Otherwise, you
end up sweeping all the interesting parts of the problem under the rug,
forgiving your own ignorance, hoping for good results without proof, and
generally holding yourself to much too low a standard to come up with an
interesting theory. I've made a lot more progress on AGI than in the
CFAI/LOGI era, and the difference was holding myself to the standard of

AGI understanding will always run ahead of FAI understanding; I have
previously remarked on this point - it is what makes the problem of
Earth's survival *difficult*. Surely it is not possible to be able to
build FAI, and not be able to build AGI. But you can develop
sophisticated AGI techniques that are not even theoretically usable for
FAI, that *cannot* be reshaped to safety. Thinking about AGI doesn't
put you on an incremental path. I've *been* there, Ben. I wrote LOGI
while thinking about AGI, and then I had to throw LOGI away and start
over from scratch because it wasn't even on an incremental path to FAI.
Neither is CFAI, for that matter.

> Then Friendly AI should be
> approached, theoretically, within this framework.

You won't be able to approach FAI "within" an AGI framework that was
designed without thinking about FAI. You *will*, always, be able to
approach a certain type of AGI within a framework that was designed for
thinking about FAI. *That* is where the inequality comes from - that's
why you'll always know more about AGI than FAI at any given point.

What you need is a frame of mind in which there are no "AGI" problems.
There is, simply, the goal of building a Friendly AI, and you do what is
required for that in whatever order seems best, including devising a
theory of optimization with limited computational resources. That
someone else could call that "AGI" is of no consequence, except insofar
as there is existential risk from incomplete theories.

> I can see the viability of also proceeding in a more specialized way,
> and trying to get a theory of FAI under limited resources in the
> absence of an understanding of other sorts of AGIs under limited
> resources. But my intuition is that the best way to approach "FAI
> under limited resources" is to first get an understanding of "AGI
> under limited resources."

The vast majority of AGI techniques are intrinsically unsuited to FAI
and are not on an incremental pathway to FAI. So why am I, right now,
working mostly on "AGI"-ish questions, rather than CEV-ish questions?
Because, in the course of solving those problems which are naturally
encountered on the road to an FAI theory, one finds that the simplest
questions of FAI, which must be answered first before moving on to more
complex questions, happen to be questions *about a certain type of AGI*.
This does *not* mean you can answer them if you conceive of what you
are doing as "trying to build AGI" rather than "trying to build FAI".

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
