Re: Friendly AI in "Positive Transcension"

From: Eliezer S. Yudkowsky
Date: Sun Feb 15 2004 - 14:30:52 MST

Ben Goertzel wrote:
> Hi,
>> This cannot possibly be done. What you're asking is undoable.
> OK, well if you can't even summarize your own work compactly, then how
> the heck do you expect me to be able to do so??? ;-)

I don't, of course. However, I should hope that you would be aware of
your own lack of understanding and/or inability to compactly summarize,
and avoid giving summaries of FAI that are the diametric opposite of what
I mean by the term. I don't understand Novamente, and I say so. I
wouldn't want to make your job of explaining Novamente even harder by
giving my own "explanation" of it.

To make progress on explaining FAI is my own burden, not yours; I only ask
that you avoid making anti-progress.

>> You're missing upward of a dozen fundamental concepts here.
> Look, any brief summary is going to miss a lot of fundamental concepts.
> That is the nature of summary. In summarizing something, one has to
> choose what to include and what to leave out.

By "missing" I don't mean "failing to include" but "making statements in
contradiction of".

>> First, let's delete "programming or otherwise inculcating" and
>> replace it with "choosing", which is the correct formulation under the
>> basic theory of FAI, which makes extensive use of the expected
>> utility principle. Choice subsumes choice over programming, choice
>> over environmental information, and any other design options of which
>> we might prefer one to another.
> Fine, we can refer to "choosing", while noting that programming and
> teaching are the apparently most likely forms of choosing in this
> context...

Well, you could note that, if that were your opinion, I suppose.

>> Next, more importantly, "humane" is not being given its intuitive
>> sense here! Humane is here a highly technical concept, "renormalized
>> humanity".
> So far as I can tell this is a fuzzy and ill-defined and obscure
> concept, lacking a clear and compact definition.
> Feel free to give one, or refer me to a specific paragraph in one of
> your online writings where such is given.

On the contrary, I gave a fuzzy and ill-defined and obscure exposition of
a clear and compact concept. To make it clear and compact would require a
lot of technical background and numerous new concepts, after which I could
give a clear and compact definition. I am not going to be able to define
this thing in one paragraph. I am not asking you to define it for me,
either. I am asking you, if I haven't finished explaining it, to avoid
misexplaining it on my behalf, because it is complicated.

>> It's not that you're misunderstanding *what specifically* I'm saying,
>> but that you're misunderstanding the *sort of thing* I'm attempting
>> to describe. Not apples versus oranges, more like apples versus the
>> equation x'' = -kx.
> OK, so please clearly explain what sort of thing you're attempting to
> describe.

Why do you expect me to be able to do this off-the-cuff? You once claimed
that understanding AI ethics was work for decades and a Manhattan Project.
If I have made progress, why do you expect that progress to be
explainable in one paragraph?

I am going to take a stab below, but don't be surprised if I fail.

Presently, you seem to be in a state of feeling that there is nothing to
understand. The most I can seriously hope to do is to show you that there
exists a thing-to-understand which is not similar to previously understood
things. Explaining the thing-to-understand seems to be beyond hope of our
current communications bandwidth.

Obviously I am still in the middle of working some things out. The
problem arises when you present Friendly AI to your readers as a finished
theory that contradicts this thing that I am in the process of working
out. An additional reason I am emotionally annoyed is that you are
presenting FAI as the advocacy of a class of theories that I took specific
offense at in 1996, thus starting me down the road that in 2000 would lead
to my first faltering attempts at doing real work in FAI. You are
presenting FAI as the problem I created FAI to solve. You accuse me of
advocating exactly the position that I reacted against by creating FAI.
(I can provide historical references to document this on request.)

That you also dismiss FAI because of its supposed failure to address exactly
those problems that FAI was created specifically to solve is also
personally annoying, but much more forgivable, since if I have not solved
those problems to your satisfaction, I have not. At most I would object
to the implicit connotation that I haven't thought of the problems.

It is just the active misrepresentation of myself as holding the diametric
opposite of my actual views that I am complaining about here. I quite
understand that you cannot be expected to explain FAI; that is my job.

>> Your most serious obstacle here is your inability to see anything
>> except the specific content of an ethical system - you see "Joyous
>> Growth" as a specific ethical system, you see "benevolence" as
>> specific content, your mental model of "humaneness" is
>> something-or-other with specific ethical content. "Humaneness" as
>> I'm describing it *produces* specific ethical content but *is not
>> composed of* specific ethical content. Imagine the warm fuzzy
>> feeling that you get when considering "Joyous Growth". Now,
>> throughout history and across the globe, do you think that only
>> 21st-century Americans get warm fuzzy feelings when considering their
>> personal moral philosophies?
> And, I *don't* see abstract ethical principles as being specific
> ethical systems, I tried to very clearly draw that distinction in my
> essay, by defining abstract ethical principles as tools for judging
> specific ethical systems, and defining ethical systems as factories for
> producing ethical rules.

To me these are subcategories of essentially the same kind of content,
humans thinking in philosophical mode. The commensurability of abstract
ethical principles, specific ethical systems, and ethical rules, as
cognitive content, is why we have no trouble in reasoning from verbally
stated ethical principles to specific ethical systems and so on. They are
produced by human philosophers; they are argued all in the same midnight
bull sessions; they are written down in books and transmitted memetically.
Children receive them from parents as instructions, understand them as
social expectations, feel their force using the emotions and social models
of frontal cortex. Principles, systems, and rules are subcategories of
the same kind of mental thing manipulated in the mind - they've got the
same representation. Humans argue about principles, systems, and rules
fairly intuitively - the debate goes back to Plato at the least.

FAI::humaneness is a different *kind of thing* from this. There is no way
to argue FAI::humaneness to someone; it's not cognitive content, it's a
what-it-does of a set of dynamics. It's not the sort of thing you could
describe in a philosophy book; you would describe it by pointing to a
complete FAI system that implements FAI::humaneness. It's not an
arguable-thing, or a verbal-language-thing. Consider the difference
between a picture, the visual cortex, the genetic information that
specifies how to wire a visual cortex in a context of incoming visual
stimuli, and the selection pressure that produces the genetic information.
These are all *different kinds of things*. I cannot explain
FAI::humaneness by simple analogy to any of them, but the contrast among
them illustrates what I mean by incommensurability.

> I can understand if you're positing some kind of "humaneness" as an
> abstract ethical principle for producing specific human ethical
> systems. It still seems to me like it's a messy, overcomplex,
> needlessly ill-defined ethical principle which is unlikely to be
> implantable in an AI or to survive a Transcension.

FAI::humaneness is not an abstract ethical principle, or anything
commensurate with abstract ethical principles. (There does seem to be a
*very* strong tendency for people to try to interpret all issues of AI
morality in terms of that class of cognitive content, which is probably
the number one problem I run into in explaining things.) As for messy,
overcomplex, needlessly ill-defined - I would have no problem with your
paper if you accused FAI of that! It is a common enough sentiment and not
actively misrepresentative. However, since I have failed to explain the
"sort of stuff" that FAI::humaneness is made of, it is not surprising that
you would consider it "messy", "overcomplex", "unstable" - or
"non-commutative", "purplish-brown", or "less than 0". Heaven knows what
sins the model in your mind is guilty of, since it's formed of stuff not
commensurate with what I'm trying to describe.

>> The dynamics of the thinking you do when you consider that question
>> would form part of the "renormalization" step, step 4, the volition
>> examining itself under reflection. It is improper to speak of a vast
>> morass of "humane morality" which needs to be renormalized, because
>> the word "humane" was not introduced until after step 4. You could
>> speak of a vast contradictory morass of the summated outputs of human
>> moralities, but if you add the "e" on the end, then in FAI theory it
>> has the connotation of something already renormalized. Furthermore,
>> it is improper to speak of renormalizing the vast contradictory
>> morass as such, because it's a superposition of outputs, not a
>> dynamic process capable of renormalizing itself. You can speak of
>> renormalizing a given individual, or renormalizing a model based on a
>> typical individual.
>> This is all already taken into account in FAI theory. At length.
> Well, I'm not sure I believe there is a clear, consistent, meaningful,
> usable entity corresponding to your two-word phrase "humane morality."
> I'm not so sure this beast exists. Maybe all there is, in the
> human-related moral sphere, is a complex mess of interrelated, largely
> self-contradictory ethical systems, guided by some general principles
> of complex systems dynamics and by our biological habits and heritage.

If you think FAI is messy, self-contradictory, etc., I have no problems
with you putting those criticisms into the paper.

What I wish you to avoid is presenting FAI as if it were a specific
ethical principle, or anything commensurate with a specific ethical
principle, because that's the *wrong sort of stuff* to describe either
FAI::humaneness or FAI architectural principles. It is, in fact,
diametrically opposed to FAI, which was created to address inherent and
fundamental problems with trying to instill specific ethical principles in
an AI - this is what I consider to be the generalized Asimov Law fallacy.
I have the same objections to Joyous Growth, but that's a separate
issue. It's not surprising that you see no superiority of FAI over Joyous
Growth if you attempt to somehow interpret FAI as a specific ethical
principle. This is actually impossible, like trying to interpret a
mathematical system as the quantity 15, but after you've invented a
specific ethical principle such as "benevolence toward humans" and
attached that concept to Goertzel::FAI, then Goertzel::FAI would indeed
have no advantage over Joyous Growth. It omits the entire problem FAI was
intended to solve. In fact, it misrepresents FAI as something which
happens to be a clear instance of the problem FAI was intended to solve.
Do you see why I'm objecting here?

If you say that the specific ethical principle underlying FAI is
"benevolence toward humans", this is objectionable because it
misrepresents FAI as a specific ethical principle, and no amount of
rephrasing will get rid of that. Criticisms of FAI on the basis of being
too specific and concrete as an ethical principle implicitly misrepresent
FAI as an ethical principle; I would much prefer that these criticisms be
amended to state that FAI involves too much specific and concrete
"information", which, if taken as implying Shannon information about the
space of possible outcomes, would not imply anything actively misleading
about the nature of FAI. And so on.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
