Re: guaranteeing friendliness

From: H C
Date: Tue Nov 29 2005 - 00:08:13 MST

>From: "Eliezer S. Yudkowsky"
>Subject: Re: guaranteeing friendliness
>Date: Sat, 26 Nov 2005 08:05:56 -0800
>Chris Capel wrote:
>>My understanding of where Phillip was going is that the techniques
>>used to provide some measure of certainty about the direction the
>>world climate is moving, if developed, would have a lot of
>>resemblances to the sort of guarantee that Eliezer is trying to
>>develop with friendliness. It would have to take a very complicated
>>system, with lots of unknowns, and try to derive statistically certain
>>conclusions about the long-term direction of those systems that we can
>>be justified in believing are valid and immune to unintentional
>>manipulation by the modeler and verifiable in some formal capacity.
>Not so. A FAI is *designed* to be verifiably predictably Friendly and not
>in a statistical sense either. The global warming dynamics are not thus
>designed. If we have to have controversial arguments about whether some
>specific FAI design is Friendly in the technical sense, arguments like
>we're having about global warming, then screw the design, come up with a
>better one.
>As I recently said on wta-talk to James Hughes's proposal for "morality
>software" for humans:
>There's a theorem generalizing Turing's halting theorem, Rice's Theorem,
>which says that you cannot *in general* determine whether a computational
>process implements *any* nontrivial function - including, say,
>multiplication. Then how is it possible that human engineers build
>computer chips which reliably perform multiplication? Computer engineers
>build special cases of chips, *not* chips in general. They deliberately
>use only those designs that they *can* understand. They select an
>architecture of which they can predict - and in some cases, formally prove
>- that the chip design implements multiplication.
>SIAI's notion of Friendliness assurance relies on being able to design an
>AI specifically for the sake of verifiability. Needless to say, humans are
>not so designed. Needless to say, it is not a trivial project to thus
>redesign a human. I cannot imagine going about it in such way as to
>preserve continuity of personal identity, the overall human cognitive
>architecture, or much of anything. SIAI's notion of Friendliness relies on
>selecting an AI design of which we can verifiably say that it would never
>choose to expend effort on defeating its own Friendliness. As opposed to
>superposing external "morality software" onto a mind that might not like
>it, or a mind that might plan in advance to defeat it.
>Eliezer S. Yudkowsky
>Research Fellow, Singularity Institute for Artificial Intelligence

It's not so ridiculous as it sounds.

For example, provide an AGI with some sort of virtual environment, in which
it is indirectly capable of action.

Its direct actions would be confined to a text-only direct action area
(imagine its only direct action being typing a letter on the keyboard, such as in a text

The only *effective* actions (outside of this text only direct action area)
are those in proper syntactical format; specifically, it would essentially
have to write an algorithm in some computer language in order to establish
any effectual action in its environment.

Then, before the compiled function or algorithm is executed, it must be
reviewed by some human programming team.

This could be carried much further: the human reviewers could subjectively reject
any algorithm, and even leave comments for the AGI to review in order for it
to understand why its action was rejected (e.g. confusing, incorrect syntax,
difficult to understand, not sure of intention, not enough comments in code,
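
To make the review step concrete, here is a minimal sketch in Python of the gate described above. Everything named in it (ReviewGate, Submission, Verdict, the executor callback) is hypothetical and only for illustration; it assumes the AGI's sole output channel is program text, and that nothing runs until a human reviewer has approved it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, List


class Verdict(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    """One piece of program text proposed by the AGI (hypothetical type)."""
    source: str
    verdict: Verdict = Verdict.PENDING
    comments: List[str] = field(default_factory=list)


class ReviewGate:
    """Holds submissions until a human reviewer approves or rejects them."""

    def __init__(self, executor: Callable[[str], Any]) -> None:
        # The executor stands in for whatever actually compiles and runs
        # approved code inside the virtual environment (an assumption).
        self._executor = executor
        self.queue: List[Submission] = []

    def submit(self, source: str) -> Submission:
        """Called on behalf of the AGI: queue text for review, run nothing."""
        submission = Submission(source)
        self.queue.append(submission)
        return submission

    def reject(self, submission: Submission, *comments: str) -> None:
        """Reviewer rejects and leaves comments for the AGI to read."""
        submission.verdict = Verdict.REJECTED
        submission.comments.extend(comments)

    def approve(self, submission: Submission) -> None:
        """Reviewer approves the submission for execution."""
        submission.verdict = Verdict.APPROVED

    def execute(self, submission: Submission) -> Any:
        """Run a submission, but only if a reviewer has approved it."""
        if submission.verdict is not Verdict.APPROVED:
            raise PermissionError("submission has not been approved")
        return self._executor(submission.source)
```

A reviewer session might look like this: gate.submit(...) returns a Submission, the team calls gate.reject(sub, "not sure of intention") or gate.approve(sub), and gate.execute(sub) raises PermissionError for anything not approved. The sketch only enforces the ordering of review and execution; it says nothing about whether the reviewers actually understand what they approve.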

This would provide a perfectly controlled environment in which all the AGI's
motivations, thoughts, and attempted and successful actions would be
verifiably and thoroughly understood by some human team. After a little practice
with the AGI, the human team could (quite conceivably) create several
different tests that verify exactly the AGI's basic sentient drives, its
true motivations, etc.

