Friendliness nomenclature

From: Rolf Nelson (rolf.h.d.nelson@gmail.com)
Date: Sun Apr 27 2008 - 10:16:41 MDT


Here's my proposal for terminology.

A "Friendly AI" is an *AGI* that has a *deliberately* positive, rather than
negative, impact on humanity.

1. A narrow AI that diagnoses cancer is not a "Friendly AI" because it is
not an AGI; it is trivially controllable, and its intelligence is neither
powerful nor general enough to deliberately cause harm against the wishes of
its creators and operators.

2. Suppose a rogue, but boxed, AGI almost destroys the world, but we use
that AGI to learn from our mistakes, and in the end we realize it was an odd
stroke of luck that we had that particular boxed AGI to learn from. The AGI
made the world a better place, but only accidentally (from the AGI's point
of view); it was not part of the AGI's goal structure to do so in that way.
Because its positive impact was not a *deliberate* part of its goal
structure, the AGI is not "Friendly" per se.

The semantic "prototype" is a deliberately built AGI that safely ushers
humanity through the Singularity. Weaker semantic examples of the category:

1. An AGI that *would* be a Friendly AI in most Possible Worlds, but fails
in ours, for example because Jason Voorhees is watching and stabs everyone
when we try to turn the Friendly AI on.

2. An AGI that is Friendly towards Joe Smith of 1243 Maple Drive, but that
kills the rest of us at Joe's bidding.

3. A powerful AGI that gives everyone a free popsicle, and then chooses to
shut itself off.

Extremely weak semantic examples:

4. A human has (non-artificial) General Intelligence, so an extremely broad
model of Friendliness might characterize humans as (capital-F) Friendly, by
analogy with a Friendly AI.

5. A model may view Friendliness as a chain of systems: FAI-1 builds FAI-2,
FAI-2 builds FAI-3, etc. An AGI is Friendly in such a model if it is
"terminally" Friendly, or if it builds an AGI that is Friendly. By
induction, a human who builds FAI-1 might be considered Friendly by such a
model. (However, evolution would not be considered "Friendly" even though it
built the human, because it fails the "deliberateness" requirement.)
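
The chain model in item 5 is an inductive definition, so it can be written
as a short recursive check. The following is only a sketch: the Agent class,
its field names, and the is_friendly function are hypothetical names chosen
for illustration, and treating "deliberateness" as a simple yes/no flag on
the builder is a simplifying assumption, as is the assumption that the chain
of builders never loops back on itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    # All fields are hypothetical labels for illustration only.
    name: str
    deliberate: bool = True            # acts from a goal structure (assumed yes/no)
    terminally_friendly: bool = False  # Friendly in its own right
    builds: List["Agent"] = field(default_factory=list)  # systems it creates


def is_friendly(agent: Agent) -> bool:
    """Chain-model test: the agent must be deliberate, and must either be
    terminally Friendly or build at least one system that is Friendly."""
    if not agent.deliberate:
        return False
    if agent.terminally_friendly:
        return True
    # Assumes the builder chain is finite and acyclic.
    return any(is_friendly(child) for child in agent.builds)


fai_3 = Agent("FAI-3", terminally_friendly=True)
fai_2 = Agent("FAI-2", builds=[fai_3])
fai_1 = Agent("FAI-1", builds=[fai_2])
human = Agent("human", builds=[fai_1])
evolution = Agent("evolution", deliberate=False, builds=[human])

print(is_friendly(human))      # True
print(is_friendly(evolution))  # False
```

Under these assumptions the human counts as Friendly by induction through
FAI-1, FAI-2, and FAI-3, while evolution is excluded at the first step
because it fails the deliberateness test.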

A Friendly AI project cleaves naturally into two challenges:

1. Friendliness Theory: How can an AGI remain Friendly through successive
rounds of modifications? How can you get an AGI to want what you "really"
want, and what do we mean by "what you really want"?

2. AGI Ethics: If the Friendliness Theory problem were solved for an AGI,
what would it be ethical for us to do with that AGI?

