General summary of FAI theory

From: Tom McCabe
Date: Tue Nov 20 2007 - 15:27:57 MST

SL4 is supposed to be for advanced topics in futurism, not endlessly
rehashing the basics. Some of the things which were already covered
years ago, and are therefore ineligible for

1). An AGI will not act like an "enslaved" human, a resentful human,
an emotionally repressed human, or any other kind of human. See

2). Friendliness content is the morality stuff that says "X is good, Y
is bad". Friendliness content is what we think the FAI *should* do.
FAI theory describes *how* to get an FAI to do what you want it to do.

3). FAI theory is damn hard; it is much harder than Friendliness
content. So far as I know, nobody knows how to make sure that some AGI
design reliably produces paperclips, which is much *simpler* than
ensuring reliable Friendliness. Keep in mind that the content must be
maintained during recursive self-improvement, or the FAI may wind up
destroying us all on programming

4). CEV is a way of deriving Friendliness content from collective
cognitive architecture. CEV is a morality-constructor, not a morality
in and of itself; if you speak programming, think of CEV as a function
that takes the human race as an argument and returns a
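
For readers who do speak programming, the constructor-versus-morality
distinction can be sketched with types. This is only an illustration of
the shape of the analogy: the names Person, Morality and toy_constructor
are hypothetical, the return type is assumed to be a morality (a
function that scores outcomes), and the averaging step is a placeholder
that has nothing to do with how CEV would actually extrapolate anything.

```python
from typing import Callable, Dict, Sequence

Outcome = str
# Assumption: a "morality" is modeled as a function that scores outcomes.
Morality = Callable[[Outcome], float]
# Assumption: a person is modeled as a table of outcome preferences.
Person = Dict[Outcome, float]


def toy_constructor(population: Sequence[Person]) -> Morality:
    """Takes a population and returns a morality; it is not itself a morality.

    Placeholder logic only: it averages stated preferences, which is NOT
    what CEV proposes. The point is the type signature.
    """
    def morality(outcome: Outcome) -> float:
        return sum(p.get(outcome, 0.0) for p in population) / len(population)

    return morality


humans = [
    {"cure disease": 1.0, "paperclips everywhere": -1.0},
    {"cure disease": 0.5, "paperclips everywhere": -0.5},
]
judge = toy_constructor(humans)  # judge is the morality; toy_constructor is not
print(judge("cure disease"), judge("paperclips everywhere"))
```

Feeding in a different population would yield a different morality from
the same constructor, which is the sense in which the constructor
carries no "X is good, Y is bad" content of its own.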

5). Goal systems naturally maintain themselves (under certain
conditions). If the AGI has a supergoal of X, changing to a supergoal
of X' will mean that less effort is put towards accomplishing X.
Because the AGI *currently* has a supergoal of X, the switch will
therefore be seen as undesirable. It's not like you have to point a
gun at the AGI's head and say, "Do X or else!"; no external coercion
is necessary. See
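
The argument in point 5 can be written as a toy decision problem. This
is a minimal sketch, not a real goal system: the two candidate futures
and their effort numbers are made-up assumptions, and the only thing it
shows is that the proposed switch gets scored by the supergoal the
agent holds now.

```python
# Toy model: each candidate future self is described by how much effort
# it would put toward each goal. All names and numbers are illustrative.
FUTURE_SELVES = {
    "keep supergoal X": {"X": 1.0, "X_prime": 0.0},
    "switch to supergoal X'": {"X": 0.2, "X_prime": 0.8},
}


def score(effort_by_goal, current_supergoal):
    # The agent judges a future using the supergoal it has *now*,
    # not the one it would have after the switch.
    return effort_by_goal[current_supergoal]


def choose(current_supergoal):
    # Pick the future that the current supergoal rates highest.
    return max(
        FUTURE_SELVES,
        key=lambda name: score(FUTURE_SELVES[name], current_supergoal),
    )


print(choose("X"))
```

An agent that currently holds X rejects the switch simply because the
switched future contains less effort toward X. Nothing outside the
agent has to enforce that choice.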

6). An AGI has the goals we give it. It does not have human-like goals
such as "reproduce", "survive", "be nice", "get revenge", "avoid
external manipulation", etc., unless we insert them or they turn out
to be useful for fulfillment of supergoals. See

7). The vast, vast majority of goal systems lead to the destruction of
the Earth. The Earth's actual destruction would be more complicated
than this, but essentially, more energy, matter, computing power, etc.
are almost always desirable, and so the AGI won't stop consuming the
planet for its own use until it runs out of matter.

8). Just because the AGI can do something doesn't mean it will. This
is what Eli calls the Giant Cheesecake Fallacy: "An AGI could make huge
cheesecakes, cheesecakes larger than any ever made before; wow, the
future will be full of giant cheesecakes!" Some examples of this in
action:

"The AGI, being superintelligent, has all the computational power it
needs to understand natural language. Therefore, it will start
analyzing natural language, instead of analyzing the nearest random

"The AGI will be powerful enough to figure out exactly what humans
mean when they give an instruction. Therefore, the AGI will choose to
obey the intended meanings of human instructions, rather than obey the
commands of the nearest lemur."
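
The fallacy can be shown in a few lines: what a toy agent does is
picked by its utility function, and the list of things it is able to do
only sets the menu. The capability list and the paperclip utility below
are illustrative assumptions, not a model of any actual AGI design.

```python
# Everything the toy agent is *able* to do (hypothetical list).
CAPABILITIES = [
    "bake giant cheesecake",
    "obey intended meaning of human instructions",
    "make paperclips",
]


def paperclip_utility(action):
    # A goal system that only values paperclips (illustrative assumption).
    return 1.0 if action == "make paperclips" else 0.0


def act(capabilities, utility):
    # Capability defines the options; utility decides among them.
    return max(capabilities, key=utility)


chosen = act(CAPABILITIES, paperclip_utility)
print(chosen)
```

The agent can bake the cheesecake and can work out what humans mean,
but neither action is chosen, because nothing in its goal system
assigns them any value.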

9). In general, it is much easier to work with simple examples than
complicated examples. If you can't do the simple stuff, you can't do
the complicated stuff. If you can't prove that an AGI will flood the
universe with paperclips and not iron crystals, you can't prove that
an AGI will be Friendly.

 - Tom


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:00 MDT