General summary of FAI theory

From: Thomas McCabe (pphysics141@gmail.com)
Date: Tue Nov 20 2007 - 14:20:56 MST


SL4 is supposed to be for advanced topics in futurism, not endlessly
rehashing the basics. Some of the things that were covered years ago,
and are therefore ineligible for rehashing:

1). An AGI will not act like an "enslaved" human, a resentful human,
an emotionally repressed human, or any other kind of human. See
http://www.intelligence.org/upload/CFAI//anthro.html.

2). Friendliness content is the morality stuff that says "X is good, Y
is bad". Friendliness content is what we think the FAI *should* do.
FAI theory describes *how* to get an FAI to do what you want it to do.
See http://www.intelligence.org/upload/CEV.html.

3). FAI theory is damn hard; it is much harder than Friendliness
content. So far as I know, nobody knows how to make sure that a given
AGI design reliably produces paperclips, and that is a much *simpler*
problem than ensuring reliable Friendliness. Keep in mind that the
Friendliness content must be maintained during recursive
self-improvement, or the FAI may wind up destroying us all on
programming iteration #1,576,169,123.
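
If it helps to see the self-improvement problem concretely, here is a
toy Python sketch (the object interface and names are mine, not anyone's
actual design): the Friendliness content has to come through every
single rewrite intact, because each later version inherits whatever the
previous one actually optimizes for.

    def self_improvement_loop(agi, friendliness_content, iterations):
        # Toy model: each cycle, the AGI proposes a smarter successor.
        for i in range(iterations):
            successor = agi.propose_rewrite()
            # If the goal content isn't carried forward exactly, every
            # subsequent version inherits the corrupted goals.
            if successor.goal_content() != friendliness_content:
                raise RuntimeError("goal drift at iteration %d" % i)
            agi = successor
        return agi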

4). CEV is a way of deriving Friendliness content from humanity's
collective cognitive architecture. CEV is a morality-constructor, not
a morality in and of itself; if you speak programming, think of CEV as
a function that takes the human race as an argument and returns a
morality.
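
Since point 4 literally describes a function, here is the metaphor
written out in Python. Everything below is a placeholder I made up
(extrapolate, cohere, the preference format); CEV itself is a
specification, not running code.

    def extrapolate(person):
        # Stand-in for "what this person would want if they knew more,
        # thought faster, and were more the person they wished to be".
        return set(person["idealized_preferences"])

    def cohere(volitions):
        # Stand-in for keeping only what converges across people.
        common = volitions[0]
        for v in volitions[1:]:
            common = common & v
        return common

    def coherent_extrapolated_volition(humanity):
        # Takes the human race as an argument; returns a morality.
        return cohere([extrapolate(person) for person in humanity])

The FAI would then be pointed at whatever this returns, rather than at
a morality hand-coded by the programmers.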

5). Goal systems naturally maintain themselves (under most
conditions). If the AGI has a supergoal of X, changing to a supergoal
of X' will mean that less effort is put towards accomplishing X.
Because the AGI *currently* has a supergoal of X, the switch will
therefore be seen as undesirable. It's not like you have to point a
gun at the AGI's head and say, "Do X or else!"; no external coercion
is necessary. See
http://www.intelligence.org/upload/CFAI//design/structure/external.html.
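
A toy Python rendering of why the switch looks undesirable from the
inside (the function names and the expected_achievement scoring are my
own invention): the proposed modification gets scored by the goal
system the AGI has right now, and by that standard, pursuing X' serves
X worse than pursuing X does.

    def should_adopt_new_supergoal(current_goal, candidate_goal,
                                   expected_achievement):
        # expected_achievement(pursued, judged) estimates how well
        # 'judged' ends up satisfied if the agent spends its effort
        # pursuing 'pursued'.
        score_if_kept     = expected_achievement(current_goal, current_goal)
        score_if_switched = expected_achievement(candidate_goal, current_goal)
        # The comparison is made under the *current* supergoal, so the
        # switch almost never wins; no external coercion is involved.
        return score_if_switched > score_if_kept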

6). An AGI has the goals we give it. It does not have human-like goals
such as "reproduce", "survive", "be nice", "get revenge", "avoid
external manipulation", etc., unless we insert them or they turn out
to be useful for fulfillment of supergoals. See
http://www.intelligence.org/upload/CFAI//anthro.html#observer.

7). The vast, vast majority of goal systems lead to the destruction of
the Earth. The actual destruction would be more complicated than this
one-sentence summary, but essentially, more energy, matter, computing
power, etc. are almost always desirable, and so the AGI won't stop
consuming the planet for its own use until it runs out of matter.
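
A deliberately silly Python sketch of point 7, with made-up numbers:
for nearly any open-ended goal, the plan that consumes more matter
scores higher, so an unconstrained optimizer picks the plan that eats
the planet.

    def paperclips_made(kg_of_matter):
        # Stand-in goal; almost any goal that scales with resources
        # behaves the same way here.
        return kg_of_matter / 0.001          # roughly one gram per paperclip

    candidate_plans = {
        "use one small factory":    1.0e6,   # kg of matter consumed
        "strip-mine the crust":     2.0e22,
        "consume the whole planet": 5.97e24,
    }

    best_plan = max(candidate_plans,
                    key=lambda p: paperclips_made(candidate_plans[p]))
    print(best_plan)                         # -> "consume the whole planet"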

8). Just because the AGI can do something doesn't mean it will. This
is what Eli calls the Giant Cheesecake Fallacy: "A superintelligent
AGI could make huge cheesecakes, cheesecakes larger than any ever made
before; wow, the future will be full of giant cheesecakes!" Some
examples of this in action:

"The AGI, being superintelligent, has all the computational power it
needs to understand natural language. Therefore, it will start
analyzing natural language, instead of analyzing the nearest random
quark."

"The AGI will be powerful enough to figure out exactly what humans
mean when they give an instruction. Therefore, the AGI will choose to
obey the intended meanings of human instructions, rather than obey the
commands of the nearest lemur."

9). In general, it is much easier to work with simple examples than
complicated examples. If you can't do the simple stuff, you can't do
the complicated stuff. If you can't prove that an AGI will flood the
universe with paperclips and not iron crystals, you can't prove that
an AGI will be Friendly.

 - Tom


