Re: Friendliness not an add-on

From: H C (
Date: Sat Feb 18 2006 - 22:52:00 MST

>The verifier will be built before Novamente is given strong
>self-modification abilities. But building the kind of Friendliness
>verifier I'm thinking of is almost surely harder than building
>toddler-level AGI, and so we are working toward the latter from a
>practical implementation/testing point of view while working on
>Friendliness verification and other more advanced topics from a purely
>theoretical perspective at the moment.
>The reason we do not consider this unsafe is basically that we are
>quite sure our architecture will not permit a toddler-level AGI to
>undergo any kind of hard takeoff. We have not formally proved this,
>but nor have I (or you) formally proved that my toaster will not
>undergo a hard takeoff...

If take-off is an explosion...

If you follow my metaphor, it seems to me as though Ben is trying to create
a nuclear reactor, whereas Eliezer is trying to make a nuclear bomb.

Of course, there is always the potential for a reactor that melts down and
turns into a nuclear explosion. The thing to remember is that some nuclear
explosions are good, and others are bad.

Really, either of their approaches can work fine in theory. Ben's approach
is to get the nuclear reaction into a controlled environment, and then
determine whether/how to melt the thing down into a safe explosion. Eliezer
says let's just figure out how to slap the plutonium together in the right
way from the beginning, and save ourselves the trouble of building all the
extra reactor engineering and getting bogged down in the details of reactor

Eliezer DOES have a point, though: if you carry it through the metaphor, then
you have to understand that reactor safety is about intelligent reactor
safety as opposed to nuclear reactor safety. I don't know how comparable
these two things are. Much of the philosophy here on SL4 has demonstrated
that reactor safety is a difficult problem, and that it is independent of
explosion Friendliness. Simply put, if the
explosion is guaranteed Friendly, then you have no need for a reactor at
all. Otherwise, you not only have to worry about reactor safety, but ALSO
explosion Friendliness.

However, I don't think Ben's approach is impossible, impractical or
necessarily sub-optimal. If you have verifiable reactor safety, and that
actually ends up being EASIER than verifiable explosion Friendliness, then
what do you know, Ben would have taken the optimal approach from the
beginning, because he would have his verifiably safe reactor with which he
could study Friendliness in its definite and observable form (such that the
problem of Friendliness explosion becomes easier to solve than through
theory alone).

Now if we only had a Manhattan project...

