Re: Friendliness not an Add-on

From: Marcello Mathias Herreshoff
Date: Sat Feb 18 2006 - 23:28:02 MST

On Sat, Feb 18, 2006 at 10:42:40PM -0500, Ben Goertzel wrote:
> Howdy Marcello,
> > But, Rice's theorem states
> > that there is no way to create a checker for every non-trivial property
> > (Friendliness is certainly non-trivial in this sense.)
> I don't see Rice's Theorem as directly relevant here, because it
> doesn't tell you anything about the quality of probabilistic reasoning
> achievable by practical intelligent systems under realistic resource
> limitation; it's an uncomputability-in-principle sort of result...
In the example given, the AI output a piece of source code. Being able to
check whether this arbitrary piece of source code is Friendly is something
that Rice's theorem says you cannot do. It definitely applies here.
It *does* describe "the quality of probabilistic reasoning achievable by
practical intelligent systems under realistic resource limitation" because
that's a special case of computer programs in general.

In order for it not to apply, the piece of code must not be arbitrary.
Not being arbitrary means that we have more useful information. For
example, a log of deductions the AI performed in order to reach its conclusion
(in this case, a piece of code) would definitely be useful.

If the piece of information given is not a string of deductions, it must be
something which can be translated into such a string of deductions at a
reasonably low cost. Otherwise, the verifier must go proof hunting, which is
expensive, if not downright impossible in principle.
(In the case of the AI quining itself, it must produce a theorem about a piece
 of code. The only thing that makes the code non-arbitrary is the
 information given, and therefore the information must yield the proof.)

> > To put it less formally, we'd be giving our Friendliness module the use
> > of a genie which is somewhat unreliable and whose reliability in any
> > particular decision is, for all intents and purposes, difficult to check.
> True
Right. Doesn't this strike you as dangerous?

> > Knowledge of the AI proper's decision is not sufficient here. The Friendliness
> > module would also need the reasoning behind the decision in order to verify
> > it.
> You have not demonstrated this statement or even argued for it, you've
> just stated it...
See above. The log of deductions is what I meant by "the reasoning behind it."

> > However, if the AI proper has a non-verifiable architecture, this
> > knowledge may only exist in a crippled form, or, in extreme cases like
> > evolutionary programming, not at all.
> The idea that evolutionary programming is somehow "nontraceable" in
> its production of results is a misconception. Ev. prog. is a
> deterministic algorithm and if the intermediate steps in the evolution
> process are logged, then it is perfectly possible to mine these logs
> and understand why a given answer was arrived at. In fact, mining and
> analyzing such logs (albeit usually in relatively simplistic ways) is
> the principle underlying Estimation of Distribution Algorithms, an
> important subfield of ev. prog. today.

I never said evolutionary programming was "nontraceable". What I said was
"nonverifiable". I am not splitting hairs, as these are completely different
things. No program is nontraceable! You can just emulate the CPU and
observe the contents of the registers and memory at any point in time. With
that said though, you can probably see why this sort of traceability is not
good enough. What needs to be traceable is not how the bits were
shuffled but how the conclusion reached was justified.

In the case of evolutionary programming, it should be clear we have the
former type of traceability but not the latter type. Mining Evolutionary
Programming's population logs will never give you a why. The only why
evolution ever gives us is "it worked when I tried it."
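
To make the distinction concrete, here is a minimal sketch of a logged
evolutionary search. Everything in it is an assumption made for illustration:
the bit-string genome, the fitness function, and the names evolve and fitness
are hypothetical and do not come from any particular ev. prog. system. The
run is deterministic given the seed and every step is logged, so it is fully
traceable in the first sense. Yet each log entry records only which candidate
was tried and how it scored.

```python
import random

def fitness(bits):
    # Hypothetical black-box test: we only learn a score, never a reason.
    return sum(bits)

def evolve(seed=0, length=8, generations=20):
    """Mutation-only search over bit strings; returns (best, log)."""
    rng = random.Random(seed)          # seeded, so the run is reproducible
    parent = [rng.randint(0, 1) for _ in range(length)]
    log = []
    for gen in range(generations):
        child = list(parent)
        child[rng.randrange(length)] ^= 1   # flip one randomly chosen bit
        kept = fitness(child) >= fitness(parent)
        # The complete record of this step: what was tried and how it scored.
        log.append({"gen": gen, "candidate": tuple(child),
                    "score": fitness(child), "kept": kept})
        if kept:
            parent = child
    return parent, log

best, log = evolve(seed=0)
```

Replaying the log reproduces the run exactly, but no entry contains a
premise or an inference step that a verifier could check. The strongest
statement any entry supports is that the candidate scored well when tried.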

> > The only rescue from this mess is to
> > make the "Friendliness module" smarter than the so-called AI itself.
> a) you have not demonstrated this or even argued for it, just stated it
> b) I don't know how you are defining "smarter than" in this statement...
Let me explain what I meant here.

What I meant was that when B proposes an action, A can either verify that B
did the correct thing or point out a flaw in B's choice.

The statement above is sufficient but not necessary to show that A is smarter
than B, in the colloquial sense of the phrase.

To illustrate, suppose Alice is helping Bob play chess. Whenever Bob
suggests a move, she always says something like "I agree with your move" or
"Yikes! If you go there, he'll fork your rooks in two moves! You overlooked
this move here." If she can always do this, it should be absolutely clear
that Alice is a better chess player than Bob.

This is why the Friendliness Module must be smarter than the AI itself.

However, suppose instead that whenever Bob suggests a move, he gives Alice
a painstaking blow-by-blow explanation of why he did it, starting from
propositions about chess that they both agree on (some of which can be
probabilistic statements, to avoid a brute-force search). This time
though, Alice doesn't have to be a chess pro to say whether Bob reasoned
correctly. She just has to say either "proof succeeded" or "parse error on
line <number>". Alice might well be a relatively simple computer program.
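
A minimal sketch of how simple such an Alice could be, under loudly stated
assumptions: the only inference rule is modus ponens, formulas are atoms or
implications, and the chess propositions are invented for illustration. The
names check_proof, AXIOMS, GOOD_PROOF and BAD_PROOF are hypothetical, and
probabilistic statements are left out entirely. The checker never searches
for a move. It only confirms that each line follows from agreed propositions
or from earlier lines.

```python
def check_proof(axioms, steps, goal):
    """Check a deduction log line by line.

    Each step is (formula, justification). A justification is either
    ("axiom",) or ("mp", i, j), meaning the formula follows by modus
    ponens from line i (P) and line j (P -> formula). Lines are 1-indexed.
    """
    proved = []
    for n, (formula, why) in enumerate(steps, start=1):
        kind = why[0] if why else None
        if kind == "axiom" and len(why) == 1:
            ok = formula in axioms
        elif kind == "mp" and len(why) == 3:
            _, i, j = why
            # Both cited lines must come earlier, and line j must be
            # exactly the implication (line i) -> formula.
            ok = (1 <= i < n and 1 <= j < n
                  and proved[j - 1] == ("->", proved[i - 1], formula))
        else:
            ok = False
        if not ok:
            return f"error on line {n}"
        proved.append(formula)
    if not proved or proved[-1] != goal:
        return "goal not reached"
    return "proof succeeded"

# Propositions both players are assumed to agree on (invented examples).
AXIOMS = {
    "knight_on_c7",
    ("->", "knight_on_c7", "rooks_forked"),
    ("->", "rooks_forked", "move_is_bad"),
}

GOOD_PROOF = [
    ("knight_on_c7", ("axiom",)),
    (("->", "knight_on_c7", "rooks_forked"), ("axiom",)),
    ("rooks_forked", ("mp", 1, 2)),
    (("->", "rooks_forked", "move_is_bad"), ("axiom",)),
    ("move_is_bad", ("mp", 3, 4)),
]

# Same opening, but line 3 skips a step and claims too much.
BAD_PROOF = GOOD_PROOF[:2] + [("move_is_bad", ("mp", 1, 2))]

print(check_proof(AXIOMS, GOOD_PROOF, "move_is_bad"))  # proof succeeded
print(check_proof(AXIOMS, BAD_PROOF, "move_is_bad"))   # error on line 3
```

The checker's cost is linear in the length of the log, and it contains no
chess knowledge beyond the agreed propositions. All of the hard work sits
with whoever produced the log.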

I hope I've clarified things sufficiently.
-=+Marcello Mathias Herreshoff
