# Re: Friendliness not an Add-on

From: Ben Goertzel (ben@goertzel.org)
Date: Sun Feb 19 2006 - 05:37:48 MST

Hi,

About Rice's theorem... Sorry, I did not phrase my argument against
the relevance of this theorem very carefully. Here goes again.
Hopefully this reformulation is sufficiently precise.

What that theorem says (as you know) is that for any nontrivial
property P (roughly: any property that holds for some arguments and
not others) it is impossible to make a program that will tell you, for
all algorithms A, whether A has property P.

In other words, it says

For every nontrivial property P, it is not true that:
{ There exists a program Q so that
{ For all algorithms A
{ Q will tell you whether A has property P
}}}

But a Friendliness verifier does not need to do this. A Friendliness
verifier just needs to verify whether

* a certain class of algorithms A (the ones that it is plausibly
likely the AI system in question will ultimately self-modify into)

satisfy

* a particular property P: Friendliness

The existence of a Friendliness verifier of this nature is certainly
not ruled out by Rice's Theorem.
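
As a toy illustration of this point (a sketch under stated assumptions, not anything taken from an actual AI design): if the class of algorithms is restricted to finite transition graphs, a property such as "never reaches a bad state" can be decided by ordinary reachability search. The names `never_reaches`, `safe_machine` and `unsafe_machine` are hypothetical, and "harm" merely stands in for whatever the property P is meant to exclude.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

# Hypothetical restricted class: each "algorithm" is a finite transition graph.
Transitions = Dict[Hashable, Iterable[Hashable]]


def never_reaches(transitions: Transitions, start: Hashable, bad: Set[Hashable]) -> bool:
    """Decide whether no state in `bad` is reachable from `start`.

    This terminates for every finite transition graph, so the property is
    decidable on this restricted class, even though Rice's theorem rules
    out a decider that works for all programs.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in bad:
            return False
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Two made-up machines: in the first, "harm" exists but is unreachable.
safe_machine = {"idle": ["work"], "work": ["idle"], "harm": ["harm"]}
unsafe_machine = {"idle": ["work"], "work": ["harm"], "harm": ["harm"]}

print(never_reaches(safe_machine, "idle", {"harm"}))    # True
print(never_reaches(unsafe_machine, "idle", {"harm"}))  # False
```

The hard part, of course, is not the search but choosing a class of algorithms and a formal property for which something like this is meaningful.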

The problems are in formulating what is meant by Friendliness, and
defining the class of algorithms A.

A log of the complete history of an AI system is not necessary in
order to define the plausible algorithm-class; this definition could
potentially be given by a priori knowledge about the nature of the AI
system in question.

> > > To put it less formally, we'd be giving our Friendliness module the use
> > > of a genie which is somewhat unreliable and whose reliability in any
> > > particular decision is, for all intents and purposes, difficult to check.
> > True
> Right. Doesn't this strike you as dangerous?

It strikes me as potentially but not necessarily dangerous -- it all
depends on the details of the AI architecture.

This is not the same as the "AI boxing" issue, in which the AI in the
box is like a genie giving suggestions to the human out of the box.
In that case, the genie is proposed to be potentially a sentient mind
with its own goals and motivations and with a lot of flexibility of
behavior. In the case I'm discussing, the "genie" is a
hard-to-predict hypothesis-suggester giving suggestions to a logical
cognition component controlled by a Friendliness verifier. And the
hard-to-predict hypothesis-suggester does not need to be a
sentient mind on its own: it does not need flexible goals,
motivations, feelings, or the ability to self-modify in any general
way. It just needs to be a specialized learning component, similar in
some ways to Eliezer's proposed Very Powerful Optimization Process
used for world-simulation inside his Collective Volition proposal (I'm
saying that it's similar in being powerful at problem-solving without
having goals, motivations, feelings or strong self-modification; of
course the problem being solved by my hard-to-predict
hypothesis-suggester (hypothesis generation) is quite different than
the problem being solved by Eliezer's VPOP (future prediction)).
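
A minimal sketch of the division of labor described above, assuming a deliberately trivial problem (finding a divisor of an integer). The suggester here is just a seeded random generator and the verifier a deterministic check; the function names are hypothetical, and this is not a description of Novamente or of any real Friendliness verifier.

```python
import random
from typing import Iterator, Optional


def suggest_divisors(n: int, rng: random.Random) -> Iterator[int]:
    """Hard-to-predict suggester: emits random candidate divisors of n."""
    while True:
        yield rng.randint(2, n - 1)


def is_nontrivial_divisor(n: int, d: int) -> bool:
    """Deterministic verifier: checks a suggestion without knowing its origin."""
    return 1 < d < n and n % d == 0


def first_verified(n: int, rng: random.Random, max_tries: int = 10_000) -> Optional[int]:
    """Return the first suggestion that passes verification, or None."""
    for _, candidate in zip(range(max_tries), suggest_divisors(n, rng)):
        if is_nontrivial_divisor(n, candidate):
            return candidate
    return None


result = first_verified(91, random.Random(0))
print(result)  # 7 or 13, depending on the random stream
```

The suggester has no goals and gives no reasons for its guesses; only the verifier decides what is accepted.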

> I never said evolutionary programming was "nontraceable". What I said was
> "nonverifiable". I am not splitting hairs, as these are completely different
> things. No program is nontraceable! You can just emulate the CPU and
> observe the contents of the registers and memory at any point in time. With
> that said though, you can probably see why this sort of traceability is not
> good enough. What needs to be traceable is not how the bits were
> shuffled but how the conclusion reached was justified.

a)
You have not presented any argument as to why verifiability in this
sense is needed for Friendliness verification.

b)
Your criterion of verifiability seems to me to be unreasonably strong,
and to effectively rule out all metaphorical and heuristic inference.
But maybe I have misunderstood your meaning.

Suppose we have a probabilistic-logical theorem-proving system, which
arrives at a conclusion. We can then trace the steps that it took to
arrive at this conclusion. But suppose that one of these steps was a
metaphorical ANALOGY, to some other situation -- a loose and fluid
analogy, of the sort that humans make all the time but current AI
out in detail).

Then, it seems to me that what your verifiability criterion demands is
not just that the conclusion arrived at through metaphorical analogy
be checked for correctness and usefulness -- but that a justification
be given as to why *that particular analogy* was chosen instead of
some other one.

This means that according to your requirement of verifiability (as I
understand it) a stochastic method can't be used to grab one among
many possible analogies for handling a situation. Instead, according
to your requirement, some kind of verifiable logical inference needs
to be used to choose the possible analogy.

In Novamente, right now, the way this kind of thing would be handled
would be (roughly speaking):

a) a table would be made of the possible analogies, each one
quantified with a number indicating its contextual desirability

b) one of the analogies would be chosen from the table, with a
probability proportional to the desirability number

By your criterion, this approach would not count as verifiable,
because of the use of a stochastic selection mechanism in Step b.
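
Steps a and b can be sketched roughly as follows. This is only an illustration of desirability-proportional selection using Python's standard library; the table entries, the scores, and the name `choose_analogy` are made up, and nothing here is taken from the Novamente code.

```python
import random
from typing import Dict


def choose_analogy(desirability: Dict[str, float], rng: random.Random) -> str:
    """Step b: pick one analogy with probability proportional to its score."""
    names = list(desirability)
    weights = [desirability[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Step a: a table of candidate analogies with contextual desirability scores.
# The entries and numbers are invented for illustration.
table = {"river": 6.0, "market": 3.0, "immune system": 1.0}

rng = random.Random(42)
counts = {name: 0 for name in table}
for _ in range(10_000):
    counts[choose_analogy(table, rng)] += 1

print(counts)  # counts come out roughly in a 6:3:1 ratio
```

Any single draw has no justification beyond the weights, which is exactly the feature at issue.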

However, I have my doubts whether it is really possible to achieve
significant levels of general intelligence under severely finite
resources without making this kind of stochastic selection in one form
or another. (I'm not claiming it is necessary to resort to
pseudorandom number generation; just that I suspect it's necessary to
resort to something equally arbitrary for selecting among options in
cases where there are many possibly relevant pieces of knowledge in
memory and not much information to go on regarding which one to use in
a given inference.)

> What I meant was that when B proposes an action, A can either verify that B
> did the correct thing or point out a flaw in B's choice.
>
> The statement above is sufficient but not necessary to show that A is smarter
> than B, in the colloquial sense of the phrase.

I find this interpretation of the "smarter" concept very inadequate.

For instance, suppose I have a collaborator who is more reliable in
judgment than me but less creative than me. For sake of concreteness,
let's call this individual by the name "Cassio."

Let A=Cassio, B=Ben

Now, it may be true that "When Ben proposes an action, Cassio can
either verify that Ben proposed the correct thing, or point out a flaw
in his choice"

This does not necessarily imply that Cassio is smarter than Ben -- it
may be that Ben is specialized for hypothesis generation and Cassio is
specialized for quality-verification.

The colloquial notion of "smartness" is not really sufficient for
discussing situations like this, IMO.

> To illustrate, suppose Alice is helping Bob play chess. Whenever Bob
> suggests a move, she always says something like "I agree with your move" or
> "Yikes! If you go there, he'll fork your rooks in two moves! You overlooked
> this move here." If she can always do this, it should be absolutely clear
> that Alice is a better chess player than Bob.

Yes, but I can also imagine two chess masters, where master A was
better at coming up with bold new ideas and master B was better at
pointing out subtle flaws in ideas (be they bold new ones or not).
These two masters, if they were able to cooperate very closely (e.g.
through mental telepathy), might be able to play much better than
either one on their own. This situation is more like the one at hand.

(i.e., I think your internal quasirandom selection mechanism has
chosen a suboptimal analogy here ;-)

This discussion has gotten fairly in-depth, but the crux of it is, I
don't feel you have made a convincing argument in favor of your point
that it is implausible-in-principle to add Friendliness on to an AGI
architecture designed without a detailed theory of Friendliness on
hand. I don't feel Eliezer has ever made a convincing argument in
favor of this point either. It may be true but you guys seem far from
demonstrating it...

-- Ben
