Re: Friendliness not an Add-on

From: Marcello Mathias Herreshoff
Date: Mon Feb 20 2006 - 18:05:26 MST

Because this message is getting long, I'm putting all my responses at the top
of the message with headers for our convenience.

--- Rice's Theorem
Here's the problem. If you didn't know what algorithm the AI proper was
using, and you had no log file, you would run up against Rice's theorem here.
However, we do know what algorithm the AI is using and we might have a log
file. These are the only things preventing us from running into Rice's theorem.

Therefore, there must exist some reasonably efficient translation algorithm,
specific to our AI's algorithm, which takes in a conclusion and optionally a
log file and outputs a deductive justification that can then be checked.
But this is precisely what I meant by verifiable, so you can't do that for a
non-verifiable architecture.
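
To make "a deductive justification, which can then be checked" concrete,
here is a minimal sketch in Python. Everything in it is an assumption made
for illustration: the names Step and check_justification, formulas stored
as plain strings, and modus ponens as the only inference rule. It also
assumes the log file has already been translated into a list of derivation
steps; the translation algorithm itself is not shown.

```python
from typing import NamedTuple, Sequence, Set, Tuple


class Step(NamedTuple):
    """One line of a hypothetical justification."""
    formula: str                    # the statement this step asserts
    rule: str                       # "axiom" or "mp" (modus ponens)
    premises: Tuple[int, ...] = ()  # indices of earlier steps used by "mp"


def implies(a: str, b: str) -> str:
    """Textual form of 'a implies b' used by this toy checker."""
    return f"({a} -> {b})"


def check_justification(axioms: Set[str], steps: Sequence[Step],
                        conclusion: str) -> bool:
    """Accept only if every step is licensed and the last step is the conclusion."""
    for i, step in enumerate(steps):
        if step.rule == "axiom":
            if step.formula not in axioms:
                return False
        elif step.rule == "mp":
            if len(step.premises) != 2:
                return False
            j, k = step.premises
            # Premises must refer to strictly earlier steps.
            if not (0 <= j < i and 0 <= k < i):
                return False
            # Step k must literally say "step j implies this step".
            if steps[k].formula != implies(steps[j].formula, step.formula):
                return False
        else:
            return False  # unknown rule: reject rather than trust
    return bool(steps) and steps[-1].formula == conclusion


axioms = {"p", "(p -> q)"}
good_log = [Step("p", "axiom"), Step("(p -> q)", "axiom"), Step("q", "mp", (0, 1))]
bad_log = [Step("q", "axiom")]  # asserts the conclusion with no support

print(check_justification(axioms, good_log, "q"))
print(check_justification(axioms, bad_log, "q"))
```

The checker never needs to know how the steps were found, only that each
one is licensed by what came before it.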

If when you said "verifiable" you meant something other than this, then
what did you mean?

--- Danger
A non-verifiable subsystem is more dangerous than a verifiable one, because a
non-verifiable subsystem would not necessarily be caught if it made a
mistake, whereas a verifiable subsystem would.

Further, if a mistake crept into the reasoning involved in constructing the
next version of the AI, it could cause horrible damage.

If I were given a genie lamp which made a mistake 1/10 of the time, I'd lock
it in a box and hide it somewhere. A mistake could mean anything from
everyone losing all their left shoes to South America turning into a giant
apple strudel.

A mistake in the construction of the AI's next version is pretty much a
random genie wish.

--- Verifiability
Given all that's at stake, we must pick the very strongest criteria for
verifiability. Should it become apparent that these criteria are too strong
to make the AI possible to build, they should be scaled back accordingly.
However, unless or until that happens, we should only be content with the
very best.

--- Fluid Analogies
If I can't provide this level of justification for fluid analogies I won't
put them in. Should analogies really be one of the keystones of
intelligence, and not a special case of something more fundamental, I want to
understand precisely what they are and be able to see each link in the chain
of reasoning involved in constructing them.

Until I do that, I won't have really understood what makes analogies tick,
and will have no hope of writing a piece of code which rates them, let alone
finds good ones.

--- Chess
But, I chose the example of chess to show that Alice has to be just as
creative as Bob. Suppose, as you claim, that there's some brilliant move
that Bob will notice and Alice won't. Now, consider Bob's previous move.
Alice has to notice the brilliant move, or she won't be able to criticize
suboptimal moves that do not set up the situation that the really brilliant
move depended on.

Real life is a game where the AI gets to take more than one turn.

--- Conclusion
I hope I've stated my arguments here sufficiently clearly.

Remember that if you do end up building a powerful enough AI system, the
burden of proof regarding its safety lies with you. If you don't use a nicely
formalized architecture, this step looks way harder.

-=+Marcello Mathias Herreshoff

On Sun, Feb 19, 2006 at 07:37:48AM -0500, Ben Goertzel wrote:
> About Rice's theorem... Sorry, I did not phrase my argument against
> the relevance of this theorem very carefully. Here goes again.
> Hopefully this reformulation is sufficiently precise.
> What that theorem says (as you know) is that for any nontrivial
> property P (roughly: any property that holds for some arguments and
> not others) it is impossible to make a program that will tell you, for
> all algorithms A, whether A has property P.
> In other words, it says
> It is not true that:
> <snip to other message>
> For all nontrivial properties P
> { It is not true that
> { there exists a program Q so that
> { for all algorithms A
> { Q will tell you whether A has property P
> }}}}
> <end snip>
> But a Friendliness verifier does not need to do this. A Friendliness
> verifier just needs to verify whether
> * a certain class of algorithms A (the ones that it is plausibly
> likely the AI system in question will ultimately self-modify into)
> satisfy
> * a particular property P: Friendliness
> The existence of a Friendliness verifier of this nature is certainly
> not ruled out by Rice's Theorem.
> The problems are in formulating what is meant by Friendliness, and
> defining the class of algorithms A.
> A log of the complete history of an AI system is not necessary in
> order to define the plausible algorithm-class; this definition may be
> given potentially by a priori knowledge about the nature of the AI
> system in question.
(see Rice's Theorem)
> > > > To put it less formally, we'd be giving our Friendliness module the use
> > > > of a genie which is somewhat unreliable and whose reliability in any
> > > > particular decision is, for all intents and purposes, difficult to check.
> > > True
> > Right. Doesn't this strike you as dangerous?
> It strikes me as potentially but not necessarily dangerous -- it all
> depends on the details of the AI architecture.
> This is not the same as the "AI boxing" issue, in which the AI in the
> box is like a genie giving suggestions to the human out of the box.
> In that case, the genie is proposed to be potentially a sentient mind
> with its own goals and motivations and with a lot of flexibility of
> behavior. In the case I'm discussing, the "genie" is a
> hard-to-predict hypothesis-suggester giving suggestions to a logical
> cognition component controlled by a Friendliness verifier. And the
> hard-to-predict hypothesis-suggester does not need to be a
> sentient mind on its own: it does not need flexible goals,
> motivations, feelings, or the ability to self-modify in any general
> way. It just needs to be a specialized learning component, similar in
> some ways to Eliezer's proposed Very Powerful Optimization Process
> used for world-simulation inside his Collective Volition proposal (I'm
> saying that it's similar in being powerful at problem-solving without
> having goals, motivations, feelings or strong self-modification; of
> course the problem being solved by my hard-to-predict
> hypothesis-suggester (hypothesis generation) is quite different than
> the problem being solved by Eliezer's VPOP (future prediction)).
(see Danger)

> > I never said evolutionary programming was "nontraceable". What I said was
> > "nonverifiable". I am not splitting hairs, as these are completely different
> > things. No program is nontraceable! You can just emulate the CPU and
> > observe the contents of the registers and memory at any point in time. With
> > that said though, you can probably see why this sort of traceability is not
> > good enough. What needs to be traceable is not how the bits were
> > shuffled but how the conclusion reached was justified.
> a)
> You have not presented any argument as to why verifiability in this
> sense is needed for Friendliness verification.
> b)
> Your criterion of verifiability seems to me to be unreasonably strong,
> and to effectively rule out all metaphorical and heuristic inference.
> But maybe I have misunderstood your meaning.
(see Verifiability)

> Please consider the following scenario.
> Suppose we have a probabilistic-logical theorem-proving system, which
> arrives at a conclusion. We can then trace the steps that it took to
> arrive at this conclusion. But suppose that one of these steps was a
> metaphorical ANALOGY, to some other situation -- a loose and fluid
> analogy, of the sort that humans make all the time but current AI
> reasoning software is bad at making (as Douglas Hofstadter has pointed
> out in detail).
> Then, it seems to me that what your verifiability criterion demands is
> not just that the conclusion arrived at through metaphorical analogy
> be checked for correctness and usefulness -- but that a justification
> be given as to why *that particular analogy* was chosen instead of
> some other one.
> This means that according to your requirement of verifiability (as I
> understand it) a stochastic method can't be used to grab one among
> many possible analogies for handling a situation. Instead, according
> to your requirement, some kind of verifiable logical inference needs
> to be used to choose the possible analogy.
> In Novamente, right now, the way this kind of thing would be handled
> would be (roughly speaking):
> a) a table would be made of the possible analogies, each one
> quantified with a number indicating its contextual desirability
> b) one of the analogies would be chosen from the table, with a
> probability proportional to the desirability number
> According to your definition of verifiability this is a bad approach
> because of the use of a stochastic selection mechanism in Step b.
> However, I have my doubts whether it is really possible to achieve
> significant levels of general intelligence under severely finite
> resources without making this kind of stochastic selection in one form
> or another. (I'm not claiming it is necessary to resort to
> pseudorandom number generation; just that I suspect it's necessary to
> resort to something equally arbitrary for selecting among options in
> cases where there are many possibly relevant pieces of knowledge in
> memory and not much information to go on regarding which one to use in
> a given inference.)
(see Fluid Analogies)
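
For concreteness, steps (a) and (b) quoted above could be sketched roughly
as follows. This is an illustrative Python sketch, not Novamente's actual
code: the function name choose_analogy, the dictionary layout, and the
example analogies and scores are all made up. It uses the standard
library's random.choices for the proportional draw.

```python
import random
from typing import Dict


def choose_analogy(table: Dict[str, float], rng: random.Random) -> str:
    """Pick one analogy with probability proportional to its desirability.

    `table` maps a (hypothetical) analogy name to a non-negative
    desirability number, as in step (a); the draw is step (b).
    """
    names = list(table)
    weights = [table[name] for name in names]
    if any(w < 0 for w in weights):
        raise ValueError("desirability numbers must be non-negative")
    if not names or sum(weights) <= 0:
        raise ValueError("need at least one analogy with positive desirability")
    return rng.choices(names, weights=weights, k=1)[0]


# Made-up table: "river" is rated three times as desirable as "highway".
table = {"river": 3.0, "highway": 1.0}
rng = random.Random(2006)  # fixed seed so a run can be replayed
picks = [choose_analogy(table, rng) for _ in range(10000)]
print(picks.count("river") / len(picks))  # should be close to 0.75
```

Recording the seed makes a given choice reproducible, but it does not
supply a reason why that particular analogy was the right one to pick.
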
> > What I meant was that when B proposes an action, A can either verify that B
> > did the correct thing or point out a flaw in B's choice.
> >
> > The statement above is sufficient but not necessary to show that A is smarter
> > than B, in the colloquial sense of the phrase.
> I find this interpretation of the "smarter" concept very inadequate.
> For instance, suppose I have a collaborator who is more reliable in
> judgment than me but less creative than me. For sake of concreteness,
> let's call this individual by the name "Cassio."
> Let A=Cassio, B=Ben
> Now, it may be true that "When Ben proposes an action, Cassio can
> either verify that Ben proposed the correct thing, or point out a flaw
> in his choice"
> This does not necessarily imply that Cassio is smarter than Ben -- it
> may be that Ben is specialized for hypothesis generation and Cassio is
> specialized for quality-verification.
> The colloquial notion of "smartness" is not really sufficient for
> discussing situations like this, IMO.
(see Chess)
> > To illustrate, suppose Alice is helping Bob play chess. Whenever Bob
> > suggests a move, she always says something like "I agree with your move" or
> > "Yikes! If you go there, he'll fork your rooks in two moves! You overlooked
> > this move here." If she can always do this, it should be absolutely clear
> > that Alice is a better chess player than Bob.
> Yes, but I can also imagine two chess masters, where master A was
> better at coming up with bold new ideas and master B was better at
> pointing out subtle flaws in ideas (be they bold new ones or not).
> These two masters, if they were able to cooperate very closely (e.g.
> through mental telepathy), might be able to play much better than
> either one on their own. This situation is more like the one at hand.
> (i.e., I think your internal quasirandom selection mechanism has
> chosen a suboptimal analogy here ;-)
(see Chess)
> This discussion has gotten fairly in-depth, but the crux of it is, I
> don't feel you have made a convincing argument in favor of your point
> that it is implausible-in-principle to add Friendliness on to an AGI
> architecture designed without a detailed theory of Friendliness on
> hand. I don't feel Eliezer has ever made a convincing argument in
> favor of this point either. It may be true but you guys seem far from
> demonstrating it...
(see Conclusion)
