Re: Friendliness not an Add-on

From: Ben Goertzel (ben@goertzel.org)
Date: Sun Feb 19 2006 - 07:12:13 MST


> What that theorem says (as you know) is that for any nontrivial
> property P (roughly: any property that holds for some arguments and
> not others) it is impossible to make a program that will tell you, for
> all algorithms A, whether A has property P.
>
> In other words, it says
>
> It is not true that:
> { There exists a program so that
> {For All nontrivial properties P and all algorithms A
> { there exists a program Q that will tell you whether A has property P
> }}}

Sorry, this should have been

For all nontrivial properties P
{ It is not true that
{ there exists a program Q so that
{ for all algorithms A
{ Q will tell you whether A has property P
}}}}
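
For concreteness, here is a rough sketch of the standard argument for why no
such Q can exist for a nontrivial P: if it did, it would hand us a decider for
the halting problem. (Every name below -- has_property, run, P_YES_SRC, halts
-- is a hypothetical placeholder, not real code; the point is only the shape
of the reduction.)

    # Purely illustrative sketch of the reduction behind Rice's theorem:
    # a universal decider for a nontrivial property P would decide halting.

    P_YES_SRC = "<source of some program known to HAVE property P>"

    def has_property(src):
        """Hypothetical universal decider for property P (cannot actually exist)."""
        raise NotImplementedError

    def run(src, x):
        """Hypothetical interpreter: run program text `src` on input `x`."""
        raise NotImplementedError

    def halts(machine_src, machine_input):
        """If has_property existed, this would decide the halting problem."""
        # Assume (without loss of generality) that a program computing the
        # everywhere-undefined function does NOT have P, while P_YES_SRC does.
        combined_src = (
            "def f(x):\n"
            f"    run({machine_src!r}, {machine_input!r})  # loops forever iff the machine does\n"
            f"    return run({P_YES_SRC!r}, x)  # otherwise behave exactly like P_YES_SRC\n"
        )
        # f has property P  <=>  the machine halts on machine_input,
        # so has_property(combined_src) would decide halting -- contradiction.
        return has_property(combined_src)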

But it still holds that a Friendliness verifier does not need to do
this, because it only needs to verify the Friendliness property for a
limited subset of algorithms A, a subset constrained by the AI
architecture in question. (And this constraint may be expressed via a
priori knowledge of the AI architecture, not necessarily via system
logs, though the latter may contribute.)
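
As a toy illustration of what "constrained by the AI architecture" buys you
(this is a hypothetical mini-language of my own, not a claim about any real
architecture): restrict the admissible programs enough and a nontrivial
property becomes outright decidable, so Rice's theorem is not even in play.

    # Toy example: programs are straight-line "add constant" / "multiply by
    # constant" instructions on one register, with no loops or branches.
    # The nontrivial property "every non-negative input yields a non-negative
    # output" is decidable, because each program denotes an affine map
    # x -> a*x + b.

    def is_safe(program):
        """Exactly decide whether the program maps every x >= 0 to a result >= 0.

        `program` is a list of (op, constant) pairs, op in {"add", "mul"}.
        """
        a, b = 1, 0                      # identity map: result = 1*x + 0
        for op, k in program:
            if op == "add":              # (a*x + b) + k
                b += k
            elif op == "mul":            # (a*x + b) * k
                a *= k
                b *= k
            else:
                raise ValueError(f"op outside the restricted language: {op}")
        # a*x + b >= 0 for all x >= 0  <=>  a >= 0 and b >= 0
        return a >= 0 and b >= 0

    print(is_safe([("mul", 3), ("add", 5)]))   # True:  x -> 3x + 5
    print(is_safe([("add", 5), ("mul", -1)]))  # False: x -> -x - 5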

On a related note...

I do think there is a possible fundamental algorithmic problem here
related to Friendliness, but I don't see one that is specific to AI
architectures that couple hard-to-predict hypothesis-generation
subsystems with verifier subsystems.

Of course, there is the following problem: If one has an AI system
that is able to self-improve via adding new physical resources to
itself as well as revising its code, THEN the future algorithmic
information of this AI system may vastly exceed the algorithmic
information of the initial version, plus the algorithmic information
of the human society creating the AI system and all its other
computers, etc. In this case, it would seem there is fundamentally no
way for the human AI-creators and the initial AI to prove the
Friendliness of the future AI system, because of the "10 pound formal
system can't prove a 20 pound theorem" problem. (In the language of
the above discussion of Rice's Theorem, when one allows the addition
of a lot of new physical compute power, it becomes too hard to
delimit the class of possible "algorithms" to be verified in
advance.)

In other words: if we allow our self-modifying AIs to increase their
algorithmic information vastly beyond our own, then we cannot, even
in principle, prove their Friendliness.
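
For reference, the precise form of the "10 pound / 20 pound" intuition is
Chaitin's incompleteness theorem (this is the standard statement, nothing
specific to AI):

    \[
      \forall\, F \ \text{consistent and recursively axiomatizable}\ \
      \exists\, c_F\ \ \forall x:\quad F \nvdash \bigl( K(x) > c_F \bigr),
    \]

where $K$ denotes Kolmogorov complexity (algorithmic information) and $c_F$
is, up to an additive constant, the algorithmic information of $F$ itself.
Reading the humans plus the initial AI as the formal system F, once the
future AI's algorithmic information grows well past c_F, the statements
needed to pin down its behavior lie beyond what F can prove.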

However, this problem applies no matter what AI architecture we use,
it seems to me.... It would be avoided only if the
conceptual/mathematical framework of algorithmic information theory is
not applicable for some reason (which of course is possible, because
we live in a specific, possibly finite physical universe, and applying
such abstract mathematical theories to physical reality requires
various subtle assumptions).

-- Ben


