Re: Friendliness not an Add-on

From: Ben Goertzel
Date: Mon Feb 20 2006 - 18:27:17 MST


> Here's the problem. If you didn't know what algorithm the AI proper was
> using, and you had no log file, you would run up against Rice's theorem here.
> However, we do know what algorithm the AI is using and we might have a log
> file. These are the only things preventing us from running into Rice's theorem.


> Therefore, this means there must exist some reasonably efficient translation
> algorithm for our AI's algorithm which will take in a conclusion and
> optionally a log file and output a deductive justification, which can then
> be checked. But, this is precisely what I meant by verifiable, so you can't
> do that for a non-verifiable architecture.

I don't find the language or reasoning in the above paragraph
sufficiently clear.

Let me try one more time to spell out the specific sort of situation I
have been trying to discuss.

Suppose we have a system with a goal G, and suppose it arrives at a
theorem Q of the form

Q = "Executing subprogram S starting at a time point in interval T
will achieve goal G with probability at least p, based on
knowledge-base K"

Suppose it proves this theorem Q using a proved-to-be-correct
theorem-proving subsystem.

Suppose that the goal G embodies within itself an agreeable notion of
Friendliness.

Now, suppose that the theorem Q was found via a hard-to-predict
algorithm such as some variant of evolutionary programming, or some
sort of heuristic, abductive inference train involving metaphorical
leaps, etc.

What is your claim about this kind of AI architecture?

Are you claiming that it cannot be proved Friendly, even if the
algorithmic information of the whole thing is less than that of the
agent doing the proof?

I don't see why. It seems to me that one might be able to prove this
kind of system is reasonably able to achieve the goal G, as compared
to other sorts of AI systems operating with similar amounts of
computational resources.
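
As a rough illustration of the situation described above, here is a
minimal Python sketch, not anything from an actual AI design. It
assumes a toy stand-in for the goal (find a subset of numbers that sums
to a target), and the names check, propose and solve are hypothetical.
The generator is a seeded random search whose path is hard to predict;
the checker is short and deterministic. A candidate is only returned
once the checker accepts it, so trust in the result rests on the
checker rather than on how the candidate was found.

```python
import random


def check(candidate, numbers, target):
    """Trusted part: small, deterministic, easy to reason about.

    Accepts a candidate only if it is a list of distinct valid indices
    whose selected numbers sum exactly to the target.
    """
    if len(set(candidate)) != len(candidate):
        return False
    if any(i < 0 or i >= len(numbers) for i in candidate):
        return False
    return sum(numbers[i] for i in candidate) == target


def propose(numbers, rng, steps=10000):
    """Untrusted part: a hard-to-predict random bit-flip search.

    Yields candidate index lists; nothing here is assumed correct.
    """
    n = len(numbers)
    mask = [rng.random() < 0.5 for _ in range(n)]
    for _ in range(steps):
        yield [i for i in range(n) if mask[i]]
        j = rng.randrange(n)
        mask[j] = not mask[j]  # flip one randomly chosen bit


def solve(numbers, target, seed=0):
    """Return the first proposal the checker accepts, or None."""
    rng = random.Random(seed)
    for candidate in propose(numbers, rng):
        if check(candidate, numbers, target):
            return candidate
    return None


if __name__ == "__main__":
    numbers = [3, 5, 8, 13]
    print(solve(numbers, target=16))
```

Whatever the generator does internally, anything solve returns has
passed check, which is the property one would try to prove about the
larger system.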

In fact, my conjecture is that given a certain finite amount of
computational resources, the best AI systems involving hard-to-predict
hypothesis-generation subsystems will be *better* than the best AI
systems that don't involve such subsystems. This would suggest that the
Friendliest AIs (those best able to optimize the Friendliness goal G)
constructible given a certain amount of computational resources may be
the ones involving hard-to-predict hypothesis-generation subsystems.

> Remember that if you do end up building a powerful enough AI system, the
> burden of proof regarding its safety lies with you. If you don't use a nicely
> formalized architecture, this step looks way harder.

I agree with this statement. However, it is not the case that using a
hard-to-predict hypothesis-generation subsystem implies using a
non-formalized or non-nicely-formalized architecture. Evolutionary
programming and estimation of distribution algorithms, for example,
are quite thoroughly formalized.
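
To make the point about formalization concrete, here is a small Python
sketch of one simple estimation of distribution algorithm, the
univariate marginal distribution algorithm (UMDA), on the OneMax toy
problem (maximize the number of 1 bits). The function name, parameter
values and the clamping of probabilities are assumptions chosen for
illustration. Each run is stochastic, yet every step (sample, select,
re-estimate marginals) is a precisely specified operation.

```python
import random


def umda_onemax(n_bits=20, pop_size=100, n_select=50,
                generations=50, seed=0):
    """Univariate EDA on OneMax; returns (best individual, final marginals)."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits          # independent Bernoulli model per bit
    lo, hi = 1.0 / n_bits, 1.0 - 1.0 / n_bits
    best = None
    for _ in range(generations):
        # Sample a population from the current probability model.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        # Truncation selection: keep the individuals with the most 1 bits.
        pop.sort(key=sum, reverse=True)
        selected = pop[:n_select]
        if best is None or sum(pop[0]) > sum(best):
            best = pop[0]
        # Re-estimate each marginal from the selected individuals,
        # clamped so that no bit's probability becomes exactly 0 or 1.
        probs = [min(hi, max(lo, sum(ind[i] for ind in selected) / n_select))
                 for i in range(n_bits)]
    return best, probs


if __name__ == "__main__":
    best, probs = umda_onemax()
    print(sum(best), "ones out of", len(best))
```

The individual samples are hard to predict, but the update rule is
simple enough that one can state and analyze properties of the
algorithm as a whole.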

-- Ben G
