Re: Proving the Impossibility of Stable Goal Systems

From: Ben Goertzel
Date: Sun Mar 05 2006 - 17:20:52 MST


I think your general line of argument is probably right.

However, it seems to me that a rigorous proof of your argument is
going to be verrrry difficult to come by.

Look at how much math Marcus Hutter had to generate to prove some
theorems that amount to, in essence, the statement that "given any
computable goal, with near-infinite processing power and memory one
can make a software program that will achieve this goal just about as
well as is possible." Intuitively this is almost obvious, but hundreds
of pages of hard math had to be done to prove it rigorously. And this
is obviously much simpler than the intuition you've expressed
regarding Friendliness of AI systems with finite resources.
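
For readers who have not seen Hutter's work: the agent at its center, AIXI, is usually written as the expectimax expression below. This is standard notation from Hutter's publications, reproduced here for reference rather than taken from this message. Here a, o and r are the actions, observations and rewards at each cycle, k is the current cycle, m is the horizon, U is a universal monotone Turing machine, and \ell(q) is the length in bits of a candidate environment program q.

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  \bigl[ r_k + \cdots + r_m \bigr]
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The innermost sum ranges over all programs and is not computable. That is one reason the resource-bounded variants (such as AIXItl) need so much additional machinery.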

I feel that our current mathematical concepts are very ill-suited to
proving theorems of this nature. Hutter's work is great, but it
reminds me of things like

* a 50-page geometric derivation of the formula for the slope of a
cubic polynomial, done before Newton and Leibniz invented the rules of
calculus

* old-fashioned differential geometry calculations, done using
coordinate geometry and endless elementary algebra -- which is how
this stuff was done back before the invention of manifold theory
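
To make the first contrast concrete: once the limit definition of the derivative is available, the slope of a general cubic takes one line. The coefficients a, b, c, d are generic placeholders chosen for illustration.

```latex
f(x) = ax^3 + bx^2 + cx + d, \qquad
\frac{f(x+h) - f(x)}{h} = 3ax^2 + 2bx + c + h(3ax + b) + ah^2
\;\xrightarrow{\,h \to 0\,}\; f'(x) = 3ax^2 + 2bx + c
```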

In each of these cases, long and nasty proofs were replaced by short
elegant ones after the correct set of new concepts was introduced.

In the case of issues regarding AGI, Friendliness and
self-modification and the like, I think we don't yet have the correct
set of new concepts. Once we do, some analogue of Hutter's theorems
will be provable almost trivially, and quite possibly some analogue of
your hypothesis about the difficulty of guaranteeing Friendliness for
AGI systems under finite resources will also be provable (though
probably not quite as trivially).

An interesting topic is what these new concepts might be. Of course
they will end up being related to current theoretical notions like
probability and algorithmic information, but I expect they will
involve a fundamentally different perspective as well.

-- Ben G

On 3/5/06, Eliezer S. Yudkowsky wrote:
> Peter Voss wrote:
> > Eli,
> >
> > Have you seriously considered putting focused effort into proving that
> > practical self-modifying systems can *not* have predictably stable goal
> > systems?
> >
> > I don't recall specific discussion on that point.
> I often consider that problem; if I could prove the problem impossible
> in theory I would probably be quite close to solving it in practice. My
> attention tends to focus on Gödelian concerns, though I think of them as
> "Löbian" after Löb's Theorem.
> > I strongly suspect that such a proof would be relatively simple.
> > (Obviously, at this stage you don't agree with this sentiment).
> >
> > Naturally the implication for SIAI (and the FAI/AGI community in
> > general) would be substantial.
> Please go right ahead and do it; don't let me stop you!
> > - Any practical high-level AGI has to use its knowledge to interpret
> > (and question?) its given goals
> Any AGI must use its model of the real world to decide which real-world
> actions lead to which real-world consequences, and evaluate its utility
> function or other decision system against the model's predicted
> real-world consequences.
> This does *not* necessarily involve changing the utility function.
> > - Such a system would gain improved knowledge from interactions with the
> > real world. The content of this knowledge and conclusions reached by the
> > AGI are not predictable.
> Agreed.
> > - By the nature of its source of information, much knowledge would be
> > based on induction and/or statistics, and be inherently fallible.
> Also agreed.
> Please note, however, that accidentally killing a human is not a
> catastrophic failure. One evil deed does not turn the FAI evil, like a
> character in a bad movie. *Catastrophic* failures, which change what
> the FAI is *trying* to do, require that the FAI fail on the task of
> recursive self-improvement.
> So an impossibility proof would have to say:
> 1) The AI cannot reproduce onto new hardware, or modify itself on
> current hardware, with knowable stability of the decision system (that
> which determines what the AI is *trying* to accomplish in the external
> world) and bounded low cumulative failure probability over many rounds
> of self-modification.
> or
> 2) The AI's decision function (as it exists in abstract form across
> self-modifications) cannot be knowably stably bound with bounded low
> cumulative failure probability to programmer-targeted consequences as
> represented within the AI's changing, inductive world-model.
> If I could rigorously prove such an impossibility, my understanding
> would probably have advanced to the point where I could also go ahead
> and pull it off in practice.
> --
> Eliezer S. Yudkowsky
> Research Fellow, Singularity Institute for Artificial Intelligence
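
The quoted reply's point that an AGI evaluates its utility function against its world model's predicted consequences, and need not change that utility function when the model changes, can be sketched as a toy expected-utility chooser. Everything here is hypothetical and for illustration only: the dictionary-of-distributions world model, the action and outcome names, and the numbers. New evidence revises the model, which changes the chosen action, while the utility function stays exactly the same.

```python
from typing import Callable, Dict

Action = str
Outcome = str
# Hypothetical representation: for each action, the model's predicted
# probability distribution over outcomes.
WorldModel = Dict[Action, Dict[Outcome, float]]
Utility = Callable[[Outcome], float]


def expected_utility(model: WorldModel, action: Action, utility: Utility) -> float:
    """Score an action by the utility of its predicted consequences."""
    return sum(p * utility(outcome) for outcome, p in model[action].items())


def choose_action(model: WorldModel, utility: Utility) -> Action:
    """Pick the action whose predicted consequences score highest."""
    return max(model, key=lambda a: expected_utility(model, a, utility))


def utility(outcome: Outcome) -> float:
    """Fixed utility function; nothing below modifies it."""
    return {"goal_met": 1.0, "no_effect": 0.0, "side_damage": -10.0}[outcome]


# Illustrative numbers only.
model_before: WorldModel = {
    "plan_a": {"goal_met": 0.6, "no_effect": 0.3, "side_damage": 0.1},
    "plan_b": {"goal_met": 0.4, "no_effect": 0.6, "side_damage": 0.0},
}

# New evidence revises the prediction for plan_a only; the utility
# function is left untouched.
model_after: WorldModel = {
    **model_before,
    "plan_a": {"goal_met": 0.7, "no_effect": 0.29, "side_damage": 0.01},
}

if __name__ == "__main__":
    print(choose_action(model_before, utility))  # prints plan_b
    print(choose_action(model_after, utility))   # prints plan_a
```

The design choice being illustrated is the separation between what the system believes (the model, which is inductive and fallible) and what it is trying to achieve (the utility function).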

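Condition 1 in the quoted message asks for "bounded low cumulative failure probability over many rounds of self-modification." A short calculation shows why the per-round requirement is so strict. This sketch assumes, for illustration only, a constant per-round failure probability p. The exact formula 1 - (1 - p)**n additionally assumes independent rounds. The union bound n * p holds without independence. The function names are hypothetical.

```python
import math


def cumulative_failure(p: float, n: int) -> float:
    """Probability of at least one failure in n rounds of self-modification.

    Assumes (for illustration only) that rounds are independent and each
    fails with the same probability p: 1 - (1 - p)**n.
    """
    # log1p/expm1 keep the result accurate when p is very small.
    return -math.expm1(n * math.log1p(-p))


def union_bound(p: float, n: int) -> float:
    """Upper bound n*p on the same quantity; holds even without independence."""
    return min(1.0, n * p)


def per_round_budget(epsilon: float, n: int) -> float:
    """Largest per-round failure probability for which the union bound
    keeps the cumulative failure probability at or below epsilon."""
    return epsilon / n


if __name__ == "__main__":
    # A per-round failure rate that sounds small compounds quickly.
    print(cumulative_failure(1e-3, 1000))  # roughly 0.632
    # Keeping cumulative risk under 1% across a million rounds requires
    # a per-round failure probability of about 1e-8 or less.
    print(per_round_budget(0.01, 10**6))
```
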
This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:56 MDT