Re: Proving the Impossibility of Stable Goal Systems

From: Eliezer S. Yudkowsky
Date: Sun Mar 05 2006 - 16:55:37 MST

Peter Voss wrote:
> Eli,
> Have you seriously considered putting focused effort into proving that
> practical self-modifying systems can *not* have predictably stable goal
> systems?
> I don't recall specific discussion on that point.

I often consider that problem; if I could prove the problem impossible
in theory, I would probably be quite close to solving it in practice. My
attention tends to focus on Gödelian concerns, though I think of them as
"Löbian" after Löb's Theorem.

> I strongly suspect that such a proof would be relatively simple.
> (Obviously, at this stage you don't agree with this sentiment).
> Naturally the implication for SIAI (and the FAI/AGI community in
> general) would be substantial.

Please go right ahead and do it; don't let me stop you!

> - Any practical high-level AGI has to use its knowledge to interpret
> (and question?) its given goals

Any AGI must use its model of the real world to decide which real-world
actions lead to which real-world consequences, and evaluate its utility
function or other decision system against the model's predicted
real-world consequences.

This does *not* necessarily involve changing the utility function.

> - Such a system would gain improved knowledge from interactions with the
> real world. The content of this knowledge and conclusions reached by the
> AGI are not predictable.

Agreed.

> - By the nature of its source of information, much knowledge would be
> based on induction and/or statistics, and be inherently fallible.

Also agreed.

Please note, however, that accidentally killing a human is not a
catastrophic failure. One evil deed does not turn the FAI evil, like a
character in a bad movie. *Catastrophic* failures, which change what
the FAI is *trying* to do, require that the FAI fail on the task of
recursive self-improvement.

So an impossibility proof would have to say:

1) The AI cannot reproduce onto new hardware, or modify itself on
current hardware, with knowable stability of the decision system (that
which determines what the AI is *trying* to accomplish in the external
world) and bounded low cumulative failure probability over many rounds
of self-modification.


2) The AI's decision function (as it exists in abstract form across
self-modifications) cannot be knowably stably bound with bounded low
cumulative failure probability to programmer-targeted consequences as
represented within the AI's changing, inductive world-model.
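Here is a rough numerical sketch of why condition 1 asks for bounded *cumulative* failure probability and not merely a low per-round failure rate. It assumes, purely for illustration, that each round of self-modification independently fails (alters the decision system) with some probability; the function names and numbers are hypothetical and not drawn from any actual design.

```python
import math

def cumulative_failure(per_round_probs):
    """P(at least one round fails), assuming rounds fail independently."""
    survive = 1.0
    for p in per_round_probs:
        survive *= 1.0 - p  # probability every round so far has succeeded
    return 1.0 - survive

def union_bound(per_round_probs):
    """Upper bound on cumulative failure; needs no independence assumption."""
    return min(1.0, math.fsum(per_round_probs))

ROUNDS = 10_000

# Hypothetical: a fixed one-in-a-thousand failure chance every round.
fixed = [1e-3] * ROUNDS

# Hypothetical: round k may fail with probability eps * 2**-k,
# so the per-round budgets sum to less than eps however many rounds run.
eps = 1e-3
shrinking = [eps * 0.5 ** k for k in range(1, ROUNDS + 1)]

print(f"fixed p, cumulative failure:  {cumulative_failure(fixed):.5f}")
print(f"shrinking p, union bound:     {union_bound(shrinking):.5f}")
```

With a fixed per-round risk, failure becomes near-certain over enough rounds; keeping the total bounded requires the per-round risk to shrink fast enough that the sum converges, which is the kind of guarantee an impossibility proof would have to rule out.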

If I could rigorously prove such an impossibility, my understanding
would probably have advanced to the point where I could also go ahead
and pull it off in practice.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
