A difficulty with AI reflectivity

From: Eliezer Yudkowsky (sentience@pobox.com)
Date: Tue Oct 19 2004 - 09:02:27 MDT

I've recently been pondering a difficulty with self-modifying AI. I shall
illustrate with Juergen Schmidhuber's Gödel Machine, but the problem seems
to generalize to other foundations for self-rewriting AI.

Juergen Schmidhuber's self-rewriting "Gödel Machine" is based on the notion
of a machine that executes a self-rewrite as soon as it can produce a proof
that the self-rewrite is an improvement. This self-rewriting can extend to
rewriting the theorem prover or theorem checker. Leaving aside other
qualms I have about the Gödel Machine, it appears to me that for the Gödel
Machine to execute a change that switches to a new theorem-prover or
proof-verifier, the Gödel Machine must prove that if the proof-verifier
accepts P, then P. Otherwise there is no way for the Gödel Machine to
prove the new proof-verifier is useful, i.e., no way to expect results
accepted by the verifier to be useful.
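
A toy sketch of the rewrite rule just described (everything concrete here -
the apple-counting utility, the candidate policies, the stand-in "verifier"
that certifies improvement by direct evaluation - is my own invention; the
real Gödel Machine searches for formal proofs, not evaluations):

```python
# Toy sketch of the Gödel Machine's rewrite rule: a self-rewrite executes
# only once the machine's verifier accepts an argument that the rewrite is
# an improvement.  The "verifier" below checks improvement by evaluation,
# which a real machine cannot do; it stands in for proof verification.

def utility(policy):
    """Apples collected by a policy when ten apples are on offer."""
    return policy(10)

def verifier_accepts(current, candidate):
    """Stand-in for the proof verifier: accept only a certified gain."""
    return utility(candidate) > utility(current)

current = lambda apples: apples // 2      # initial policy: half the apples
candidates = [
    lambda apples: apples // 3,           # strictly worse: never adopted
    lambda apples: apples,                # strictly better: adopted
]

for candidate in candidates:
    if verifier_accepts(current, candidate):
        current = candidate               # the proof-gated self-rewrite

final = utility(current)
```

The hard part, of course, is hidden in verifier_accepts: the machine must
trust that what its verifier accepts is in fact an improvement.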

Example: For the Gödel Machine's current theorem-prover to accept a
proposed new theorem-prover might require establishing that if the new
theorem-prover accepts a proof that a proposed newer theorem-prover is
useful, then the newer theorem-prover is useful. Part of my
dissatisfaction with the Gödel Machine is that Schmidhuber does not specify
exactly what is meant by "useful", saying only that it has to do with
expected utility. But suppose we have an expected utility problem that
involves maximizing the number of received apples. To rewrite the current
proof system with a new proof system would presumably involve proving that
if the new proof system accepts a rewrite as "useful for increasing
apples", that rewrite is useful for increasing apples. The verifier has to
translate a meta-language result into an object-language result.

Löb's Theorem establishes that for any proof system capable of encoding its
own proof process,

|- Provable('P') -> P
   if and only if
|- P.

If you can prove "if P is provable then P", you can prove P.
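
For reference, the standard derivation (via the Hilbert-Bernays
derivability conditions, writing \Box for Provable) runs as follows:

```latex
% Sketch of Löb's Theorem, assuming the Hilbert-Bernays derivability
% conditions for \Box (= Provable):
%   (HB1)  T \vdash P  implies  T \vdash \Box P
%   (HB2)  T \vdash \Box(P \to Q) \to (\Box P \to \Box Q)
%   (HB3)  T \vdash \Box P \to \Box\Box P
% Assume T \vdash \Box P \to P.  By the diagonal lemma pick L with
%   T \vdash L \leftrightarrow (\Box L \to P).
\begin{align*}
&T \vdash L \to (\Box L \to P)               && \text{diagonal lemma}\\
&T \vdash \Box L \to \Box(\Box L \to P)      && \text{HB1, HB2}\\
&T \vdash \Box L \to (\Box\Box L \to \Box P) && \text{HB2}\\
&T \vdash \Box L \to \Box P                  && \text{HB3}\\
&T \vdash \Box L \to P                       && \text{assumption } \Box P \to P\\
&T \vdash L                                  && \text{diagonal lemma}\\
&T \vdash \Box L                             && \text{HB1}\\
&T \vdash P                                  && \text{modus ponens}
\end{align*}
```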


In the same way that Gödel's Theorem formalizes the ancient Epimenides
Paradox "This sentence is false", Löb's Theorem formalizes Haskell Curry's
Santa Claus Paradox: "If this sentence is true then Santa Claus
exists." (Consider: *if* the sentence *were* true, then the antecedent
would be true, and hence Santa Claus would exist, right? So if the
sentence were true, Santa Claus would exist? This is precisely what the
sentence asserts, and therefore the sentence is true and Santa Claus does
exist.) Löb's Theorem eliminates the self-reference in the Santa Claus
Paradox via a diagonalization argument, as Gödel did with the Epimenides
Paradox. The upshot is that if you can prove "P's provability implies P",
the Santa Claus Paradox carries through and you can prove P. If you could
prove, in Peano Arithmetic, that a proof of P!=NP in Peano Arithmetic would
imply P!=NP, you could use Löb's construction to prove P!=NP.
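
At the propositional level, the step from "C says: if C is true then Santa
Claus exists" to "Santa Claus exists" is the classical tautology
(C <-> (C -> S)) -> S, which a brute-force truth table confirms:

```python
# Truth-table check (illustrative only) that the Santa Claus sentence's
# propositional skeleton, (C <-> (C -> S)) -> S, is a classical tautology:
# any sentence C that is true exactly when "C implies Santa" is true
# forces S.  The genuinely paradoxical part - constructing such a
# self-referential C - is what the diagonal lemma supplies.
from itertools import product

def implies(a, b):
    return (not a) or b

tautology = all(
    implies(C == implies(C, S), S)   # (C <-> (C -> S)) -> S
    for C, S in product([False, True], repeat=2)
)
```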

A related result of Tarski's shows that no consistent language can contain
its own truth predicate. If you want to talk about the truth of statements
in an object language, you need a meta-language - and the meta-language, in
turn, cannot talk about its own truth. You can say "The sky is blue" and
'"The sky is blue" is true', but not ''"The sky is blue" is true' is
true', unless you invent a new concept of truth, true-1, to describe
meta-language truths in a meta-meta-language, and so on up an endless
tower of meta-languages.
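
A toy model of that stratification (my own sketch, not Tarski's formalism):
tag every sentence with a level, and let the truth predicate of level n
apply only to sentences of strictly lower level, so that no language in
the tower can judge its own truths.

```python
# Toy Tarskian hierarchy: a level-n truth predicate applies only to
# sentences below level n, so no level contains its own truth predicate.

class Sentence:
    def __init__(self, text, level):
        self.text, self.level = text, level

def true_at(n, sentence):
    """Truth predicate of the level-n meta-language."""
    if sentence.level >= n:
        raise ValueError("a level-%d predicate cannot judge level-%d talk"
                         % (n, sentence.level))
    return "level-%d: '%s' is true" % (n, sentence.text)

sky = Sentence("The sky is blue", level=0)   # object language
claim = true_at(1, sky)                      # meta-language: fine
meta = Sentence(claim, level=1)
claim2 = true_at(2, meta)                    # meta-meta-language: fine
try:
    true_at(1, meta)                         # same level: forbidden
    same_level_ok = True
except ValueError:
    same_level_ok = False
```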


In more familiar terms, it would seem that Schmidhuber's Gödel Machine must
prove a new proof system is consistent in order to accept it, and that runs
smack dab into Gödel's second incompleteness theorem. Any system (strong
enough to encode arithmetic) that can prove its own consistency is
inconsistent.
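
Gödel's second theorem, as invoked here, is itself the special case of
Löb's Theorem with P taken to be falsity:

```latex
% Taking P = \bot in Löb's Theorem recovers Gödel's second theorem:
%   T \vdash \mathrm{Con}(T) means T \vdash \neg\Box\bot,
%   i.e. T \vdash \Box\bot \to \bot.
% Löb then yields T \vdash \bot, so T is inconsistent.
T \vdash (\Box\bot \to \bot) \;\Longrightarrow\; T \vdash \bot
```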

Maybe the Gödel Machine's first proof system would start with ZFC set
theory. ZFC suffices to prove the consistency of Peano Arithmetic, and
might accept a rewrite implementing a proof verifier that accepted PA
proofs - but what about the next iteration of self-rewriting? The Peano
prover would not accept another Peano prover. The same problem holds for
starting off the Gödel Machine with an axiom about the consistency of Peano
Arithmetic, a further axiom about the consistency of the new system PA+1,
et cetera. Even if we assert infinite levels of consistency in PA+ω, it
doesn't help. Just as every descending sequence of ordinals must
eventually terminate, every sequence of self-rewrites would need to
eventually terminate.
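
The terminating-tower point can be made concrete with a toy model (entirely
my own construction, not anything in Schmidhuber's paper): label each proof
system with an ordinal below ω·2, encoded as a pair, and force every
accepted rewrite to carry a strictly smaller label.

```python
# Toy model: proof-system "strength" as an ordinal below omega*2, encoded
# as (omega_part, finite_part).  Every accepted rewrite must strictly
# decrease the label; like any descending sequence of ordinals, the chain
# of rewrites must bottom out, however the descents are chosen.

def descend(label):
    """Return some strictly smaller label, or None at zero."""
    w, n = label
    if n > 0:
        return (w, n - 1)
    if w > 0:
        return (w - 1, 5)   # below a limit stage, drop to some finite label
    return None

chain = [(1, 2)]            # start at omega + 2
while (nxt := descend(chain[-1])) is not None:
    chain.append(nxt)
# chain ends at (0, 0): the self-rewriting has nowhere left to descend.
```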

The difficulty with Schmidhuber's Gödel Machine concerns me because the
problem with Löb's Theorem seems to generalize to other foundations for
self-rewriting AI. Even the simplest consistency under reflection of a
self-modifying AI would seem to require the AI to believe that if its
processes assert P (if AI |- P), that is evidence supporting P. If the AI
looks at its reflected processes, it will find that the meaning of belief
P, the causal sequence leading up to the presence of the computational
tokens representing belief P, is that such-and-such process T proved P. If
T's proof of P doesn't seem to the AI to imply P, then why would a
reflective AI allow T's proof of P to load P as a belief, any more than a
reflective AI would allow a random number-generator to load beliefs? But
if an AI does believe that (AI |- P) -> P, Löb's construction permits an
automatic proof of P - unless we change one of the other implicit
assumptions in Löb's argument, or in this whole scenario.

All the papers I've scanned so far on reflectivity in theorem-proving
systems use the old towers-of-meta trick to get around Löb's Theorem. I
have not been able to find anything on reflectivity that wraps all the way
around, the way humans think (but then humans are naively vulnerable to
Epimenides and Santa Claus paradoxes), or the way a reflectively consistent
seed AI would have to wrap around.

Lest anyone despair of AI, remember that humans can think about this stuff
without going up in flames, ergo it must be possible somehow.

I have tentative ideas about how to resolve this problem, but I was
wondering if there's anything in the existing literature. Anything I
should read on fully wrap-around reflectivity? Anything out there on
reflectivity that doesn't invoke the towers-of-meta trick?

Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence
