Re: Definition of strong recursive self-improvement

From: Russell Wallace (russell.wallace@gmail.com)
Date: Sun Jan 02 2005 - 00:09:17 MST


On Sat, 01 Jan 2005 22:48:31 -0600, Eliezer S. Yudkowsky
<sentience@pobox.com> wrote:
> It seems to me that you have just proved that Marcus Hutter's AIXI can
> be no smarter than a human

AIXI can't be any smarter than a rock, if run on physically feasible
hardware. There's a reason AIXI et al are PDFware rather than running
code.

> when AIXI could tear apart a human like
> tinfoil. We can specify computations which no human mind nor physically
> realizable computer can run, yet which, if they were computed, would
> rule the universe.

It's easy to say "tear apart a human like tinfoil" and "would rule the
universe" - is there any stronger basis than intuition for believing
AIXI could compete with human performance at practical real-world
tasks even if it had an infinitely powerful computer to run on?

Mind you, my claim isn't about what could be done in the limit of
infinite computing power; it's about what could be done with a mere
few million exaflops - about the limit of what we can expect to get
from nanotech supercomputers. With that sort of physically possible
hardware, the sort of proof by exhaustive search that AIXI et al rely
on is completely infeasible.
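
To put a rough number on that (the hardware figure and program size
below are my own back-of-the-envelope assumptions, chosen generously
in AIXI's favour): even at 1e25 operations per second, merely
enumerating the programs in a tiny 300-bit search space takes vastly
longer than the age of the universe, never mind simulating each one.

    ops_per_second = 1e25       # ~ten million exaflops, assumed ceiling
    seconds_per_year = 3.15e7
    program_length_bits = 300   # trivially small by real-world standards

    candidates = 2 ** program_length_bits   # ~2e90 programs to consider
    years = candidates / (ops_per_second * seconds_per_year)
    print(f"{years:.1e} years just to enumerate them")   # ~6.5e57 years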

But even if you had infinite computing power, the very notion of
formal proof relies on a formal specification, so the results could be
no better than said specification.

> I only partially understand - I am presently working on understanding -
> how humans write code without either simulating every step of every
> possible run of the program, nor employing contemporary slow
> theorem-proving techniques. Nonetheless it is evident that we write
> code. Your proof against recursive self-improvement, which denies even
> the first step of writing a single line of functioning code, is equally
> strong against the existence of human programmers.

Not at all. It is, however, a proof that human programmers can't write
_guaranteed correct_ code. And indeed we see that is the case: code
written by human programmers is notoriously unreliable, must be
extensively tested before even partial confidence is placed in it, and
is never regarded as 100% trustworthy.

An AI will be in the same boat when it tries to improve a program:
sure, it might get some modifications right, but because it can't be
100% sure, it will get some of them wrong. And if the program it's
trying to improve is itself, with no restrictions on what parts it can
modify, then some errors will be impossible to recover from, because
what they degrade is precisely the ability to recover from errors.
(This isn't pure speculation - in the one case where it was put to
the test, EURISKO ran into exactly the sort of problems I describe.)
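
To make that failure mode concrete, here's a toy sketch in Python -
entirely my own construction, not a model of EURISKO or of any real
AI. The agent is free to modify any part of itself, including its own
recovery routine, so a single bad edit can destroy its ability to
undo bad edits:

    class SelfModifyingAgent:
        def __init__(self):
            # The agent's "code", including the routine it uses to recover:
            self.code = {"solve": lambda x: x + 1,
                         "rollback": lambda snapshot: snapshot}
            self.snapshot = dict(self.code)

        def modify(self, name, new_fn):
            """Apply an unverified 'improvement' to any part of itself."""
            self.snapshot = dict(self.code)   # keep a copy of the old code
            self.code[name] = new_fn

        def recover(self):
            # Recovery goes through the *current* rollback routine...
            self.code = self.code["rollback"](self.snapshot)

    agent = SelfModifyingAgent()
    agent.modify("rollback", lambda snapshot: None)  # a buggy "improvement"
    agent.recover()   # ...which is now the broken one: code becomes None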

> I intend to comprehend how it is theoretically possible that humans
> should write code, and then come up with a deterministic or
> calibrated-very-high-probability way of doing "the same thing" or
> better. It is not logically necessary that this be possible, but I
> expect it to be possible.

I can explain that one for you. Humans write code the same way we do
other things: a combination of semi-formal reasoning and empirical
testing.

By "semi-formal reasoning", I mean where we think things along the
lines of "A -> B", ignoring the fact that in strict logic it would
read "A &!C &!D &!E ... -> B", where C, D, E and an indefinitely long
list of other things we haven't thought of, could intervene to stop A
from working. We ignore C, D, E etc because if we had to stop to take
them all into account, we'd starve to death in our beds because we
couldn't prove it was a good idea to get up in the morning. In
practice, we're good enough at taking into account only those things
that are important enough to make a practical difference, that we can
survive in the real world.
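
As a toy illustration (the example rule and its defeaters are my own
invention, nothing formal): we conclude B from A after checking only
the defeaters we bothered to list, and anything unlisted - the C, D, E
of the argument above - is silently ignored.

    def conclude_b_from_a(a_holds, listed_defeaters, world):
        # Semi-formal "A -> B": check only the defeaters we thought of.
        if not a_holds:
            return False
        return not any(world.get(d, False) for d in listed_defeaters)

    # "Turning the key starts the car", checking two defeaters we listed:
    world = {"battery_dead": False, "out_of_fuel": False,
             "engine_missing": True}
    print(conclude_b_from_a(True, ["battery_dead", "out_of_fuel"], world))
    # -> True, even though the unlisted defeater invalidates the conclusion.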

This is the "frame problem" of classical AI - how to get a computer to
reliably take into account those things that it needs to take into
account, and ignore those things that it's adequately safe to ignore.
Now, like you I believe it should be possible to solve the frame
problem well enough to get into the ballpark of human performance. But
it can't be possible to _absolutely_ solve it, because the whole
_point_ of the frame problem is that it's a problem of how to _give up
on_ the idea of _absolutely_ solving problems.

And indeed, we find humans are fallible. We make mistakes. We come up
with plans that fail, we write programs that crash, we go "oops, I
didn't think of that". And unlike our propensity to make mistakes in
arithmetic (which is just because our brains aren't designed for it),
our propensity to make mistakes in general problem solving is inherent
in the nature of things. An AI can't be infallible any more than we
can.

Of course, an AI doesn't have to be infallible to be useful. An AI
that writes buggy code that still needs testing, could still be useful
if, for example, its code wasn't _as_ buggy as that produced by human
programmers.

Until you look at recursive self-improvement... and note that if the
code you're modifying is yourself, then it's too late to go "oops, I
didn't think of that, I'll do a patch", when the bug has destroyed
your ability to do the patch.

- Russell


