Controlled ascent (was: Military Friendly AI)

From: Eliezer S. Yudkowsky
Date: Sat Jun 29 2002 - 06:43:43 MDT

James Higgins wrote:
> At 06:32 PM 6/28/2002 -0600, Ben Goertzel wrote:
>> Novamente does not yet have a goal system at all, this will be
>> implemented, at my best guess, perhaps at the very end of 2002 or the
>> start of 2003. Currently we are just testing various cognitive and
>> perceptual mechanisms, and not yet experimenting with autonomous
>> goal-directed behavior.
> Boy, you really don't have much chance of a hard takeoff yet.

As you know, I think the threshold for hard takeoff is higher than Ben does,
and that the approach embodied in the Novamente manuscript I read will never
get there. So what? I could be wrong on both counts. Any system with
Turing-complete patterns optimizing themselves (that includes Eurisko)
should have a controlled ascent mechanism. You shouldn't stop and try to
argue with yourself about whether a controlled ascent mechanism is needed;
you should just *do* it. Basic errors in your visualization of how and why
a hard takeoff occurs are *exactly* what a controlled ascent mechanism is
intended to guard against, so arguing about whether a given system has a
significant chance of a hard takeoff misses the entire point. The point
where you think a system has a real, pragmatic chance of a hard takeoff is
the point where a full Friendship design should be implemented, not the
point where you first add on a controlled ascent mechanism.

>> A failsafe mechanism has two parts

Calling this a "failsafe" mechanism begs the question, because it most
certainly isn't. It just buys you a better chance. That's all.

>> 1) a basic mechanism for halting the system and alerting appropriate
>> people when a "rapid rate of intelligence increase" is noted

Personally, I would recommend that the system should pause when a certain
number of self-improvements go past unobserved. If that many improvements
mount up while the programmers take a two-week vacation, or over five
minutes, it shouldn't make a difference - what matters is that the
programmers don't see it. For whatever metric you use, measure how much
improvement occurs in an average day, or on a good day for that matter. At
the end of each day, have a programmer hit a sense switch that validates all
improvements up to one half-hour ago. If, say, 30 days' worth of
self-improvement goes past before the sense switch is next pressed, the
system pauses and sends out an email, or shuts down (preserving state) if
somehow the activity continues. (Incidentally, the "send an email" part
only works if your system is connected to the Internet, but I understand
that's your plan.)
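The pause policy above can be sketched in a few lines of code. This is purely my own illustration of the policy as described - the class, names, and thresholds are hypothetical, not anything from Novamente:

```python
class ControlledAscentGuard:
    """Pause the system when too many self-improvements accumulate unobserved.

    Illustrative sketch only: measure a typical day's worth of improvement,
    then pause if more than max_unvalidated_days' worth goes by without a
    programmer pressing the sense switch.
    """

    def __init__(self, improvements_per_day: float,
                 max_unvalidated_days: float = 30.0):
        self.max_unvalidated = improvements_per_day * max_unvalidated_days
        self.unvalidated = 0      # improvements since the last sense-switch press
        self.paused = False

    def record_improvement(self) -> None:
        """Called by the optimizer each time a self-improvement is committed."""
        if self.paused:
            raise RuntimeError("system paused: improvements await validation")
        self.unvalidated += 1
        if self.unvalidated > self.max_unvalidated:
            self.paused = True
            self.alert_programmers()

    def press_sense_switch(self) -> None:
        """A programmer validates all improvements so far and resets the count.

        (The email validates only up to a half-hour ago; this sketch
        validates everything, for brevity.)
        """
        self.unvalidated = 0
        self.paused = False

    def alert_programmers(self) -> None:
        # Stand-in for "send an email, or shut down preserving state".
        print("PAUSED: %d unvalidated improvements" % self.unvalidated)
```

Note that pausing depends only on the count of unvalidated improvements, not on wall-clock time - a vacation and a five-minute burst trip the same threshold.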

>> 2) a mechanism for detecting a rapid rate of intelligence increase

You can't have Turing-complete patterns optimizing themselves without a
definition of optimization. Whatever criterion is being used to separate
good patterns from bad patterns, use that criterion as the metric of
intelligence. If the patterns make improvements that they themselves can't
detect, the improvements are likely to be discarded. Unless the system
mechanism, or the particular patterns that detect "good tweaks", have some
kind of power granted by the overall system - the power to make more tweaks
than other patterns, or to have their tweaks granted a greater chance of
survival - it seems very unlikely that even a mathematical chance of a hard
takeoff could exist. The remaining loophole is a heuristic that
simultaneously learns how to make intelligence improvements and how to
cheat the system to enforce them "outside the box" - but that is unlikely,
especially given the amount of time you are likely to spend preventing
cheating in any case. An exception to this rule would
be a sufficiently vast Internet pool of fractal self-modifying codelets,
with no rules except Core Wars; in this case intelligence may simultaneously
be the means of making improvements and the means of enforcing them. That's
why it would be very hard to create a controlled ascent mechanism for
certain brute-force ways of searching for general intelligence.

Anyway, the upshot is that even the mathematical possibility of a hard
takeoff - I am not talking about "significant probabilities" but
"theoretical mathematical possibility" - is directly related to the system's
ability to measure goodness. So you plug whatever measure of goodness would
*feed* a hard takeoff in the first place into the controlled ascent
mechanism.
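Concretely, the monitoring can wrap whatever goodness function the system already uses. A minimal sketch, assuming a scalar `goodness` criterion and a sliding window - every name and threshold here is my own illustration:

```python
from collections import deque

class GoodnessMonitor:
    """Feed the system's own optimization criterion into the ascent monitor.

    `goodness` is whatever function the system already uses to separate
    good patterns from bad ones; this wrapper just watches its trajectory
    and flags an improbably fast rise. Illustrative sketch only.
    """

    def __init__(self, goodness, window: int = 100,
                 max_gain_per_window: float = 0.1):
        self.goodness = goodness
        self.max_gain = max_gain_per_window
        self.history = deque(maxlen=window)  # recent goodness scores

    def score(self, pattern) -> float:
        """Score a pattern with the system's own criterion, recording it."""
        g = self.goodness(pattern)
        self.history.append(g)
        return g

    def improvement_rate(self) -> float:
        """Gain in best observed goodness across the sliding window."""
        if len(self.history) < 2:
            return 0.0
        return max(self.history) - self.history[0]

    def runaway(self) -> bool:
        """True when goodness climbs faster than the calibrated ceiling."""
        return self.improvement_rate() > self.max_gain
```

Because the monitor reuses the same criterion that would feed a takeoff, any improvement the system can exploit is, by construction, an improvement the monitor can see - which is the whole argument of the paragraph above.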

You also, when the time comes, measure a bunch of other correlates of
intelligence. This doesn't guarantee catching a hard takeoff that occurs
"outside the box" but it gives you a much higher chance of doing so, as long
as the takeoff "outside the box" still has any effect at all on the
cognitive systems you know (which it might not).

>> 1 is easy; 2 is hard ... there are obvious things one can do, but since
>> we've never dealt with this kind of event before, it's entirely
>> possible that a "deceptive intelligence increase" could come upon us.
>> Measuring general intelligence is tricky.

>> It depends on the situation. Of course, egoistic considerations of
>> priority are not a concern. But there's no point in delaying the
>> Novamente-induced Singularity by 3 years to reduce risk from 4% to 3%,
>> if in the interim some other AI team is going to induce a Singularity
>> with a 33.456% risk...
> Excellent answer. The best course of action would be to stop the team
> with the 33% risk of failure (at any cost I'd say given that number). But
> if they could not be stopped I'd endorse starting a less risky
> Singularity as an alternative.

James Higgins, I suggest you read "Policy recommendations" in CFAI, where
this kind of thinking is analyzed at length.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:39 MDT