Goertzel's _PtS_

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed May 02 2001 - 14:09:41 MDT

Okay, I've read the entire "Path to Singularity". (The most interesting
part was definitely the JPEGs and lists that show the functional
decomposition of Webmind's major modules.)

That said...

Ben Goertzel wrote:
> the notion of Friendly AI - the creation of AI systems that, as they
> rewrite their own source code achieving progressively greater and
> greater intelligence, leave invariant the portion of their code
> requiring them to be friendly to human beings

No offense - but no, *no*, NO, *NO*!

Point the 1st: Friendliness is not, and cannot be, implemented on the
level of source code. Friendliness is cognitive content.

Point the 2nd: Friendliness is not a "portion" which "requires" an AI to
be friendly to humans. Friendliness is not an add-on or a plug-in.
Friendliness is the whole of the goal system. It is what the AI wants to

Point the 3rd: Friendliness is not "invariant" - a strange term to use
for a system one of whose first and foremost recommendations is that
supergoals should be probabilistic! There's an entire section of FAI
("Shaper/anchor semantics") devoted to describing how the AI acquires the
ability to refine, extend, and correct existing content, create new
content, or - in extreme cases - junk the entire goal-system architecture
and start over. Because at least one of us (Ben and I) is really mistaken
here, or both of us are, and an AI needs to be able to handle even that
class of mistakes. A Friendly AI needs to be a human-equivalent
philosopher; or rather, to have a cognitive structure which permits
human-equivalent open-endedness, so that it can later acquire
human-equivalent or transhuman ability in the domain of philosophy.

Friendliness can't be ensured by creating an enslaved AI that lacks the
capability to alter the goal system; Friendliness is ensured by creating a
Friendly AI that doesn't *want* to stop being Friendly, just as I don't
want to stop being a nice person.

What is "invariant" is the fact that the AI is, on the whole, benevolent -
and even that should never be more "invariant" than our surety that what
humanity needs is, in fact, a benevolent AI. I mean, I'm pretty sure of
this, but humans can't be *100%* sure of *anything*.

Point the 4th: Friendliness is not "hardwired", a term which I've seen
you use several times. If a well-designed Friendly AI estimates you have
an a-priori 10% chance of being wrong about something, then no amount of
screaming and crying and ultra-strength affirmations on the programmer's
part will raise its estimate that you're right above 90% - because all the
screaming does is tell the AI that you really care, that you have a very
strong emotional commitment. Well, that's sensory information, and there
are ways in which that could affect actions - it could indicate an
injunction, an ethical heuristic, an anchor, and so on - but it still
isn't going to raise the probability to 100%, even if it gets the AI to go
along with you as a wait-and-see interim measure. To get anything beyond
"90%" and a "wait-and-see" compromise, you'd have to violate Friendliness
structure and start tampering directly - which might work for an infant
AI, but is unlikely to work for any seed AI with a good sense of itself,
and would certainly be corrected in retrospect by a transhuman AI.
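
The arithmetic behind Point the 4th can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not
anything taken from FAI: the function names, the odds-form Bayes update,
and the likelihood ratio of 1.0 (emphasis being about equally likely
whether the programmer is right or wrong) are all hypothetical choices
made for the example.

```python
def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# From the text: a 10% a-priori chance of being wrong, so 90% of being right.
PRIOR_RIGHT = 0.90


def after_affirmations(n, likelihood_ratio=1.0):
    """Apply n emphatic affirmations as n successive updates.

    Assumption: each affirmation is evidence of commitment, not of
    correctness, so its likelihood ratio for "the programmer is right"
    defaults to 1.0 and the estimate should not move.
    """
    p = PRIOR_RIGHT
    for _ in range(n):
        p = posterior(p, likelihood_ratio)
    return p


if __name__ == "__main__":
    print(after_affirmations(1000))   # emphasis alone
    print(posterior(PRIOR_RIGHT, 2.0))  # evidence that actually bears on the claim
```

Under these assumptions, only an observation whose likelihood ratio
differs from 1.0 moves the estimate; repetition and volume do not.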

> My sense is that he views self-modification as entering into the picture
> earlier, perhaps in Stage 1, as the best way of getting to the first
> "fairly intelligent AI." I'm not 100% sure this is wrong, but after a
> lot of thought I have not seen a good way to do this, whereas I have a
> pretty clear picture of how to get to the Singularity according to the
> steps I've outlined here.

Actually, this is more of a fundamental statement about the attitude and
philosophy of seed AI - that, as soon as the system has any intelligence
at all, no matter how primitive, that intelligence can and should be used
as a tool to build better systems, and that this applies on all layers of
the system, right down to any source code that you can teach the system to
comprehend. However, if you just build a general intelligence and only
start using it as a self-modification tool after it's been around for a
few years, that'll work too. I just think it'll be slower.

The main part of the model where I disagree with you is that it'll take a
lot more than a Java supercompiler description to give a general
intelligence humanlike understanding of source code. The Java
supercompiler description is only the very first step. Consider: The
very top layer of the human retina may be able to process all visual
fields equally well, including those composed of random pixels, but the
very next layer of the retina will only work for pictures with edges and
continuous color changes, and the visual cortex only works for
understanding 3D moving objects - definitely not all possible visual
fields. A Java supercompiler is a description that works for *all
possible Java code*, not just *useful* or *purposeful* Java code, and is
therefore analogous to the lowest possible layer of the modality.

> After it achieves a significant level of practical software engineering
> experience and mathematical and AI knowledge, it is able to begin
> improving itself ... at which point the hard takeoff begins.

You're quite right that the takeoff to superintelligence may take
years from the point where Java code becomes munchable (if not readable).
It certainly won't take minutes. The point of hard takeoff is not when
the system *first starts* improving itself, but when the system finally
*does* make a breakthrough that leads to further breakthroughs that lead
to further breakthroughs and so on, continuing indefinitely, or at least
until human intelligence has been considerably transcended. There might
be a long pattern of short-term breakthroughs and long bottlenecks before
then, quite possibly going on for years.

What I'm saying is that *when the system reaches human intelligence*, it
will probably be *in the middle of a hard takeoff* that only bottlenecks
on available hardware when considerably transhuman intelligence is reached
- enough intelligence to change the world as it stands, certainly enough
to absorb poorly-defended additional computing power if that proves
necessary and ethical, and probably enough to achieve nanotechnology in a
couple of weeks. By the time you're at the level of human general
smartness, you've already far transcended human equivalence in writing
code (because you grew up in a computer, and the humans didn't); you are
mostly self-encapsulating with respect to improvements being improvements
in the thoughts that created your components, so that each improvement in
general intelligence yields further improvements in components that sum to
further improvements in general intelligence; and you are carrying out at
least some classes of thought (the ones that are mostly serial rather than
parallel) at thousands of times human speed, such that requesting human
assistance is ofttimes not a good investment, and there are enough things
that can be done *without* human assistance to sustain the hard takeoff.
In which case, the Global Brain pre-Singularity vision is something that
could only happen with crude general intelligence or primitive Webminds,
not human-equivalent ones.

-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence
