From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed May 02 2001 - 20:30:31 MDT
Ben Goertzel wrote:
>
> I don't have time to carry out this argument in 100% adequate detail right
> now, because I'm going to Norway tomorrow to beg for $$ from some VC's there
> ;> But hopefully over the weekend I'll find time to write that essay on
> the logic of Friendliness that I keep wanting to write, which will explain
> what I mean by "invariants" and so forth...
Yes, we should probably work this out before you launch the self-improving
superintelligent version of Webmind. So we probably have another couple
of months before the problem becomes really urgent, right? (JOKE! Even
Ben says it's another 3 years, minimum.)
> Anyway, I'll make a few comments...
>
> > Point the 1st: Friendliness is not, and cannot, be implemented on the
> > level of source code. Friendliness is cognitive content.
>
> Sure, but source code can bias the system in favor of certain cognitive
> content
Depends on how philosophically sophisticated the system is. For an
advanced, reflective system that can think about thinking, and
specifically think about goal thinking and examine source code, the system
will be aware that the source code is biasing it and that the bias was
caused by humans. If the AI regards some sources of behaviors as more
valid than others, it may come to regard some specific bias as invalid. (FAI
explicitly proposes giving the AI the capability to understand causation
and validity in this way.) Source code or precreated content can support
the system, or even bias it, but only as long as the AI-as-a-whole concurs
that the support or bias is a good thing (albeit under the current
system).
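
To caricature that in throwaway Python (the class and the validity rule
are mine, purely illustrative; real cognitive content is obviously far
richer than a tagged record):

    # Illustrative only: content tagged with its causal source, and kept
    # only so long as the AI-as-a-whole still endorses that source.
    from dataclasses import dataclass

    @dataclass
    class Content:
        statement: str
        source: str      # e.g. "programmer-written bias", "own inference"

    def revalidate(pool, source_is_valid):
        """Keep each piece of content only if the AI, reflecting on where
        it came from, still considers that source a valid cause."""
        return [c for c in pool if source_is_valid(c.source)]

The programmer-written bias survives exactly as long as the system,
examining its own causal history, concurs that "programmer-written bias"
is a valid place for content to come from.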
> > Point the 2nd: Friendliness is not a "portion" which "requires" an AI to
> > be friendly to humans. Friendliness is not an add-on or a plug-in.
> > Friendliness is the whole of the goal system. It is what the AI wants to
> > do.
>
> I continue not to believe that Friendliness can viably be made "the whole of
> the goal system." I'll clarify this point in my systematic write-up when I
> get to it. Logically, sure you CAN view any other worthy goal as a subgoal
> of Friendliness, but I continue to believe this is a sufficiently awkward
> way to manage other goals that it's not a workable way for a mind to
> function.
Yes, this is a point of continuing substantive disagreement. (Though
there's also a philosophical disagreement about whether you have the
responsibility to do it *anyway* if it turns out to be hard but not
impossible; still, I currently think it shouldn't *be* hard.)
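
Just to make sure we're disagreeing about the same structure, here is the
kind of thing I mean, sketched as a toy Python tree (the goal names and
numbers are mine and purely illustrative; a real goal system would be
causal and probabilistic, not a static hierarchy):

    # Toy sketch: every other goal hangs off Friendliness as a subgoal,
    # justified by how much it is predicted to serve its parent.
    class Goal:
        def __init__(self, name, predicted_service=1.0):
            self.name = name
            self.predicted_service = predicted_service
            self.subgoals = []

        def add_subgoal(self, goal):
            self.subgoals.append(goal)
            return goal

    friendliness = Goal("Friendliness")
    learn = friendliness.add_subgoal(Goal("acquire knowledge", 0.9))
    funding = friendliness.add_subgoal(Goal("keep the project funded", 0.6))
    learn.add_subgoal(Goal("study human-written source code", 0.8))

The point of contention is whether routing every worthy goal through a
single supergoal like this is workable for a mind, or hopelessly awkward.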
> > Point the 3rd: Friendliness is not "invariant" - a strange term to use
> > for a system one of whose first and foremost recommendations is that
> > supergoals should be probabilistic!
>
> What I meant is, as the system rewrites its own code, the fact of its
> Friendliness is supposed to remain unchanged. The specific content
> underlying this Friendliness may of course change. Mathematically, one
> might say that the class of Friendly mind-states is supposed to be an
> probabilistically almost-invariant subspace of the class of all mind-states.
Actually, it just has to be invariant *enough*, by which I mean, for
example, "more invariant than a human upload", or "not changing so much as
to render desirable the act of breaking up humans for their component
atoms". (See "requirements for sufficient convergence" in FAI.)
> > Point the 4th: Friendliness is not "hardwired", a term which I've seen
> > you use several times.
>
> What I mean by "hard-wiring Friendliness" is placing Friendliness at the top
> of the initial goal system and making the system express all other goals as
> subgoals of this. Is this not what you propose? I thought that's what you
> described to me in New York...
Yes, that's what I described, but by that description *I'm* hard-wired
Friendly, since this is one of the properties I strive for in my own
declarative philosophical content.
"Hard wiring", to me, means that the system contains features intended to
hold the content in place even if the AI tries to modify it, or that the
feature is implemented as low-level code. The human taste for sugar and
fat is "hardwired", for example. It's implemented as low-level code and
very hard to override.
> > The main part of the model where I disagree with you is that it'll take a
> > lot more than a Java supercompiler description to give a general
> > intelligence humanlike understanding of source code. The Java
> > supercompiler description is only the very first step.
>
> I agree there. But I tend to think that if you put that first step together
> with WM's higher-order inference engine, the second step will come all by
> itself.
Depends on how good WM is. If WM is already very intelligent in
Eurisko-like heuristic discovery and composition, and if it has enough
computing power to handle the clustering and schema creation, feeding in
the low-level description might be enough for WM to create an effective
perceptual understanding of the higher-level features by examining typical
human-written code. If WM has a strong understanding of purpose and a
strong pre-existing understanding of vis modules' functionality (WM gets a
"ve", by this point), then you could, conceivably, just feed in the Java
supercompiler description and watch the thing blaze straight through a
hard takeoff. Low-probability outcome, but very real.
> > What I'm saying is that *when the system reaches human intelligence*, it
> > will probably be *in the middle of a hard takeoff*
>
> And this is another point on which our intuitions differ. I think that
> human-level intelligence will probably be achieved significantly **before**
> a hard takeoff. I think that optimizing your own mind processes requires
> human-level intelligence or maybe a little more.
I don't necessarily predict that Webmind 2.0 will be sufficient unto a
hard takeoff. It's just that, being "conservative" as a Friendliness
programmer must be, I feel obliged to take your word for it when it comes
to preparing. I wish you'd be a bit more "conservative" when it comes to
preparing for takeoff, even if you predict a slow one.
> We don't really disagree very profoundly; most of our disagreements are just
> different intuitions about timings of things that none of us really has data
> about. The most significant difference I see is as to whether, initially,
> one wants to rig a goal system with Friendliness at the top....
The Friendliness-topped goal system, the causal goal system, the
probabilistic supergoals, and the controlled ascent feature are the main
things I'd want Webmind to add before the 1.0 version of the AI Engine.
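
And since "controlled ascent" keeps coming up: what I mean by it,
caricatured down to a single guard clause (my toy code and my made-up
threshold, not a proposal for how Webmind would actually implement it):

    # Caricature of a controlled-ascent check: before any self-modification
    # expected to produce a large capability jump, pause and ask first.
    JUMP_THRESHOLD = 0.10   # illustrative number only

    def apply_self_modification(current, predicted, programmers_confirm):
        jump = (predicted - current) / current
        if jump > JUMP_THRESHOLD and not programmers_confirm():
            return current      # ascent deferred, not abandoned
        return predicted        # small or confirmed step: proceed

The real feature would live much deeper in the system than a single
function, of course; the point is only that the ascent has a brake the
programmers can refuse to release.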
-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence