RE: Voss's comments on Guidelines

From: Peter Voss (peter@optimal.org)
Date: Fri May 03 2002 - 22:13:17 MDT


Hi Eli,

>> 1) Friendliness-topped goal system – Not possible: My design does not
>> allow for such a high-level ‘supergoal’.

>What are the forces in your design that determine whether one action is
>taken rather than another?

A myriad different goal systems that have complex inter-dependencies. More
specifically, some of the 'forces' are: the structure of sensed data,
feature extraction, internal resource management, operator input (teaching,
coaxing, etc), random (exploration), etc. Most of the particular systems
involved are *highly adaptive* and interact in unpredictable feedforward &
feedback loops and networks.
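
For concreteness, a toy sketch of that kind of action selection (the module
names, weights, and adaptation rule below are invented for illustration; this
is not the actual design):

    # Toy sketch: action selection emerging from several interacting adaptive
    # subsystems ("forces") rather than from a single top-level supergoal.
    # All names, weights and the adaptation rule are illustrative assumptions.
    import random

    class Force:
        def __init__(self, name, weight):
            self.name = name
            self.weight = weight              # adapts over time via feedback

        def score(self, action, context):
            # Each force rates an action from its own narrow perspective.
            return context.get((self.name, action), 0.0)

    forces = [Force("feature_extraction", 1.0),
              Force("resource_management", 0.8),
              Force("operator_input", 1.5),   # teaching, coaxing, etc.
              Force("exploration", 0.3)]      # random/exploratory pressure

    def choose_action(actions, context):
        def combined(a):
            return (sum(f.weight * f.score(a, context) for f in forces)
                    + random.gauss(0, 0.1))   # a little exploration noise
        return max(actions, key=combined)

    def adapt(outcome):
        # Feedback loop: each force's influence drifts with experience, so
        # the overall "goal system" is itself a moving target.
        for f in forces:
            f.weight = max(0.0, f.weight + 0.01 * outcome.get(f.name, 0.0))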

>> 2) Cleanly causal goal system – Not possible: requires 1)

>Does your system choose between actions on the basis of which future events
>those actions are predicted to lead to?

Only at the higher levels, and within severe constraints (as are our
choices).
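
In sketch form (purely illustrative names), that higher-level choice amounts
to: filter by the hard constraints first, then pick the action whose
predicted outcome scores best:

    # Sketch only: prediction-based choice at the higher levels, applied
    # after hard constraints have pruned the options.
    def choose(actions, allowed, predict_outcome, utility):
        feasible = [a for a in actions if allowed(a)]   # severe constraints
        if not feasible:
            return None
        return max(feasible, key=lambda a: utility(predict_outcome(a)))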

>> 3) Probabilistic supergoal content – Inherent in my design: All
>> knowledge and goals are subject to revision.

>If your system has no supergoal, but does have reinforcement, the
>reinforcement systems are also part of the goal system. Are the
>reinforcement systems subject to revision?

Yes. Yes.

>In any case, the recommendation of "probabilistic supergoal content" does
>not just mean that certain parts of the goal system are subject to revision,
>but that they have certain specific semantics that will enable the system to
>consider that revision as desirable, so that the improvement of Friendliness
>is stable under reflection, introspection, and self-modification.

Can't do that - there is no supergoal.
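
For reference only, a toy illustration of what the recommendation appears to
mean for a design that does have an explicit supergoal slot (the candidate
contents and numbers are made up):

    # Toy illustration of probabilistic supergoal content: the supergoal is
    # held as competing hypotheses with credences, so revising it in the
    # light of evidence is itself treated as desirable rather than as a
    # violation. Candidates and numbers are invented for illustration.
    class SupergoalHypothesis:
        def __init__(self, description, probability):
            self.description = description
            self.probability = probability

    candidates = [SupergoalHypothesis("current Friendliness content", 0.7),
                  SupergoalHypothesis("programmer-corrected content", 0.3)]

    def revise(likelihood):
        # Bayesian-style update: likelihood maps each description to how
        # well it explains new evidence (e.g. corrected programmer input).
        total = sum(h.probability * likelihood[h.description]
                    for h in candidates)
        for h in candidates:
            h.probability = h.probability * likelihood[h.description] / total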

>> 4) Acquisition of Friendliness sources – While I certainly encourage the
>> AI to acquire knowledge (including ethical theory) compatible with what I
>> consider moral, this does not necessarily agree with what others regard as
>> desirable ethics/ Friendliness.

>"Acquisition of Friendliness sources" here means acquiring the forces that
influence human moral decisions as well as learning the final output of
those decisions. It furthermore has the specific connotation of attempting
to deduce the forces that influence the moral statements of the programmers
even if the programmers themselves do not know them.

I agree, but this does not address my point about conflicting views of
desirable ethics. I would certainly hope that my AI will be super-moral by my
standards, or by the standards I would have if I were smarter - but even that
doesn't address the issue of meta-ethics: one's highest value/purpose. (Huge
discussion about ethics looming!)

>> 5) Causal validity semantics – Inherent in my design: One of the key
>> functions of the AI is to (help me) review and improve its premises,
>> inferences, conclusions, etc. at all levels. Unfortunately, this ability
>> only becomes really effective once a significant level of intelligence has
>> already been reached.

>I agree with the latter sentence. However, revision of beliefs is what I
>would consider ordinary reasoning - causal validity semantics means that the
>AI understands that its basic structure, its source code, is also the
>product of programmer intentions that can be wrong. That's *why* this
>ability only becomes effective at a significant level of intelligence; it
>inherently requires an integrated introspective understanding of brainware
>and mindware, at minimum on the level of a human pondering evolutionary
>psychology.

We agree. What this means is that this design criterion is actually to build
a system that is simply very intelligent so that it can better understand
humans and itself.
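
A crude illustration of the idea (the fields and numbers are assumptions, not
anyone's implementation): every element of the system, including its own
structure, carries a record of why it is there, so the cause itself can later
be judged mistaken:

    # Crude sketch of causal validity semantics: beliefs and design elements
    # record their causes, so the system can ask whether the cause (e.g. a
    # programmer's intention) might itself be wrong. All fields illustrative.
    class Belief:
        def __init__(self, content, source, source_reliability):
            self.content = content
            self.source = source          # "programmer assertion", "inference", ...
            self.source_reliability = source_reliability

    goal_code = Belief("top-level action-selection routine as written",
                       "programmer assertion", 0.9)  # programmers can be wrong

    def open_to_revision(belief, threshold=0.95):
        # Anything with a less-than-certain cause is a candidate for review,
        # including the system's own source code.
        return belief.source_reliability < threshold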

>> 6) Injunctions – This seems like a good recommendation; however, it is not
>> clear what specific injunctions should be implemented, how to implement
>> them effectively, and to what extent they will oppose other
>> recommendations/ features.

>Hopefully, SIAI will learn how injunctions work in practice, then publish
>the knowledge.

So nothing we can implement now.

>> 7) Self-modeling of fallibility - Inherent in my design. This seems to be
>> an abstract expression of point 3).

>The human understanding of fallibility requires points (3), (4), and (5);
>an AI, to fully understand its own fallibility, requires all of these as
>well. *Beginning* to model your own fallibility takes much less structure.
>Any AI with a probabilistic goal system can do so, though doing so
>efficiently requires reflection.

Agree.
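
A minimal sketch of what beginning to model one's own fallibility can look
like with a probabilistic goal system (the calibration bookkeeping below is
an illustrative assumption):

    # Minimal sketch: the system tracks how often its own judgments of a
    # given kind have proved correct, and discounts new judgments accordingly.
    reliability = {"prediction": 0.8}   # illustrative calibration record

    def discounted_value(kind, predicted_value, neutral_value=0.0):
        r = reliability.get(kind, 0.5)  # unknown kinds get an agnostic prior
        return r * predicted_value + (1 - r) * neutral_value

    def record_outcome(kind, was_correct, rate=0.05):
        # Slowly move the calibration estimate toward observed accuracy.
        r = reliability.get(kind, 0.5)
        reliability[kind] = r + rate * ((1.0 if was_correct else 0.0) - r)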

>> 8) Controlled ascent – Good idea, but may be difficult to implement: It
>> may be hard to distinguish between rapid knowledge acquisition, improvements
>> in learning, and overall self-improvement (i.e. substantial increases in
>> intelligence).

>All you need for a controlled ascent feature to be worthwhile is the
>prospect of catching some of the hard takeoffs some of the time.

Even by just 'being careful' we can catch 'some of the hard takeoffs some of
the time'. I thought that you were looking for something more reliable.
Obviously we want to do the best we can, but there are direct tradeoffs
between safety and the chances of rapid success.
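
That said, a crude tripwire is implementable today; for instance (the
capability proxy and threshold below are invented, and would miss many real
takeoff signatures):

    # Crude "controlled ascent" tripwire: log some capability proxy after
    # each self-modification cycle and pause if it jumps too fast. The proxy
    # and threshold are illustrative; the aim is only to catch some hard
    # takeoffs some of the time.
    history = []                 # capability-proxy scores, one per cycle
    MAX_RELATIVE_GAIN = 0.10     # pause if the proxy jumps >10% in one cycle

    def record_and_check(score):
        history.append(score)
        if len(history) >= 2 and history[-2] > 0:
            gain = (history[-1] - history[-2]) / history[-2]
            if gain > MAX_RELATIVE_GAIN:
                raise RuntimeError("ascent rate over threshold; "
                                   "halting for human review")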

Peter


