Voss's comments on Guidelines

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Fri May 03 2002 - 16:38:43 MDT

Peter Voss pointed out at the Gathering that I hadn't responded to this yet,
which Voss published slightly less than a year ago. Sorry! Anyway, here's
my response.


> The Singularity Institute for Artificial Intelligence (SIAI) recently
> published v1.0 of its guidelines for designing ‘Friendliness’ into
> intelligent systems.
>
> AI, and especially self-improving systems - Seed AI – may well
> achieve a ‘critical mass’ of intelligence from which their ability could
> grow hyper-exponentially over a short period of time. Such sudden
> growth obviously poses some risks. In fact, even without a ‘hard
> take-off’, we should carefully consider the possibility of future, more
> autonomous AI systems acting in unanticipated goal-directed ways
> not amicable to our well-being.
>
> The question of our relationship with truly intelligent machines is a
> crucial one, and I applaud and support SIAI for its work in this area.
>
> Several issues come to mind in evaluating the Guidelines:
>
> Most fundamentally: Are the Guidelines necessary? Will machines
> ever have ‘a will of their own’, or will they always remain subservient to
> our purpose and goals? (Neither the Guidelines, nor these comments
> address the substantial problem of AIs specifically programmed or
> instructed to be malevolent).

"When a Roman father told his son that it was a sweet and seemly thing to
die for his country, he believed what he said. He was communicating to the
son an emotion which he himself shared and which he believed to be in accord
with the value which his judgement discerned in noble death. He was giving
the boy the best he had, giving of his spirit to humanize him as he had
given of his body to beget him. [snip] There are only two courses open to
Gaius and Titius [two anonymized book authors Lewis is criticizing]. Either
they must go the whole way and debunk this sentiment like any other, or must
set themselves to work to produce, from outside, a sentiment which they
believe to be of no value to the pupil and which may cost him his life,
because it is useful to us (the survivors) that our young men should feel
it. If they embark on this course the difference between the old and the
new education will be an important one. Where the old initiated, the new
merely 'conditions'. The old dealt with its pupils as grown birds deal with
young birds when they teach them to fly; the new deals with them more as the
poultry-keeper deals with young birds - making them thus or thus for
purposes of which the birds know nothing. In a word, the old was a kind of
propagation - men transmitting manhood to men; the new is merely propaganda."
        -- C. S. Lewis, "The Abolition of Man"

> My present view on this is agnostic. It may well turn out that any
> super-intelligence will inherently be benevolent towards us. Or, that it
> remains neutral, with no goals other than those of its designer/
> operator. On the other hand, I do acknowledge the possibility of
> independent rogue AI. Obviously, we should err on the side of caution.

There is, of course, a third possibility: that in the course of transmitting
our own moral values and philosophical reasoning on to the AI, we will have
transmitted that which is both necessary and sufficient to arrive at a set
of values superior to our own. Friendly AI may be a necessary enabling
condition for this! In the absence of definite knowledge we should try to
plan ahead for all three cases, as well as any other cases we can prepare
for without messing up the first three.

> The next question then: Are SIAI’s Guidelines theoretically
> sound? I have a number of grave reservations on several assumptions
> and conclusions of the AI design underlying the guidelines, as well as
> the guidelines themselves. However, allowing for a convergence of
> ideas, I want to move to a more practical question:
>
> Can the Guidelines be implemented? Currently only a few dozen
> AI researchers/ teams (worldwide!) are actually focusing on theoretical
> or practical aspects of achieving general, human-level machine
> intelligence. Even fewer claim to have a reasonably comprehensive
> theoretical framework for achieving it.
>
> The practicality of implementing the Guidelines must be assessed in
> the context of specific design proposals for AI. It would be valuable to
> have feedback from all the various players: both on their overall view on
> the need for (and approach towards) ‘Friendliness’, and also whether
> implementing SIAI’s guidelines would be compatible with their own
> designs.
>
> The Guidelines’ eight design recommendations in the light of
> my theory of mind/ intelligence:
>
> 1) Friendliness-topped goal system – Not possible: My design does
> not allow for such a high-level ‘supergoal’.

What are the forces in your design that determine whether one action is
taken rather than another?

> 2) Cleanly causal goal system – Not possible: requires 1)

Does your system choose between actions on the basis of which future events
those actions are predicted to lead to?

> 3) Probabilistic supergoal content – Inherent in my design: All
> knowledge and goals are subject to revision.

If your system has no supergoal, but does have reinforcement, the
reinforcement systems are also part of the goal system. Are the
reinforcement systems subject to revision? In any case, the recommendation
of "probabilistic supergoal content" does not just mean that certain parts
of the goal system are subject to revision, but that they have certain
specific semantics that will enable the system to consider that revision as
desirable, so that the improvement of Friendliness is stable under
reflection, introspection, and self-modification.

> 4) Acquisition of Friendliness sources – While I certainly encourage
> the AI to acquire knowledge (including ethical theory) compatible with
> what I consider moral, this does not necessarily agree with what
> others regard as desirable ethics/ Friendliness.

"Acquisition of Friendliness sources" here means acquiring the forces that
influence human moral decisions as well as learning the final output of
those decisions. It furthermore has the specific connotation of attempting
to deduce the forces that influence the moral statements of the programmers
even if the programmers themselves do not know them.

> 5) Causal validity semantics – Inherent in my design: One of the key
> functions of the AI is to (help me) review and improve its premises,
> inferences, conclusions, etc. at all levels. Unfortunately, this ability
> only becomes really effective once a significant level of intelligence
> has already been reached.

I agree with the latter sentence. However, revision of beliefs is what I
would consider ordinary reasoning - causal validity semantics means that the
AI understands that its basic structure, its source code, is also the
product of programmer intentions that can be wrong. That's *why* this
ability only becomes effective at a significant level of intelligence; it
inherently requires an integrated introspective understanding of brainware
and mindware, at minimum on the level of a human pondering evolutionary
psychology.

> 6) Injunctions – This seems like a good recommendation, however it is
> not clear what specific injunctions should be implemented, how to
> implement them effectively, and to what extent they will oppose other
> recommendations/ features.

Hopefully, SIAI will learn how injunctions work in practice, then publish
the knowledge.

> 7) Self-modeling of fallibility - Inherent in my design. This seems to be
> an abstract expression of point 3)

The human understanding of fallibility requires points (3), (4), and (5); an
AI, to fully understand its own fallibility, requires all of these as well.
*Beginning* to model your own fallibility takes much less structure. Any AI
with a probabilistic goal system can do so, though doing so efficiently
requires reflection.

> 8) Controlled ascent – Good idea, but may be difficult to implement: It
> may be hard to distinguish between rapid knowledge acquisition,
> improvements in learning, and overall self-improvement (i.e. substantial
> increases in intelligence).

All you need for a controlled ascent feature to be worthwhile is the
prospect of catching some of the hard takeoffs some of the time.

> Peter Voss, June 2001

Sorry about the delay...

-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence
