Re: ethics

From: Samantha Atkins
Date: Sat May 22 2004 - 02:29:39 MDT

On May 21, 2004, at 3:18 PM, Eliezer Yudkowsky wrote:

> If you're familiar with the expected utility formalism and the notion
> of utility functions, then consider a utility function U(x), and an
> immense amount of computing power devoted to steering the universe
> into states with a 99.99% or better expectation that U(x) > T. (Note
> that this is a satisficer, not an expected utility maximizer.) The
> idea is that even if there's a huge amount of computing power devoted
> to looking for actions/plans/designs that achieve U(x) > T, such that
> the specific solutions chosen may be beyond human intelligence, the
> *ends* to which the solutions operate are humanly comprehensible. We
> can say of the system that it steers the futures into outcomes that
> satisfice U(x), even if we can't say how.

This seems to have the classic three-wishes-from-a-genie problem. We
may not be intelligent enough to formulate the "wish" clearly enough
that it does not generate quite unintended evil consequences when
"granted" by a sufficiently powerful process/being.
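
The satisficer/maximizer distinction in the quoted passage can be made concrete with a small sketch. Everything here is an illustrative assumption rather than anything proposed in the thread: plans are modelled as discrete outcome distributions, and the names `satisfice`, `maximize`, `PLANS`, and the toy `utility` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a "plan" is a discrete distribution over outcomes x,
# given as (probability, outcome) pairs whose probabilities sum to 1.
Distribution = List[Tuple[float, float]]


def prob_above(dist: Distribution, utility: Callable[[float], float],
               threshold: float) -> float:
    """Probability that U(x) > T under this plan."""
    return sum(p for p, x in dist if utility(x) > threshold)


def expected_utility(dist: Distribution,
                     utility: Callable[[float], float]) -> float:
    """Expected value of U(x) under this plan."""
    return sum(p * utility(x) for p, x in dist)


def satisfice(plans: Dict[str, Distribution],
              utility: Callable[[float], float],
              threshold: float,
              confidence: float = 0.9999) -> Optional[str]:
    """Return the first plan with P(U(x) > T) >= confidence, else None."""
    for name, dist in plans.items():
        if prob_above(dist, utility, threshold) >= confidence:
            return name
    return None


def maximize(plans: Dict[str, Distribution],
             utility: Callable[[float], float]) -> str:
    """Return the plan with the highest expected utility."""
    return max(plans, key=lambda name: expected_utility(plans[name], utility))


def utility(x: float) -> float:
    """Toy utility function: the outcome value itself."""
    return x


# Hypothetical plans: a risky one with high expected utility,
# and a safe one that always clears a modest threshold.
PLANS: Dict[str, Distribution] = {
    "gamble": [(0.5, 0.0), (0.5, 100.0)],
    "steady": [(1.0, 20.0)],
}

if __name__ == "__main__":
    print(satisfice(PLANS, utility, threshold=10.0))
    print(maximize(PLANS, utility))
```

With these toy numbers the satisficer accepts "steady", which clears the threshold with certainty, while the expected utility maximizer prefers "gamble". Neither choice rule addresses the worry above: both are only as good as the U(x) they are handed.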

> Actually you need a great deal more complex goal structure than this,
> to achieve a satisfactory outcome. In the extrapolated volition
> version of Friendly AI that I'm presently working with, U(x) is
> constructed in a complex way from existing humans, and may change if
> the humans themselves change. Even the definition of how volition is
> extrapolated may change, if that's what we want.

Who/what is doing the construction of U(x)?

> (I'm starting to get nervous about my ability to define an
> extrapolation powerful enough to incorporate the reason why we might
> want to rule out the creation of sentient beings within the
> simulation, without simulating sentient beings. However, I've been
> nervous about problems that looked more impossible than this, and
> solved them. So I'm not giving up until I try.)

Is a sentient being within a simulation any less sentient than any
other? Perhaps you are assuming or wish to assume boundaries where
none really exist.

> The problem word is "constrain". I would say rather that I choose an
> FAI into existence, and that what the FAI does is choose. The U(x)
> constrains the future, not the FAI; the FAI, in a strong sense, is
> *defined* by the choice of U(x). That becomes the what-it-does, the
> nature of the FAI; it is no more a constraint than physics is
> constrained to be physics, no more to be contrasted to some separate
> will than I want to break out of being Eliezer and become a teapot.

It is as much of a constraint as choosing the fundamental physical
constants of a universe, in other words. :-)

> I would construct a fully reflective optimization process capable of
> indefinitely self-enhancing its capability to roughly satisfice our
> collective volition, to the exactly optimal degree of roughness we
> would prefer. Balancing between the urgency of our needs; and our
> will to learn self-reliance, make our own destinies, choose our work
> and do it ourselves.

The phrase "our collective volition" is worrisome. From the above, it
is actually your extrapolation, from actual human beings, of what "our
collective volition" is, correct?

>> What surprises me most here is the apparently widespread presence of
>> this concern in the community subscribed to this list -- the reasons
>> for my difficulty in seeing how FAI can even in principle be created
>> have been rehearsed by others and I have nothing to add at this
>> point. It seems that I am one of many who feel that this should be
>> SIAI FAQ number 1. Have you addressed it in detail online anywhere?
> Not really. I think that, given the difficulty of these problems, I
> cannot simultaneously solve them and explain them. Though I'm willing
> to take occasional potshots.

I greatly sympathize. However, explanations beyond potshots clarify
thinking and bring in other viewpoints that may be essential. In
particular, the goals of the system should be explained as clearly as
possible before solving the technical problems of their
implementation, or at least a theory of their
implementation. A lot of the questions are about goal-level

