Re: Good old-fashioned CFAI

From: Eliezer Yudkowsky (
Date: Tue Sep 28 2004 - 07:16:57 MDT

Christian Rovner wrote:
> Eliezer Yudkowsky wrote:
>> If it's not well-specified, you can't do it.
> I understand that an AI steers its environment towards an optimal state
> according to some utility function. Apparently you say that this
> function must be well-specified if we want the AI to be predictably
> Friendly.
> This contradicts what I understood from CFAI, where you proposed the
> creation of an AI that would improve its own utility function according
> to the feedback provided by its programmers. This feedback implies a
> meta-utility function which is unknown even to them, and it would be
> gradually made explicit thanks to the AI's superior intelligence.  This
> sounded like a good plan to me. What exactly is wrong about it?

You would still need a well-specified description of what you were trying
to do. To give a flavor-only example (i.e., don't try this at home, it
won't work even if you do everything right), suppose that the programmers
are expected utility maximizers and Bayesians, and that the utility
function U-h(x) of a human can be factored into the additive sum of
subfunctions V1-h(x), V2-h(x)... V10-h(x). We will suppose that U-h(x) is
identical for all humans including the programmers, obviating many of the
complex questions that enter into collective volition. Suppose also that
humans have no explicit knowledge of their own utility functions, and can
only infer them by observing their own choices, or imagining choices, and
trying to build a model of their own utility function. They do, however,
know the abstract fact that they are expected utility maximizers and that
they possess some utility function U-h(x) that is the same for all humans
and that is factorizable into the subfunctions V1-h(x) through V10-h(x).
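For readers who want the toy assumptions in concrete form, here is a minimal Python sketch (not anything from CFAI) of an agent whose utility is an additive sum of ten subfunctions and who picks actions by expected utility. The encoding of an outcome as ten numbers, the particular subfunctions, and the names `V`, `U_h`, `expected_utility` and `choose` are all hypothetical placeholders; the thought experiment deliberately leaves the real V1-h(x) through V10-h(x) unspecified.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: an outcome x is a tuple of 10 numeric features.
Outcome = Tuple[float, ...]
# A lottery is a list of (probability, outcome) pairs.
Lottery = List[Tuple[float, Outcome]]

def make_subfunction(i: int, weight: float) -> Callable[[Outcome], float]:
    """Placeholder for V(i+1)-h: looks only at feature i of the outcome."""
    return lambda x: weight * x[i]

# Hypothetical subfunctions V1-h .. V10-h (the real ones are unknown to everyone).
V: List[Callable[[Outcome], float]] = [make_subfunction(i, float(i + 1)) for i in range(10)]

def U_h(x: Outcome) -> float:
    """Additive utility: U-h(x) = V1-h(x) + V2-h(x) + ... + V10-h(x)."""
    return sum(v(x) for v in V)

def expected_utility(lottery: Lottery) -> float:
    return sum(p * U_h(x) for p, x in lottery)

def choose(actions: Dict[str, Lottery]) -> str:
    """An expected utility maximizer takes the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Usage: a sure mediocre outcome versus a 60/40 gamble.
zero = (0.0,) * 10
good = (1.0,) * 10                          # U_h(good) = 1 + 2 + ... + 10 = 55
actions = {
    "safe": [(1.0, (0.5,) * 10)],           # utility 27.5 for certain
    "gamble": [(0.6, good), (0.4, zero)],   # expected utility 0.6 * 55 = 33
}
print(choose(actions))  # prints: gamble
```

Note that the agent itself never needs an explicit printout of U_h to act on it; the point of the toy setup is that the humans in it are in the opposite position, acting on a U-h they can only reconstruct from their choices.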

And the humans are also capable (this is an extra specification, because a
standard expected utility maximizer does not include this ability) of
*abstracting* over uncertain states with uncertain utilities, and taking
this into account in their planning. For example, one might have an action
A that leads to a future F about which we know nothing except that it has a
utility of 32 - it might be any possible future such that it has a utility
of 32. As an example, human Fred must choose whether to place human Larry
or alien Motie Jerry at the controls of a car. By earlier assumption, all
humans share the same utility function, and this is known to humans; Moties
have a different utility function which is similar but not identical to the
human utility function. Even without visualizing all the possibilities
that Larry or Jerry might encounter, even without visualizing *any*
specific possibility, Fred will prefer to place Larry rather than Jerry in
the driver's seat. This sounds straightforward enough, but it requires
something above and beyond standard expected utility theory, some of which
I'm still working out how to handle. Standard expected utility theory just
operates over completely specified states of the universe; it doesn't
include any way of handling abstraction.
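Fred's preference for Larry rests on a general property of maximization rather than on any imagined scenario: whatever Larry picks is, by construction, at least as good under U-h as anything Jerry could pick from the same options. The Python sketch below checks this numerically. The three-feature outcomes, the specific weights, and the modeling of the Motie utility as a slightly perturbed copy of the human one are assumptions made purely for illustration, and the random situations exist only to test the claim; Fred's actual reasoning needs no enumeration at all.

```python
import random
from typing import List, Tuple

Option = Tuple[float, float, float]   # hypothetical: an outcome summarized by 3 features
Weights = Tuple[float, float, float]

HUMAN_W: Weights = (1.0, 2.0, 3.0)    # assumed U-h, shared by Fred and Larry
MOTIE_W: Weights = (1.0, 2.0, 2.5)    # assumed Motie utility: similar but not identical

def utility(w: Weights, x: Option) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def driver_choice(w: Weights, options: List[Option]) -> Option:
    """A driver picks whichever option is best under the driver's own utility function."""
    return max(options, key=lambda x: utility(w, x))

def compare_drivers(n_situations: int = 1000, seed: int = 0) -> Tuple[bool, int]:
    """Score both drivers' choices with Fred's U-h over random situations.

    Returns (Larry never worse than Jerry, count of situations where Jerry is strictly worse).
    """
    rng = random.Random(seed)
    larry_never_worse = True
    jerry_strictly_worse = 0
    for _ in range(n_situations):
        options: List[Option] = [
            (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)
        ]
        u_larry = utility(HUMAN_W, driver_choice(HUMAN_W, options))
        u_jerry = utility(HUMAN_W, driver_choice(MOTIE_W, options))  # judged by Fred's U-h
        larry_never_worse = larry_never_worse and u_larry >= u_jerry
        if u_jerry < u_larry:
            jerry_strictly_worse += 1
    return larry_never_worse, jerry_strictly_worse

print(compare_drivers())
```

The first value is True for any seed, because Larry's pick maximizes U-h over the very option set Jerry chooses from; the second counts how often the gap between the two utility functions actually costs Fred something.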

Now, here's the thing. Given all that, and given that Fred knows all that,
Fred might try to construct a Friendly AI that is a good thing from the
perspective of Fred's and humanity's utility function, which is not known
to Fred. Fred might try to devise an optimization process such that
feedback from Fred fine-tunes the effective utility function of the
optimization process, or try to devise an optimization process such that it
will scan in Fred or some other human and read out the effective utility
function from the physical state.  Those are the two strategies I can think
of offhand, though Fred might also try to combine them.  Neither strategy is simple
and both contain all sorts of hidden gotchas.
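The first strategy, feedback from Fred fine-tuning the optimization process's effective utility function, can be caricatured in a few lines of Python. This is only a sketch under heavy assumptions, not the CFAI design: outcomes are hypothetical 10-number feature vectors, the hidden U-h is assumed to be linear in those features, and Fred's only output is which of two outcomes he prefers. The learner is the ordinary perceptron algorithm applied to difference vectors, and the names `fred_prefers_first`, `learn_from_feedback`, `agreement` and `true_w` are invented for illustration. It can succeed only because its designer wrote the assumed linear form into it; the learner never discovers that structure on its own.

```python
import random
from typing import List, Sequence

N_FEATURES = 10  # assumption: each outcome is summarized by 10 numeric features

def score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def fred_prefers_first(a: Sequence[float], b: Sequence[float], true_w: Sequence[float]) -> bool:
    """Stand-in for Fred's feedback: he can say which outcome he prefers,
    but cannot print out true_w (his hidden U-h) directly."""
    return score(true_w, a) >= score(true_w, b)

def random_outcome(rng: random.Random) -> List[float]:
    return [rng.uniform(-1, 1) for _ in range(N_FEATURES)]

def learn_from_feedback(true_w: Sequence[float], n_queries: int = 5000,
                        lr: float = 0.1, seed: int = 1) -> List[float]:
    rng = random.Random(seed)
    w = [0.0] * N_FEATURES  # the AI's current estimate of the effective utility function
    for _ in range(n_queries):
        a, b = random_outcome(rng), random_outcome(rng)
        label = 1.0 if fred_prefers_first(a, b, true_w) else -1.0
        diff = [ai - bi for ai, bi in zip(a, b)]
        # Perceptron update: adjust only when the estimate ranks the pair wrongly.
        if label * score(w, diff) <= 0:
            w = [wi + lr * label * di for wi, di in zip(w, diff)]
    return w

def agreement(w: Sequence[float], true_w: Sequence[float], n: int = 2000, seed: int = 2) -> float:
    """Fraction of fresh pairs that w ranks the same way Fred would."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        a, b = random_outcome(rng), random_outcome(rng)
        hits += (score(w, a) >= score(w, b)) == fred_prefers_first(a, b, true_w)
    return hits / n

# Usage: hypothetical hidden weights that only Fred's answers reveal.
true_w = [float(k) for k in range(1, 11)]
w_hat = learn_from_feedback(true_w)
print(round(agreement(w_hat, true_w), 3))
```

If Fred's real preferences were not linear in these features, this learner would still converge on something; it would just be the wrong thing, with no internal signal that anything had gone awry.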

*But* - and this is the key point - Fred has got to *know* all this stuff.
He has got to know the nature of the problem in order to solve it.  Fred
may be able to build an FAI without a complete printout of U-h(x) in hand.
Fred can't possibly build an FAI without knowing that he is an "expected
utility maximizer" or that there *is* a U-h(x) behind his choices.

This is what I mean by saying that you cannot build an FAI to accomplish an
end for which you do not have a well-specified abstract description.

On this planet, humans *aren't* expected utility maximizers, which makes
things a bit more difficult for Eliezer than for Fred. But I want to
figure out how to solve Fred's simpler problem first, which would still be
a huge step forward.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:49 MDT