From: Eliezer Yudkowsky (sentience@pobox.com)
Date: Tue Sep 28 2004 - 07:16:57 MDT
Christian Rovner wrote:
> Eliezer Yudkowsky wrote:
>>
>> If it's not well-specified, you can't do it.
> 
> I understand that an AI steers its environment towards an optimal state 
> according to some utility function. Apparently you say that this 
> function must be well-specified if we want the AI to be predictably 
> Friendly.
> 
> This contradicts what I understood from CFAI, where you proposed the 
> creation of an AI that would improve its own utility function according 
> to the feedback provided by its programmers. This feedback implies a 
> meta-utility function which is unknown even to them, and it would be 
> gradually made explicit thanks to the AI's superior intelligence. This 
> sounded like a good plan to me. What exactly is wrong with it?

You would still need a well-specified description of what you were trying 
to do.  To give a flavor-only example (i.e., don't try this at home, it 
won't work even if you do everything right), suppose that the programmers 
are expected utility maximizers and Bayesians, and that the utility 
function U-h(x) of a human can be factored into the additive sum of 
subfunctions V1-h(x), V2-h(x)... V10-h(x).  We will suppose that U-h(x) is 
identical for all humans including the programmers, obviating many of the 
complex questions that enter into collective volition.  Suppose also that 
humans have no explicit knowledge of their own utility functions, and can 
only infer them by observing their own choices, or imagining choices, and 
trying to build a model of their own utility function.  They do, however, 
know the abstract fact that they are expected utility maximizers and that 
they possess some utility function U-h(x) that is the same for all humans 
and that is factorizable into a set of utility functions V-h(x), etc.
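
Just to pin down the toy setup, here is a minimal Python sketch of the 
bookkeeping; the outcomes, features, and weights are invented placeholders 
for this email, not anything from CFAI, and nothing here is a design:

    # Toy model of the assumption above: every human shares one utility
    # function U-h(x), which is the additive sum of ten subfunctions
    # V1-h(x) ... V10-h(x).  Features and weights are made up.
    from typing import Callable, Dict, List

    Outcome = Dict[str, float]   # an outcome x is just a bag of features here

    def make_subfunction(feature: str, weight: float) -> Callable[[Outcome], float]:
        """A placeholder V_i-h that cares about one feature of the outcome."""
        return lambda x: weight * x.get(feature, 0.0)

    # V1-h(x) ... V10-h(x): identical for every human in this toy world.
    SUBFUNCTIONS: List[Callable[[Outcome], float]] = [
        make_subfunction("feature_%d" % i, weight=1.0 / i) for i in range(1, 11)
    ]

    def U_h(x: Outcome) -> float:
        """U-h(x): the shared human utility function, an additive sum of the V_i."""
        return sum(V(x) for V in SUBFUNCTIONS)

    # The humans never get to read this source code.  They only observe which
    # outcomes they actually prefer, and model the V_i from those choices.
    def observed_choice(x: Outcome, y: Outcome) -> Outcome:
        """What a human in fact does: take the outcome with the higher U-h."""
        return x if U_h(x) >= U_h(y) else y
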
And the humans are also capable (this is an extra specification, because a 
standard expected utility maximizer does not include this ability) of 
*abstracting* over uncertain states with uncertain utilities, and taking 
this into account in their planning.  For example, one might have an action 
A that leads to a future F about which we know nothing except that it has a 
utility of 32 - it might be any possible future such that it has a utility 
of 32.  As an example, human Fred must choose whether to place human Larry 
or alien Motie Jerry at the controls of a car.  By earlier assumption, all 
humans share the same utility function, and this is known to humans; Moties 
have a different utility function which is similar but not identical to the 
human utility function.  Even without visualizing all the possibilities 
that Larry or Jerry might encounter, even without visualizing *any* 
specific possibility, Fred will prefer to place Larry rather than Jerry in 
the driver's seat.  This sounds straightforward enough, but it requires 
something above and beyond standard expected utility theory, and I'm still 
working out how to handle parts of it.  Standard expected utility theory just 
operates over completely specified states of the universe; it doesn't 
include any way of handling abstraction.
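
A crude way to see what that extra machinery has to do, in Python; the 
AbstractFuture class and the particular numbers are made up for this email, 
not a proposal for how the real thing works:

    # Planning over *abstract* descriptions of futures, rather than over
    # completely specified world-states.  Everything here is a toy.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class AbstractFuture:
        """A future known only by description -- 'some future with utility 32',
        or 'whatever future results from Larry driving' -- plus the planner's
        expectation of U-h over everything matching that description."""
        description: str
        expected_U_h: float

    def choose(actions: List[Tuple[str, AbstractFuture]]) -> str:
        """Pick the action whose abstract consequence has the highest expected
        U-h, without ever constructing a concrete state of the universe."""
        best_action, _ = max(actions, key=lambda pair: pair[1].expected_U_h)
        return best_action

    # Fred knows Larry steers toward exactly U-h, while Motie Jerry steers
    # toward a similar-but-not-identical utility function, so futures steered
    # by Larry are worth more in expectation under U-h.  Numbers are invented.
    fred_chooses = choose([
        ("put Larry at the wheel", AbstractFuture("futures steered by Larry", expected_U_h=32.0)),
        ("put Jerry at the wheel", AbstractFuture("futures steered by Jerry", expected_U_h=29.5)),
    ])
    assert fred_chooses == "put Larry at the wheel"

The hard part, of course, is where those expectations over abstract classes 
of futures come from in the first place; the sketch just assumes them.
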
Now, here's the thing.  Given all that, and given that Fred knows all that, 
Fred might try to construct a Friendly AI that is a good thing from the 
perspective of Fred's and humanity's utility function, which is not known 
to Fred.  Fred might try to devise an optimization process such that 
feedback from Fred fine-tunes the effective utility function of the 
optimization process, or try to devise an optimization process such that it 
will scan in Fred or some other human and read out the effective utility 
function from the physical state.  Those are the two strategies I can think 
of offhand; Fred might also try to combine them.  Neither strategy is simple, 
and both contain all sorts of hidden gotchas.
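
To give a rough flavor of the first strategy only -- again, flavor, not a 
design -- here is a toy Python sketch in which feedback from Fred shifts a 
probability distribution over invented hypotheses about U-h, and the 
optimization process plans with the expectation over that distribution:

    # Toy version of "feedback fine-tunes the effective utility function":
    # the process keeps a posterior over hypotheses about U-h(x), and
    # approve/disapprove feedback on proposed outcomes reweights it.
    # The hypotheses, features, and likelihood model are all invented.
    import math
    from typing import Callable, Dict

    Outcome = Dict[str, float]

    HYPOTHESES: Dict[str, Callable[[Outcome], float]] = {
        "H1": lambda x: x.get("cake", 0.0),
        "H2": lambda x: x.get("cake", 0.0) + 2.0 * x.get("freedom", 0.0),
        "H3": lambda x: -x.get("paperclips", 0.0),
    }
    prior = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}

    def update_on_feedback(posterior: Dict[str, float], outcome: Outcome,
                           approved: bool) -> Dict[str, float]:
        """Crude Bayesian update: hypotheses that score the outcome highly gain
        weight when Fred approves it, and lose weight when he disapproves."""
        unnormalized = {}
        for name, p in posterior.items():
            score = HYPOTHESES[name](outcome)
            likelihood = 1.0 / (1.0 + math.exp(-score if approved else score))
            unnormalized[name] = p * likelihood
        total = sum(unnormalized.values())
        return {name: w / total for name, w in unnormalized.items()}

    def effective_utility(posterior: Dict[str, float], x: Outcome) -> float:
        """What the optimization process actually steers by: the expectation
        of utility over its current hypotheses about U-h."""
        return sum(p * HYPOTHESES[name](x) for name, p in posterior.items())

    # Fred approves an outcome with freedom and a little cake in it.
    posterior = update_on_feedback(prior, {"freedom": 1.0, "cake": 0.5}, approved=True)

All the hidden gotchas live in the parts this sketch waves its hands over: 
where the hypothesis space comes from, what counts as feedback, and what the 
process does while its posterior is still wrong.
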
*But* - and this is the key point - Fred has got to *know* all this stuff. 
  He has got to know the nature of the problem in order to solve it.  Fred 
may be able to build an FAI without a complete printout of U-h(x) in hand. 
  Fred can't possibly build an FAI without knowing that he is an "expected 
utility maximizer" or that there *is* a U-h(x) behind his choices.

This is what I mean by saying that you cannot build an FAI to accomplish an 
end for which you do not have a well-specified abstract description.

On this planet, humans *aren't* expected utility maximizers, which makes 
things a bit more difficult for Eliezer than for Fred.  But I want to 
figure out how to solve Fred's simpler problem first, which would still be 
a huge step forward.
-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence