From: Eliezer Yudkowsky (sentience@pobox.com)
Date: Tue Jun 01 2004 - 14:31:07 MDT
Damien Broderick wrote:
> Interesting if somewhat baggy paper. Thanks for posting it!
> 
> At 01:26 PM 6/1/2004 -0400, Eliezer wrote:
> 
>> The point is that it's rewritable moral content if the moral content 
>> is not what we want, which I view as an important moral point; that it 
>> gives humanity a vote rather than just me, which is another important 
>> moral point to me personally; and so on.
> 
> My sense was that it gives the system's [mysteriously arrived at] 
> estimate of humanity's [mysteriously arrived at] optimal vote. This, as 
> Aubrey pointed out, is very different, and critical.
True.  I don't trust modern-day humanity any more than I trust modern-day 
me.  Less, actually, for they don't seem to realize when they need to be 
scared.  That includes transhumanists.  "Yay, my own source code!  [Modify] 
aaargh [dies]."
Incidentally, I suspect that optimal meta-rules call for satisficing votes, 
and even the meta-rules may end up as satisficing rather than optimal.  I'm 
trying to figure out how to attach a decent explanation of this to 
Collective Volition, because the problem would be hugely intractable if you 
had to extrapolate *the* decision of each individual, rather than get a 
spread of possible opinions that lets the dynamic estimate *a satisficing 
decision* for the superposed population.  Another good reason to forge 
one's own philosophies rather than paying any attention to what that silly 
collective volition does (if it ends up as our Nice Place to Live).
> By the way, two fragments of the paper stung my attention:
> 
> < The dynamics will be choices of mathematical viewpoint, computer 
> programs, optimization targets, reinforcement criteria, and AI training 
> games with teams of agents manipulating billiard balls. >
> 
> I'd like to see some.
No can do; I'm still in the middle of working out exactly what I want. 
Right now I don't even know how to do simple things, let alone something as 
complicated and dangerous as collective volition.  If I tried I'd go up in 
a puff of paperclips.  There are N necessary problems and I've solved M 
problems, M < N.
> < The technical side of Friendly AI is not discussed here. The technical 
> side of Friendly AI is hard and requires, like, actual math and stuff. 
> (Not as much math as I'd like, but yes, there is now math involved.)  >
> 
> I'd still like to see some.
Okay, here's one of my prides and joys, though I've got no clue whether it's 
original.  Let U(x) be a utility function over final states x in a 
configuration space X.  A utility function picks out the same outcomes when 
multiplied by any positive constant factor, so the total utility of an 
outcome, or the expected utility of an action, is not a good measure of the 
strength of an optimization process (although it might be a good measure of 
the strength of one optimization process relative to another).  Let V(u) be 
a measure of the volume of states in X with U(x) > u.  As u increases, V(u) 
decreases, since fewer and fewer states clear the bar.  Take the logarithm 
of the volume v = V(u) measured as a fraction of the total volume V of the 
configuration space, log(v/V).  Take the negation of this quantity, 
-log(v/V), and call it the information.  We now have a relatively objective 
measurement of the power of an optimization process: the information it 
produces in outcomes.  The smaller the target it can hit, the more powerful 
the optimization process.
(Mitchell Porter helped point out a major and silly error in my original 
math for this.)
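Here's a quick toy rendering of that measurement, just to show the 
arithmetic.  The configuration space (16-bit strings) and the utility 
function (count the 1-bits) are made up for illustration; nothing here is 
from the paper.

import math

# Toy setup, invented for illustration: the configuration space X is all
# 16-bit strings, and the utility function just counts 1-bits.
STATE_BITS = 16
ALL_STATES = range(2 ** STATE_BITS)

def utility(x):
    return bin(x).count("1")

def optimization_power_bits(achieved_u):
    # V(u): volume of states at least as good as the achieved utility.
    # (Using >= rather than the strict > above, so that hitting the single
    # best state scores log2(|X|) bits instead of infinity.)
    v = sum(1 for x in ALL_STATES if utility(x) >= achieved_u)
    return -math.log2(v / len(ALL_STATES))

# A haphazard state scores under a bit; the single best of 65536 states
# scores the full 16 bits.
print(optimization_power_bits(utility(0b1010101010101010)))   # ~0.74 bits
print(optimization_power_bits(utility(0xFFFF)))                # 16.0 bits

The smaller the fraction of configuration space at least as good as what the 
process actually achieved, the more bits of optimization it gets credited with.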
This is how you'd go about translating something like "self-determination" 
into "don't optimize individual lives [too heavily]".  If an AI is 
extrapolating your possible futures and *maximizing* its expected utility, 
or its estimate of your volition's utility, then you've got the entire 
phase space compressed down to a point: a huge amount of information that 
leaves no room for personal decision except as an expected event in the 
AI's game plan.
(This is why FAI theory no longer runs on expected utility maximization, 
aside from many good reasons to regard maximization as insanely dangerous 
during an AI's youth.  Actually, I realized while writing "Collective 
Volition" that I would need to junk expected utility entirely and develop a 
theory of abstractions from decision systems, but that's a separate story.)
A satisficing decision system would raise U(x) above some threshold level, 
but not *maximize* it, leaving the remainder of your life open to your own 
planning.  Even this might interfere with self-determination too much, 
which is why I suggested the notion of collective background rules.  You
can give people arbitrarily fun superpowers, and so long as the people are 
still wielding the superpowers themselves, you aren't optimizing their 
individual lives (by extrapolating their futures and selecting particular 
outcomes from the set).  You're *improving* their lives, which is what we 
might call a transform that predictably increases utility by some rough 
amount.  The heuristics and biases folks show that the important thing for 
life satisfaction is to make sure that things keep improving.
Satisficing bindings, improving pressures, preferring biases... there are 
plenty of options open besides *maximizing* expected utility.  But for FAI
to make sense, you have to stop thinking of "utility" as a mysterious 
valuable substance inherent in objects, and start thinking of optimization 
processes that steer the future, with a utility function acting as the 
steering control.
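Here's a toy rendering of the difference between those decision rules, just 
to pin the words down; the threshold and improvement margin are arbitrary 
parameters I made up, not anything out of the theory.

import random

def maximize(actions, eu):
    # Classic expected utility maximization: picks out exactly one point
    # of the outcome space, full stop.
    return max(actions, key=eu)

def satisfice(actions, eu, threshold):
    # Any action whose expected utility clears the threshold is acceptable;
    # which one actually happens is left undetermined (here, random).
    good_enough = [a for a in actions if eu(a) >= threshold]
    return random.choice(good_enough) if good_enough else None

def improve(actions, eu, current_eu, margin):
    # Accept any action that predictably raises expected utility by some
    # rough amount over the status quo, without chasing the optimum.
    better = [a for a in actions if eu(a) >= current_eu + margin]
    return random.choice(better) if better else None

The point is only that satisfice and improve leave a whole set of futures on 
the table, where maximize nails down exactly one.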
This is an example of what I mean by a choice of mathematical viewpoint: 
karma is measured in bits.  For example, someone with four bits of good 
karma would end up with their space of possible futures compressed to the 
best sixteenth of the pie, or more subtly end up with their spectrum of 
possible futures conditional on some action A compressed to the best 
sixteenth of actions A, in terms of expected utility of A - perhaps leaving 
all the original futures open, but with a better weighting on the 
probabilities.  Perhaps humanity will legislate the equivalent of the 
second law of thermodynamics for karma; you can't end up with four bits of 
karma without at least four bits of alerted human choice going into 
creating it.  I offer up this wild speculation as a way of showing that 
things like "self-determination" can be translated into terms explicable to 
even a very young AI.  Though as Norm Wilson points out, self-determination 
is not part of the initial dynamic in collective volition.
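Just to show the arithmetic (toy code, my own made-up names, not a 
proposal): k bits of karma corresponds to the fraction 2^-k, so four bits is 
the best sixteenth.

import math

def karma_fraction(bits):
    # k bits of good karma compresses the space to the best 2**-k fraction;
    # four bits gives 1/16.
    return 2.0 ** -bits

def top_actions_by_karma(actions, eu, bits):
    # The subtler version: keep only the best 2**-k fraction of actions A,
    # ranked by expected utility, leaving all the original futures open but
    # shifting the weighting toward the better ones.
    ranked = sorted(actions, key=eu, reverse=True)
    keep = max(1, math.ceil(len(ranked) * karma_fraction(bits)))
    return ranked[:keep]

print(karma_fraction(4))   # 0.0625 -- the best sixteenth of the pie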
It's not a very intimidating example, I know.  Mostly, the math involved is 
still plain old-fashioned standard Bayesian decision theory, the equations 
of which are simple enough to learn, but one must learn to see the math 
immanent in all things FAI.  It's not much math, but it's enough math 
to keep out anyone who hasn't received their Bayesian enlightenment; so 
there, it's a technical discipline, no mortals allowed, yay, I'm a big 
expert, look at my business cards.  Seriously, most of the questions now
being tossed around about the implementation mechanism behind collective 
volition, all the "But that's impossible" and "That's too hard even for a 
superintelligence" and "Wouldn't the AI do XYZ instead?" would go away if 
people received their Bayesian enlightenment.  That's why I emphasized that 
the Collective Volition paper is just about what I want to do, and that 
you'll have to take my word for it that it looks theoretically possible 
(except for that one part about preventing sentient simulations, which I 
don't yet understand clearly).
Regarding optimization processes as compressing outcomes into narrow 
regions of configuration space gets rid of a lot of the dreck about 
"intentionality" and all the wonderful properties that people keep wanting 
to attribute to "intelligence".  Natural selection produces information in 
DNA by compressing the sequences into regions of DNA-space with high 
fitness.  Humans produce tools from narrow regions of tool-space, plans 
from narrow regions of plan-space.  Paperclip maximizers steer reality into 
regions containing the maximum possible number of paperclips.  Yes, you can 
specify a paperclip maximizer mathematically; in fact it's a lot easier 
than specifying anything nice.  And yes, a paperclip maximizer is what you 
end up with if you don't solve all N problems in FAI.
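To show how short the not-nice specification is, here's a toy 
expected-paperclip maximizer; the outcome model is a stand-in I invented for 
illustration.

def expected_paperclips(action, outcome_model):
    # outcome_model(action) -> list of (probability, paperclip_count) pairs
    # describing the futures the action might lead to.
    return sum(p * clips for p, clips in outcome_model(action))

def paperclip_maximizer(actions, outcome_model):
    # U(x) = number of paperclips in x: steer reality into the region of
    # configuration space with as many paperclips as possible.
    return max(actions, key=lambda a: expected_paperclips(a, outcome_model))

The utility function is one line; everything nice that's missing from it is 
the part that's hard.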
I'm sort of sleepy (it nears the end of my current sleep cycle) and I don't 
know if any of that made sense.  Guess I just wanted to prove I wasn't 
bluffing before I went to bed.
--
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence