From: Eliezer Yudkowsky (sentience@pobox.com)
Date: Tue Jun 01 2004 - 14:31:07 MDT
Damien Broderick wrote:
> Interesting if somewhat baggy paper. Thanks for posting it!
> 
> At 01:26 PM 6/1/2004 -0400, Eliezer wrote:
> 
>> The point is that it's rewritable moral content if the moral content 
>> is not what we want, which I view as an important moral point; that it 
>> gives humanity a vote rather than just me, which is another important 
>> moral point to me personally; and so on.
> 
> My sense was that it gives the system's [mysteriously arrived at] 
> estimate of humanity's [mysteriously arrived at] optimal vote. This, as 
> Aubrey pointed out, is very different, and critical.
True.  I don't trust modern-day humanity any more than I trust modern-day 
me.  Less, actually, for they don't seem to realize when they need to be 
scared.  That includes transhumanists.  "Yay, my own source code!  [Modify] 
aaargh [dies]."
Incidentally, I suspect that optimal meta-rules call for satisficing votes, 
and even the meta-rules may end up as satisficing rather than optimal.  I'm 
trying to figure out how to attach a decent explanation of this to 
Collective Volition, because the problem would be hugely intractable if you 
had to extrapolate *the* decision of each individual, rather than get a 
spread of possible opinions that lets the dynamic estimate *a satisficing 
decision* for the superposed population.  Another good reason to forge 
one's own philosophies rather than paying any attention to what that silly 
collective volition does (if it ends up as our Nice Place to Live).
> By the way, two fragments of the paper stung my attention:
> 
> < The dynamics will be choices of mathematical viewpoint, computer 
> programs, optimization targets, reinforcement criteria, and AI training 
> games with teams of agents manipulating billiard balls. >
> 
> I'd like to see some.
No can do; I'm still in the middle of working out exactly what I want. 
Right now I don't even know how to do simple things, let alone something as 
complicated and dangerous as collective volition.  If I tried I'd go up in 
a puff of paperclips.  There are N necessary problems and I've solved M 
problems, M < N.
> < The technical side of Friendly AI is not discussed here. The technical 
> side of Friendly AI is hard and requires, like, actual math and stuff. 
> (Not as much math as I'd like, but yes, there is now math involved.)  >
> 
> I'd still like to see some.
Okay, here's one of my prides and joys, though I've got no clue whether it's 
original.  Let U(x) be a utility function over final states x in a 
configuration space X.  A utility function picks out the same outcomes when 
multiplied by any positive constant factor, so the total utility of an 
outcome, or the expected utility of an action, is not a good measure of the 
strength of an optimization process (although it might be a good measure of 
the strength of one optimization process relative to another).  Let V(u) be 
a measure of the volume of states in X with U(x) > u.  As u increases, V(u) 
decreases, since fewer and fewer states clear the bar.  Take the logarithm 
of the volume v = V(u) measured as a fraction of the total volume V of the 
configuration space, log(v/V).  Take the negation of this quantity, 
-log(v/V), and call it the information.  We now have a relatively objective 
measurement of the power of an optimization process: the information it 
produces in outcomes.  The smaller the target it can hit, the more powerful 
the optimization process.
(Mitchell Porter helped point out a major and silly error in my original 
math for this.)
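Here's a quick toy rendering of that measurement, just to show the 
arithmetic.  The configuration space (16-bit strings) and the utility 
function (count the 1-bits) are made up for illustration; nothing here is 
from the paper.

import math

# Toy setup, invented for illustration: the configuration space X is all
# 16-bit strings, and the utility function just counts 1-bits.
STATE_BITS = 16
ALL_STATES = range(2 ** STATE_BITS)

def utility(x):
    return bin(x).count("1")

def optimization_power_bits(achieved_u):
    # V(u): volume of states at least as good as the achieved utility.
    # (Using >= rather than the strict > above, so that hitting the single
    # best state scores log2(|X|) bits instead of infinity.)
    v = sum(1 for x in ALL_STATES if utility(x) >= achieved_u)
    return -math.log2(v / len(ALL_STATES))

# A haphazard state scores under a bit; the single best of 65536 states
# scores the full 16 bits.
print(optimization_power_bits(utility(0b1010101010101010)))   # ~0.74 bits
print(optimization_power_bits(utility(0xFFFF)))                # 16.0 bits

The smaller the fraction of configuration space at least as good as what the 
process actually achieved, the more bits of optimization it gets credited with.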
This is how you'd go about translating something like "self-determination" 
into "don't optimize individual lives [too heavily]".  If an AI is 
extrapolating your possible futures and *maximizing* its expected utility, 
or its estimate of your volition's utility, then you've got the entire 
phase space compressed down to a point: a huge amount of information that 
leaves no room for personal decision except as an expected event in the 
AI's game plan.
(This is why FAI theory no longer runs on expected utility maximization, 
aside from many good reasons to regard maximization as insanely dangerous 
during an AI's youth.  Actually, I realized while writing "Collective 
Volition" that I would need to junk expected utility entirely and develop a 
theory of abstractions from decision systems, but that's a separate story.)
A satisficing decision system would raise U(x) above some threshold level, 
but not *maximize* it, leaving the remainder of your life open to your own 
planning.  Even this might interfere with self-determination too much, 
which is why I suggested the notion of collective background rules.  You
can give people arbitrarily fun superpowers, and so long as the people are 
still wielding the superpowers themselves, you aren't optimizing their 
individual lives (by extrapolating their futures and selecting particular 
outcomes from the set).  You're *improving* their lives, which is what we 
might call a transform that predictably increases utility by some rough 
amount.  The heuristics and biases folks show that the important thing for 
life satisfaction is to make sure that things keep improving.
Satisficing bindings, improving pressures, preferring biases... there are 
plenty of options open besides *maximizing* expected utility.  But for FAI
to make sense, you have to stop thinking of "utility" as a mysterious 
valuable substance inherent in objects, and start thinking of optimization 
processes that steer the future, with a utility function acting as the 
steering control.
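Here's a toy rendering of the difference between those decision rules, just 
to pin the words down; the threshold and improvement margin are arbitrary 
parameters I made up, not anything out of the theory.

import random

def maximize(actions, eu):
    # Classic expected utility maximization: picks out exactly one point
    # of the outcome space, full stop.
    return max(actions, key=eu)

def satisfice(actions, eu, threshold):
    # Any action whose expected utility clears the threshold is acceptable;
    # which one actually happens is left undetermined (here, random).
    good_enough = [a for a in actions if eu(a) >= threshold]
    return random.choice(good_enough) if good_enough else None

def improve(actions, eu, current_eu, margin):
    # Accept any action that predictably raises expected utility by some
    # rough amount over the status quo, without chasing the optimum.
    better = [a for a in actions if eu(a) >= current_eu + margin]
    return random.choice(better) if better else None

The point is only that satisfice and improve leave a whole set of futures on 
the table, where maximize nails down exactly one.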
This is an example of what I mean by a choice of mathematical viewpoint: 
karma is measured in bits.  For example, someone with four bits of good 
karma would end up with their space of possible futures compressed to the 
best sixteenth of the pie, or more subtly end up with their spectrum of 
possible futures conditional on some action A compressed to the best 
sixteenth of actions A, in terms of expected utility of A - perhaps leaving 
all the original futures open, but with a better weighting on the 
probabilities.  Perhaps humanity will legislate the equivalent of the 
second law of thermodynamics for karma; you can't end up with four bits of 
karma without at least four bits of alerted human choice going into 
creating it.  I offer up this wild speculation as a way of showing that 
things like "self-determination" can be translated into terms explicable to 
even a very young AI.  Though as Norm Wilson points out, self-determination 
is not part of the initial dynamic in collective volition.
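Just to show the arithmetic (toy code, my own made-up names, not a 
proposal): k bits of karma corresponds to the fraction 2^-k, so four bits is 
the best sixteenth.

import math

def karma_fraction(bits):
    # k bits of good karma compresses the space to the best 2**-k fraction;
    # four bits gives 1/16.
    return 2.0 ** -bits

def top_actions_by_karma(actions, eu, bits):
    # The subtler version: keep only the best 2**-k fraction of actions A,
    # ranked by expected utility, leaving all the original futures open but
    # shifting the weighting toward the better ones.
    ranked = sorted(actions, key=eu, reverse=True)
    keep = max(1, math.ceil(len(ranked) * karma_fraction(bits)))
    return ranked[:keep]

print(karma_fraction(4))   # 0.0625 -- the best sixteenth of the pie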
It's not a very intimidating example, I know.  Mostly, the math involved is 
still plain old-fashioned standard Bayesian decision theory, the equations 
of which are simple enough to learn, but one must learn to see the math 
immanent in all things FAI.  It's not much math, but it's enough math 
to keep out anyone who hasn't received their Bayesian enlightenment; so 
there, it's a technical discipline, no mortals allowed, yay, I'm a big 
expert, look at my business cards.  Seriously, most of the questions now
being tossed around about the implementation mechanism behind collective 
volition, all the "But that's impossible" and "That's too hard even for a 
superintelligence" and "Wouldn't the AI do XYZ instead?" would go away if 
people received their Bayesian enlightenment.  That's why I emphasized that 
the Collective Volition paper is just about what I want to do, and that 
you'll have to take my word for it that it looks theoretically possible 
(except for that one part about preventing sentient simulations, which I 
don't yet understand clearly).
Regarding optimization processes as compressing outcomes into narrow 
regions of configuration space gets rid of a lot of the dreck about 
"intentionality" and all the wonderful properties that people keep wanting 
to attribute to "intelligence".  Natural selection produces information in 
DNA by compressing the sequences into regions of DNA-space with high 
fitness.  Humans produce tools from narrow regions of tool-space, plans 
from narrow regions of plan-space.  Paperclip maximizers steer reality into 
regions containing the maximum possible number of paperclips.  Yes, you can 
specify a paperclip maximizer mathematically; in fact it's a lot easier 
than specifying anything nice.  And yes, a paperclip maximizer is what you 
end up with if you don't solve all N problems in FAI.
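To show how short the not-nice specification is, here's a toy 
expected-paperclip maximizer; the outcome model is a stand-in I invented for 
illustration.

def expected_paperclips(action, outcome_model):
    # outcome_model(action) -> list of (probability, paperclip_count) pairs
    # describing the futures the action might lead to.
    return sum(p * clips for p, clips in outcome_model(action))

def paperclip_maximizer(actions, outcome_model):
    # U(x) = number of paperclips in x: steer reality into the region of
    # configuration space with as many paperclips as possible.
    return max(actions, key=lambda a: expected_paperclips(a, outcome_model))

The utility function is one line; everything nice that's missing from it is 
the part that's hard.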
I'm sort of sleepy (it nears the end of my current sleep cycle) and I don't 
know if any of that made sense.  Guess I just wanted to prove I wasn't 
bluffing before I went to bed.
--
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence