Re: Collective Volition: Wanting vs Doing.

From: Eliezer Yudkowsky
Date: Sun Jun 13 2004 - 08:56:48 MDT

Samantha Atkins wrote:
> On Jun 12, 2004, at 6:57 PM, Eliezer Yudkowsky wrote:
>> This question does appear to keep popping up. Roughly, a collective
>> volition is what I get when:
>> (a) I step back and ask the meta-question of how I decided an earlier
>> Eliezer's view of "Friendliness" was "mistaken".
>> (b) I apply the same meta-question to everyone else on the planet.
>> Whatever it is that you use, mentally, to consider any alternative to
>> collective volition, anything that would be of itself friendlier -
>> that's you, a human, making the decision; so now imagine that we take
>> you and extrapolate you re-making that decision at a higher level of
>> intelligence, knew more, thought faster, more the person etc.
> Yes, I get that and it is enticing. But precisely how will the FRPOP
> get its bearings as to what is the "direction" of "more the person"?
> Some of the others are a bit problematic too. But this one seems the
> best and central trick. More the person I would like to be? I, with
> all my warts? Wouldn't I have a perhaps badly warped view of what kind
> of person I would like to be? Would the person I would like to be make
> indeed better choices? How will the AI know of this person or model
> this person?

Samantha, you write that you might have a badly warped view of what kind of
person you would like to be. "Badly warped" by what criterion that I feed
to the FAI? Your criterion? Someone else's? Where am I supposed to get
this information, if not, somehow, from you? When you write down exactly
how the information is supposed to get from point A (you) to point B (the
FAI), and what the FAI does with the information once it's there, you'll
have something that looks like - surprise! - a volition-extrapolating
dynamic. It's not a coincidence. That's where the idea of a
volition-extrapolating dynamic *originally comes from*.

>> The benefit of CV is that (a) we aren't stuck with your decision about
>> Friendliness forever (b) you don't have to make the decision using
>> human-level intelligence.
> Well, we don't make the decision at all it seems to me. The AI does
> based on its extrapolation of our idealized selves.

Not precisely, but it's closer than saying that we would make the decision
using (EEK!) human-level intelligence.

> I am not sure
> exactly what our inputs would be. What do you have in mind?

Our inputs would be our current selves, from which the decision of a future
humankind might be predicted.

>> It's easy to see that all those other darned humans can't be trusted,
>> but what if we can't trust ourselves either? If you can employ an
>> extrapolation powerful enough to leap out of your own fundamental
>> errors, you should be able to employ it on all those other darned
>> humans too.
> Well yes, but it is an "if" isn't it?

Yep. Big if.

> It is actually a fairly old
> spiritual exercise to invoke one's most idealized self and listen to its
> advice or let it decide. It takes many forms, some of which put the
> idealized self as other than one's self, but the gist is not that
> different.

In the spiritual exercise your idealized self will never be any smarter
than you are, never know anything you don't. It will say things that sound
wise to your current self - things a village elder might say - things you
might severely disagree with if you did know more, think smarter. Can you
ask your idealized spiritual self to build a nanobot? I don't trust the
notion of *spiritual* extrapolation for grounding; I think that's the wrong
direction. The word "spirituality" makes people go warm and fuzzy,
and yes we need the warm fuzzies, but I think that if we took people and
filtered out everything that a member of the Fluffy Bunny Coven would call
"unspiritual", we'd end up with unhumans.


It's more an order-of-evaluation question than anything else. I currently
guess that one needs to evaluate some "knew more" and "thought faster"
before evaluating "more the people we wished we were". Mostly because
"knew more" and "thought faster" starting from a modern-day human who makes
fluffy bunny errors doesn't have quite the same opportunity to go
open-endedly recursively wrong as "more the people we wished we were"
evaluated on a FBer.

One obvious rule for order-of-evaluation would be to define a metric of
distance (difficulty of explanation to current self) and carry out
shorter-distance extrapolations before longer-distance extrapolations.
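
As a rough illustration of that ordering rule only, here is a minimal sketch. Everything in it is an assumption made for the example: the `Extrapolation` type, the `evaluate_in_order` helper, and the numeric distances are hypothetical, and nothing here says how a real distance metric or a real extrapolation would be defined.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Extrapolation:
    """One hypothetical extrapolation step (illustrative only)."""
    name: str
    distance: float  # assumed stand-in for "difficulty of explanation to current self"
    apply: Callable[[dict], dict]


def evaluate_in_order(model: dict, steps: List[Extrapolation]) -> dict:
    """Apply shorter-distance extrapolations before longer-distance ones."""
    for step in sorted(steps, key=lambda s: s.distance):
        model = step.apply(model)
    return model


def record(name: str) -> Callable[[dict], dict]:
    """Placeholder step that only logs its name; a real step would transform the model."""
    def apply(model: dict) -> dict:
        return {**model, "applied": model["applied"] + [name]}
    return apply


# Distances are made-up numbers chosen purely to show the ordering.
steps = [
    Extrapolation("more the people we wished we were", 3.0,
                  record("more the people we wished we were")),
    Extrapolation("knew more", 1.0, record("knew more")),
    Extrapolation("thought faster", 2.0, record("thought faster")),
]

result = evaluate_in_order({"applied": []}, steps)
print(result["applied"])
```

The sketch captures nothing beyond the sort: whatever the metric turns out to be, the steps run in ascending order of distance, so "knew more" and "thought faster" are evaluated before "more the people we wished we were" under the distances assumed here.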

>> Maybe a better metaphor for collective volition would be that it
>> refers questions to an extrapolated adult humankind, or to a
>> superposition of the adult humanities we might become.
> So the AI becomes an adjunct and amplifier of a specialized form of
> introspective spiritual exercise? Wild! AI augmented self-improvement
> of humankind.

Right. AI augmented self-improvement of humankind with the explicit
notation that the chicken-and-egg part of this problem is that modern-day
humans aren't smart enough to self-improve without stomping all over their
own minds with unintended consequences, aren't even smart enough to
evaluate the question "What kind of person do you want to be?" over its
real experiential consequences rather than a small subset of human verbal
descriptions of humanly expected consequences. So rather than creating a
*separate* self-improving humane thing, one does something philosophically
more complex and profound (but perhaps not more difficult from the
standpoint of FAI theory, although it *sounds* a lot harder). One binds a
transparent optimization process to predict what the grownup selves of
modern-day humans would say if modern humans grew up together with the
ability to self-improve knowing the consequences. The decision function of
the extrapolated adult humanity includes the ability of the collective
volition to restrain its own power or rewrite the optimization function to
something else; the collective volition extrapolates its awareness that it
is just an extrapolation and not our actual decisions.

In other words, one handles *only* and *exactly* the chicken-and-egg part
of the problem - that modern-day humans aren't smart enough to self-improve
to an adult humanity, and that modern-day society isn't smart enough to
render emergency first aid to itself - by writing an AI that extrapolates
over *exactly those* gaps to arrive at a picture of future humankind if
those problems were solved. Then the extrapolated superposed possible
future humankinds, the collective volition, hopefully decides to act in our
time to boost us over the chicken-and-egg recursion; doing enough to solve
the hard part of the problem, but not annoying us or taking over our lives,
since that's not what we want (I think; at least it's not what I want). Or
maybe the collective volition does something else. I may have phrased the
problem wrong. But for me, as an FAI programmer, to employ some other
solution, such as creating a new species of humane intelligence, would be
inelegant; it doesn't solve exactly and only the difficult part of the problem.

I may end up needing to be inelegant, but first I want to try really hard
to find a way to do the Right Thing.

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
