Re: FAI: Collective Volition

From: Eliezer Yudkowsky
Date: Tue Jun 01 2004 - 06:01:06 MDT

Samantha Atkins wrote:
> Thanks for writing this Eliezer.

Thanks for commenting!

> My first level comments follow.
> How can we possibly get an SAI, excuse me, a "Friendly Really Powerful
> Optimization Process", to successfully extrapolate the full collective
> volition of humanity? At this point in the game we can't even master
> simple DWIM applications. We do not have AIs that are capable of
> understanding immediate volition much less the full extended volition.

Huge progress in AI is required to do this? Yes, no kidding. Leave that
part to SIAI. For the moment we speak only of the ends to which the means
are directed.

> So how can any device/AI/optimization process claiming to do so possibly
> seem other than completely arbitrary and un-Friendly?

It is not about "claiming". If one has not verified the Friendly Thingy to
work as claimed, one does not set it in motion.

> Extrapolation of volition based on what we would want if we were very
> different beings than we are is even more likely to go far off the mark.

Hence the measurement of distance and spread.

> How can this possibly not diverge wildly into whatever the FAI (or
> whatever) forces to converge into simply what it believes would be
> "best"? This is unlikely to bear a lot of resemblance to what any
> actual humans want at any depth.

I have added PAQ 9 to address this:


Q9. How does the dynamic force individual volitions to cohere? (Frequently

A9. The dynamic doesn't force anything. The engineering goal is to ask what
humankind "wants", or rather would decide if we knew more, thought faster,
were more the people we wished we were, had grown up farther together, etc.
"There is nothing which humanity can be said to 'want' in this sense" is a
possible answer to this question. Meaning, you took your best shot at
asking what humanity wanted, and humanity didn't want anything coherent.
But you cannot force the collective volition to cohere by computing some
other question than "What does humanity want?" That is fighting alligators
instead of draining the swamp - solving an engineering subproblem at the
expense of what you meant the project to accomplish.

There are nonobvious reasons our volitions would cohere. In the Middle East
the Israelis hate the Palestinians and the Palestinians hate the Israelis;
in Belfast the Catholics hate the Protestants and the Protestants hate the
Catholics; India hates Pakistan and Pakistan hates India. But Gandhi loves
everyone, and Martin Luther King loves everyone, and so their wishes add up
instead of cancelling out, coherent like the photons in a laser. One might
say that love obeys Bose-Einstein statistics while hatred obeys Fermi-Dirac
statistics.

Similarly, disagreements may be predicated on:

     * Different guesses to simple questions of fact. ("Knew more"
increases coherence.)
     * Poor solutions to cognitive problems. ("Think faster" increases
coherence.)
     * Impulses that would play lesser relative roles in the people we
wished we were.
     * Decisions we would not want our extrapolated volitions to include.

Suppose that's not enough to produce coherence? Collective volition, as a
moral solution, doesn't require some exact particular set of rules for the
initial dynamic. You can't take leave of asking what humanity "wants", but
it's all right, albeit dangerous, to try more than one plausible definition
of "want". I don't think it's gerrymandering to probe the space of
"dynamics that plausibly ask what humanity wants" to find a dynamic that
produces a coherent output, provided:

     * The meta-dynamic looks only for coherence, no other constraints.
     * The meta-dynamic searches a small search space.
     * The meta-dynamic satisfices rather than maximizes.
     * The dynamic itself doesn't force coherence where none exists.
     * A Last Judge peeks at the actual answer and checks it makes sense.

I would not object to an initial dynamic that contained a meta-rule along
the lines of: "Extrapolate far enough that our medium-distance wishes
cohere with each other and don't interfere with long-distance vetoes. If
that doesn't work, try this different order of evaluation. If that doesn't
work then fail, because it looks like humankind doesn't want anything."

Note that this meta-dynamic tests one quantitative variable (how far to
extrapolate) and one binary variable (two possible orders of evaluation),
and not, say, all possible orders of evaluation.
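
As a rough illustration only, the meta-rule above can be sketched in a few lines of Python. Everything named here is hypothetical: the `coherence` scoring function, the order labels, and the threshold are stand-ins invented for the sketch, and nothing in this post specifies how coherence would actually be measured. The sketch shows only the shape of the search: one quantitative variable, one binary variable, stopping at the first adequate setting, and failing rather than forcing an answer.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in: scores how well wishes cohere, in [0, 1], for a
# given extrapolation distance and evaluation order. Not from the post.
CoherenceFn = Callable[[int, str], float]


def search_for_coherent_dynamic(
    coherence: CoherenceFn,
    distances: Sequence[int],
    orders: Tuple[str, str] = ("order_a", "order_b"),
    threshold: float = 0.9,
) -> Optional[Tuple[int, str]]:
    """Return the first (distance, order) whose coherence meets the threshold.

    Satisfices: stops at the first adequate setting instead of maximizing.
    Returns None when nothing coheres, rather than forcing an answer.
    """
    for order in orders:            # second order is tried only if the first fails
        for distance in distances:  # small, fixed search space
            if coherence(distance, order) >= threshold:
                return distance, order
    return None


def toy_coherence(distance: int, order: str) -> float:
    # Invented toy scores: coherence grows with distance under "order_a" only.
    return distance / 5.0 if order == "order_a" else 0.0


print(search_for_coherent_dynamic(toy_coherence, range(1, 6)))
print(search_for_coherent_dynamic(lambda d, o: 0.0, range(1, 6)))
```

The second call returns None: when no setting coheres, the search reports failure instead of widening the search space.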

Forcing a confident answer is a huge no-no in FAI theory. If an FAI
discovers that an answering procedure is inadequate, you do not force it to
produce an answer anyway.


> The task you describe for the FRPOP is the tasks myths would have a
> fully enlightened, god-level, super-wise being attempt and only then
> with a lot of cautions. IMHO, attempting to do this with a un-sentient
> recursively self-improving process is the height of folly. It seems
> even more hubristic and difficult than the creation of a >human
> intelligent sentience. I don't see why you believe yourself incapable
> of the latter but capable of the former.

I don't know that I am incapable of creating a child. What I am presently
incapable of is understanding the implications, guessing what future
humankind will think of the act, whether they will call it child abuse, the
significance it would bear in history. Why am I worried about child abuse?
If I have so little knowledge that I cannot even deliberately *not*
create a sentient being, then all I can do is probe around blindly.
Experience tells me that one who probes around blindly is likely to
massively screw up, which, if I succeeded, would be child abuse.

> Now you do back away from such implications somewhat by having volition
> extrapolation be only a first level working solution until "that
> volition" evolves and/or builds something better. But what do you mean
> by "that volition" here? Is it the FRPOP, what the FRPOP becomes, the
> FRPOP plus humanity plus the extrapolated vision to date or what? It
> isn't very clear to me.

Very roughly speaking, it's the FRPOP refracting through humanity to
extrapolate what our decision would be in N years, for medium-distance
values of N, plus a long-distance veto.

> If the ruleset is scrutable and able to be understood by some, many,
> most humans then why couldn't humans come up with it?

We're not talking about legal regulations but apparent behaviors of Nature
(fixed in the original paper).

> I am not at all sure that our "collective volition" is superior to the
> very best of our relatively individual volition.

Best under what or whose standard of bestness? Everyone used to nag me
about this question, and while it was annoying that certain people smugly
thought it unanswerable, it still needed an answer. My professional answer
is, "the coherent output of an extrapolated collective volition". What's
your answer?

> Saying it is
> collective may make it sound more egalitarian, democratic and so on but
> may not have much to do with it actually being best able to guarantee
> human survival and well-being. It looks like you were getting toward
> the same thing in your 2+ days partial recanting/adjustment-wishes. I
> don't see how "referring the problem back to humanity" is all that
> likely to solve the problem. It might however be the best that can be
> done.
> I think I see that you are attempting to extrapolate beyond the present
> average state of humans and their self-knowledge/stated
> wishes/prejudices and so on to what they really, really want in their
> most whole-beyond-idealization core.

I wouldn't call it so much "really, really want" as "would want if we were
more grown up" - it's that ephemeral-seeming "grown up" part I'm trying to
give a well-specified definition. Strength of emotion is not sufficient,
though strength of emotion counts in the extrapolation.

> I just find it unlikely to near
> absurdity to believe any un-sentient optimizing process, no matter how
> recursively self-improving, will ever arrive there.

Unless the extrapolation is blocked by the requirement that the simulations
not be themselves sentient (PAQ 5), I disagree; we are matter and what
matter computes, other matter can approximate. The moral question is "What
are we trying to compute?" As for actually computing it, that is the
technical part.

> Where are sections on enforcement of conditions that keep humanity from
> destroying itself?

How am I supposed to know what a collective volition would do in that
department? Do I look like a collective volition? I can think of what
seem like good solutions, but so what?

> What if the collective volition leads to
> self-destruction or the destruction of other sentient beings?

Because that's what we want? Because that's what those other rotten
bastards want? Because the programmers screwed up the initial dynamic?

What if super-Gandhi starts letting the air out of cars' tires? What if
toaster ovens grow wings? I need you to give me a reason for the failure;
it will always be possible to postulate the blank fact of failure without
giving a reason, and then shake one's finger at the evil collective
volition in that hypothetical scenario.

> But more
> importantly, what does the FAI protect us from and how is it intended
> to do so?

What's all this business about protecting? Is that what we "want"?

> Section 6 is very useful. You do not want to build a god but you want
> to enshrine the "true" vox populi, vox Dei. It is interesting in that
> the vox populi is the extrapolation of the volition of the people and
> in that manner a reaching for the highest within human desires and
> potentials. This is almost certainly the best that can be done by and
> for humans including those building an FAI. But the question is
> whether that is good enough to create that which to some (yet to be
> specified extent) enforces or nudges powerfully toward that collective
> volition. Is there another choice? Perhaps not.
> A problem is whether the most desirable volition is part of the
> collective volition or relatively rare. A rare individual or group of
> individuals' vision may be a much better goal and may perhaps even be
> what most humans eventually have as their volition when they get wise
> enough, smart enough and so on. If so then collective volition is not
> sufficient. Judgment of the best volition must be made to get the best
> result. Especially if the collective volition at this time is adverse
> to the best volition and if the enforcement of the collective volition,
> no matter how gentle, might even preclude the better volition. Just
> because being able to judge is hard and fraught with danger doesn't mean
> it is necessarily better not to do so.

Now added as PAQ 8:


Q8. A problem is whether the most desirable volition is part of the
collective volition or relatively rare. A rare individual or group of
individuals' vision may be a much better goal and may perhaps even be what
most humans eventually have as their volition when they get wise enough,
smart enough and so on. (SamanthaAtkins)

A8. "Wise enough, smart enough" - how to extrapolate this is just what I'm
trying to describe specifically! "What most humans eventually have as their
decision when they get wise enough, smart enough, and so on" is exactly
what their collective volition is supposed to guess.

It is you who speaks these words, "most desirable volition", "much better
goal". How does the dynamic know what is the "most desirable volition" or
what is a "much better goal", if not by looking at you to find the initial
direction of extrapolation?

How does the dynamic know what is "wiser", if not by looking at your
judgment of wisdom? The order of evaluation cannot be recursive, although
the order of evaluation can iterate into a steady state. You cannot ask
what definition of flongy you would choose if you were flongier. You can
ask what definition of wisdom you would choose if you knew more, thought
faster. And then you can ask what definition of wisdom-2 your wiser-1 self
would choose on the next iteration (keeping track of the increasing
distance). But the extrapolation must climb the mountain of your volition
from the foothills of your current self; your volition cannot reach down
like a skyhook and lift up the extrapolation.
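
To make the distinction between recursion and iteration concrete, here is a minimal sketch in Python, under loud assumptions: `step` is a hypothetical function standing in for "what definition would my current self choose if I knew more, thought faster", and states are plain comparable values. Neither appears in the original paper. The point is only the order of evaluation: each round starts from the state already reached, the distance travelled is counted, and the loop gives up rather than inventing a fixed point.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def iterate_to_steady_state(
    initial: T,
    step: Callable[[T], T],
    max_distance: int,
) -> Optional[Tuple[T, int]]:
    """Apply `step` from the current state until it stops changing.

    Returns (steady_state, distance), where distance is the number of steps
    that changed the state, or None if no steady state is reached within
    max_distance steps.
    """
    current = initial  # climb from the foothills: always start at the current self
    for distance in range(1, max_distance + 1):
        nxt = step(current)
        if nxt == current:  # the next self would choose the same definition
            return current, distance - 1
        current = nxt
    return None


# Invented toy step: the "definition" is an integer that settles at 3.
print(iterate_to_steady_state(0, lambda x: min(x + 1, 3), max_distance=10))
```

A recursive definition would need the final state as an input to compute itself. The loop above never does: it only ever consults the state it has already reached.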

It is a widespread human perception that some individuals are wiser and
kinder than others. Suppose our collective volition does decide to weight
volitions by wisdom and kindness - a suggestion I strongly dislike, for it
smacks of disenfranchisement. It would still take a majority vote of
extrapolated volitions for the initial dynamic to decide how to judge
"wisdom" and "kindness". I don't think it wise to tell the initial dynamic
to look to whichever humans judge themselves as wiser and kinder. And if
the programmers define their own criteria of "wisdom" and "kindness" into a
dynamic's search for leaders, that is taking over the world by proxy. (You
wouldn't want the al-Qaeda programmers doing that, right? Though it might
work out all right in the end, so long as the terrorists told the initial
dynamic to extrapolate their selected wise men.)

If we know that we are not the best, then let us extrapolate our volitions
in the direction of becoming more the people we wish we were, not
concentrate Earth's destiny into the hands of our "best". What if our best
are not good enough? We should need to extrapolate them farther. If so,
why not extrapolate everyone?


> The earth is NOT scheduled to go to "vanish in a puff of smiley faces".
> I very much do not agree that that is the only alternative to FAI.

I don't know the future, but I know the technical fact that it looks easier
than I would have hoped to build things that go FOOM, and harder than I
would have hoped to make something sorta nice, and most people aren't even
interested in rising to the challenge. Maybe the Earth isn't scheduled to
vanish in a puff of smiley faces, but it's my best guess.

> Q1's answer is no real answer. The answer today is we have no friggin'
> idea how to do this.

Speak for yourself, albeit today I only have specific ideas as to how to do
very simple operations of this kind. (Q1: How do you "extrapolate" what
we would think if we knew more, thought faster/smarter?) Not knowing how
to do something is a temporary state of mind that, in my experience, tends
to go away over time. Having no clue what you *want* to do is a much more
severe problem, and so I call this progress.

> I am not into blaming SIAI and find it amazing you would toss this in.

The meaning of "blame" in that case was meant as deliberate irony. I have
changed and expanded PAQ 2; it now reads:


Q2. Removing the ability of humanity to do itself in and giving it a much
better chance of surviving Singularity is of course a wonderful goal. But
even if you call the FAI "optimizing processes" or some such it will still
be a solution outside of humanity rather than humanity growing into being
enough to take care of its problems. Whether the FAI is a "parent" or not
it will be an alien "gift" to fix what humanity cannot. Why not have
humanity itself recursively self-improve? (SamanthaAtkins)

A2. For myself, the best solution I can imagine at this time is to make
collective volition our Nice Place to Live, not forever, but to give
humanity a breathing space to grow up. Perhaps there is a better way, but
this one still seems pretty good. As for it being a solution outside of
humanity, or humanity being unable to fix its own problems... on this one
occasion I say, go ahead and assign the moral responsibility for the fix to
the Singularity Institute and its donors.

Moral responsibility for specific choices of a collective volition is hard
to track down, in the era before direct voting. No individual human may
have formulated such an intention and acted with intent to carry it out.
But as for the general fact that a bunch of stuff gets fixed, the
programming team and SIAI's donors are human and it was their intention
that a bunch of stuff get fixed. I should call this a case of humanity
solving its own problems, if on a highly abstract level.


> Q4's answer disturbs me. if there are "inalienable rights" it is not
> because someone or other has the opinion that such rights exists.

Precisely; inalienable rights are just those rights of which our opinion is
that the rights exist independently of our opinions.

Confusing, yes. It took my mighty intellect six years (1996-2002) to sort
out the tangle.

> It is because the nature of human beings is not utterly mutable

That's a moral judgment. Not a physical impossibility.

> and this
> fixed nature leads to the conclusion that some things are required for
> human well-functioning.

Required by what standard? Well-functioning by what standard? Why is
"required for human well-functioning" the source of inalienable rights? I
am not smugly calling the questions unanswerable, I am asking where to pick
up the information and the exact specifications.

> These things that are required by the nature
> of humans are "inalienable" in that they are not the made-up opinions of
> anyone or some favor of some governmental body or other.

Yes, that is why the *successor dynamic* might include a Bill of Rights
independent of the majority opinion, because the majority *felt* - not
decided, but felt - that these rights existed independently of the majority
opinion.

> As such
> these true inalienable rights should grow straight out of Friendliness
> towards humanity.

Who defines Friendliness?

Okay, now I'm just being gratuitously evil. It's just that people have
been torturing me with that question for years, and when I saw the
opportunity to strike, I just had to do it; I'm sure you understand.
"Where does the information of Friendliness come from?" is the non-evil
form of the question.

Under the information flow of collective volition, an inalienable right
would arise from a majority perception like yours, that the right existed
independently of anyone's opinion. Hast thou a better suggestion?

> Your answer also mixes freely current opinions of the majority of human
> kind and actual collective volition. It seems rather sloppy.

Can you point out where the mixing occurs?

> That's all for now. I am taking a break from computers for the next 4
> days or so. So I'll catch up next weekend.

See ya. Thanks for dialoguing!

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
