From: Eliezer S. Yudkowsky (email@example.com)
Date: Wed May 18 2005 - 14:51:21 MDT
Sebastian Hagen wrote:
> Russell Wallace wrote:
>>I'm assuming the Sysop will _not_ know what constitutes a sentient
>>being, and we won't be able to formally define it either. This is the
>>big difference between domain protection and both of Eliezer's sysop
>>scenarios; I'm making more conservative assumptions about what will be
>>possible, and being more modest in the problems I try to solve.
>>For purposes of setting up the domains, the rule can be simple: each
>>and every human on Earth (the ones who are old enough to make a
>>choice, at least) gets to decide what domain they want to move to (or
> Who decides what "old enough" means? And why is biological age the
> critical metric, as opposed to, for example, a certain measure of
>>stay on Earth, of course); that's an operationally adequate definition
>>of "sentient" for that purpose.
> Even assuming it was, there would still be a lot of other important
> decisions to be made; for example:
> What domains with what rules are actually created? How are the available
> resources divided between them? How is interdomain communication
> regulated? Do 'sentient beings' only get to choose their domain once,
> are they free to move between them at any time, or are parameters of the
> individual domains responsible for deciding who is allowed to enter or
> If you allow unupgraded humans to make final decisions about these
> matters, suboptimal to catastrophic results are likely.
> Keeping all of these settings adjustable into the indefinite future, on
> the other hand, would provide possibilities for attacking individual
> Do you have a better method of determining these parameters?
Sebastian Hagen has pinpointed the foundational flaw in the Domain
Protection proposal - the missed point, even.
Each and every question Sebastian asked is an additional decision that you
would allocate to the programmers, and hence an additional point for the
programmers to potentially get wrong. Furthermore, you propose no mechanism
for fixing anything you do get wrong. There are two things missing here that are
the core motivations behind Collective Extrapolated Volition: limited initial
complexity and self-healing. Although there are decisions that must be made
to define an initial dynamic of extrapolated volition, there's a bounded
number of such decisions and they tend to center around very fundamental
questions. Once you finish defining this one thing, collective
extrapolated volition, you need not define the ground rules for an entire
sociopolitical system, as you are attempting to do. Defining a
sociopolitical system is a tempting failure because it is so much endless
fun. Every time you have a little more fun, you raise a few more questions to
potentially get wrong. The age of majority... laws of transit... minimum
group size to define an environment... communication between domains...
debts... bankruptcy... division of resources...
You are not being more conservative about what will be possible. You are not
being more conservative about what you are trying to solve. I very carefully
limited myself to trying to solve only one problem. It was a big problem, but
there was still only one of it, and it wasn't a fun political issue such as
humans love to get stuck in and underestimate the complexity of.
The second problem is that if you make up all your own rules for a
sociopolitical system - or pardon me, a framework for creating sociopolitical
domains and regulating interactions between them - then, even if you succeed
in creating a system that does what you told it to do, it is fixed in place.
If the consequences are not what you imagined, it is still fixed in place. It
does not self-heal. It does not self-correct. You have to get every single
thing exactly right on the first try because there is no mechanism to fix even
a single thing wrong. The Collective Extrapolated Volition proposal takes the
form it does because, if I succeed at CEV, it isn't a random sociopolitical
system, it's a mechanism for correcting errors including errors I made in the
initial specification of CEV, provided that the initial idea is not *perfect*
but rather *right enough*. I won't call it error-tolerant, but there would at
least be hope.
You say: "In any case, consider the arrows in state space: because you're
following a single volition, they all point to a single region of state space.
Again loss of diversity, and horrendous danger - all we can say about the
nature of that single region is that we don't know anything about it."
A CEV can implement a multi-domain region if that's what people want. The
Domain Protection system runs through a single volition, your own, pointing to
the single framework for the system; and that volition is considerably weaker
than a CEV. You're trying to achieve results, such as "protection of
diversity", by taking specific complex actions, such as a domain framework,
that you think will have the consequence of protecting diversity. But your
volition is too weak to firmly state the expected consequences of your actions
- even in this our human world, let alone the full spread of possibilities for
the next million years.
Let's review the Domain Protection proposal according to the seven motivations
listed in the original "Collective Volition" page. You may wish to refer to the
original page.
1. Defend humans, the future of humankind, and humane nature.
* I don't see anything in the DP framework that covers any of those three
things, except possibly "defend humans" - but even that would depend on the
specific domains created, or the mechanism for creating domains. If all
domains are wildernesses in which death is permanent, that is not "defend
humans". Since you didn't specify the exact mechanism which determines which
domains are created, I don't know what kind of domains will be created. Since
I don't know what kind of domains will be created, I don't know what will
happen to the people in them. As for defending the 'future of humankind'
(which in CEV is dealt with by trying to extrapolate the spectrum of probable
futures and our reactions to them), it's not clear how any progress at all is
made on using AI superintelligence to guard against dangers we did not foresee;
you propose a fixed unalterable structure that you think might deal with a
small set of dangers to humanity that you foresee, such as 'loss of
diversity'. As for "humane nature", the proposal contains nothing that would
tend to solve the chicken-and-egg problem of humans who are not wise enough to
upgrade themselves upgrading themselves to where they are wise enough to
upgrade themselves. CEV is an attempt to get the best definition we can of
that-which-we-wish-to-preserve-through-the-transition that isn't limited to
our present-day abilities to define it, and returns deliberately spread-out
answers in the case our possible decisions aren't strongly concentrated.
2. Encapsulate Moral Growth
* Domain Protection doesn't do this at all. Period. You say "AI smarter than
human is on much firmer theoretical ground than AI wiser than human", and in
one sense this is true, because today I have an understanding verging on the
truly technical for "smarter than human", whereas my apprehension of "wiser
than human" is roughly where my apprehension of "smarter than human" was in
2000 or thereabouts. But I didn't just poof into technical understanding; I
got there by steadily gnawing away on a vague understanding. And, no offense,
but unless it's what you plan to do with your whole life, I doubt that you
have any firm theoretical ground with which to describe "AI smarter than
human". And then you go on to confidently declare, "Smarter yes, wiser and
more benevolent no; it's a Godel problem, you can't do better than your
axioms." Wow! Where'd you get the firm theoretical ground from to
confidently declare this problem unsolvable? Didn't you just get through
saying you didn't have one? A True AI Researcher is someone who can take
ill-defined comparisons like "humans are smarter than chimps" or "Einstein was
smarter than a village idiot" or "Leo Szilard was more altruistic than Adolf
Hitler" and cash out the ill-defined terms like "smarter" or "wiser and more
benevolent" into solid predicates and creatable technology. Lots of people
will tell you that smartness or even intelligence is forever and eternally
undefinable, and that Artificial Intelligence is therefore unworkable. In
fact I just recently had to argue about that with Jaron Lanier.
3. Humankind should not spend the rest of eternity desperately wishing that
the programmers had done something differently. "This seems obvious, until
you realize that only the Singularity Institute has even tried to address this
issue. I haven't seen a single other proposal for AI morality out there, not
even a casual guess, that takes the possibility of Singularity Regret into
account. Not one. Everyone has their brilliant idea for the Four Great Moral
Principles That Are All We Need To Program Into AIs, and not one says, "But
wait, what if I got the Four Great Moral Principles wrong?" They don't think
of writing any escape clause, any emergency exit if the programmers made the
wrong decision. They don't wonder if the original programmers of the AI might
not be the wisest members of the human species; or if even the wisest
human-level minds might flunk the test; or if humankind might outgrow the
programmers' brilliantly insightful moral philosophy a few million years hence."
* I didn't see *anywhere* in your Domain Protection page where you talked
about a revocation mechanism and a framework for controlled transition to
something else. Don't tell me it was a minor oversight.
4. Avoid hijacking the destiny of humankind.
* You aren't proposing to define the domains yourself (as far as I can tell),
so you aren't being a jerk. You could end up destroying the destiny of
humankind but you don't appear to be trying to hijack it per se.
5. Avoid creating a motive for modern-day humans to fight over the initial
dynamic.
* Along comes an AI project that wants to define the minimum age to move
freely between domains as 13, instead of 18, in accordance with the laws of the
Torah. How do the two of you settle your differences? Would you advise the
al-Qaeda programmers to make as many decisions (irrevocable decisions!) as you
wish to allocate to yourself?
6. Keep humankind ultimately in charge of its own destiny.
* All descendants of humankind are subject to your initial choices in defining
the exact form of the domain interaction framework, even a million years
hence. However, you do not propose to create a god. Mixed score.
7. Help people.
* DP suggests no Last Judge or equivalent, and no stated dependency on
people's actual reactions or extrapolated judgments. There's no mechanism for
revoking DP if its consequences turn out to be horrible.
If you'd proposed a revocation mechanism, it would have shown more moral
caution than any other non-SIAI AI morality proposal up until this point. But
it still wouldn't be enough, not nearly.
...I wish people understood just how non-arbitrary the CEV proposal is. If
you understand all the motivations here that must needs be satisfied, you will
see how really difficult it is to come up with anything that works half as
well, let alone better.
Think meta, and keep thinking. Let me worry about what's technically
impossible. Try to say what you care about, what you want as the consequence,
not what means you think will achieve the consequence. Both judgments are
fallible, but the latter is considerably more fallible.
-- Eliezer S. Yudkowsky http://intelligence.org/ Research Fellow, Singularity Institute for Artificial Intelligence
This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:51 MDT