Re: Domain Protection

From: Russell Wallace (russell.wallace@gmail.com)
Date: Wed May 18 2005 - 19:14:50 MDT


On 5/18/05, Eliezer S. Yudkowsky <sentience@pobox.com> wrote:
[feedback on my proposal, thanks]

...Okay, it seems to me you claim that Domain Protection is inferior
to Collective Volition in the following ways; correct me if I'm
misstating or omitting any:

1. It has more moving parts: more things to be specified, more things
to go wrong.

No, it only seems that way because I'm giving specific answers to
things you're sweeping under the carpet. For all that your paper was
longer than mine (and better written, frankly; I was tired and in a
hurry when I wrote mine, since unlike you I don't get paid to write
about AI, though I've tried to clarify anything that wasn't initially
clear), it's a lot less specific about the details.

> The age of majority... laws of transit... minimum
> group size to define an environment... communication between domains...
> debts... bankruptcy... division of resources...

...are all decisions that have to be made, whichever proposal is
followed. I've suggested how to make them (in most cases by people
other than myself, see my earlier replies in this thread); you
haven't.

> You are not being more conservative about what will be possible. You are not
> being more conservative about what you are trying to solve. I very carefully
> limited myself to trying to solve only one problem. It was a big problem, but
> there was still only one of it, and it wasn't a fun political issue such as
> humans love to get stuck in and underestimate the complexity of.

It only looks like one problem because you've found a sentence in
English that describes it as one. In fact you're trying to solve
essentially _all_ the hardest problems simultaneously, doing it in one
big lump with everything including the kitchen sink thrown in, and
throwing a carpet of English words over the lump so the details can't
be seen clearly enough for any of them to have a chance of being
solved.

In real life, political issues have to be dealt with, however much we
geeks wish it weren't so. I've tried to analyze them and boil them
down to a bare minimum that just might have a prayer of being
solvable. I can't promise my solution will work - but it has a much
better chance than the head-in-the-sand approach.

2. You're better qualified than me to be thinking about this stuff.

Well, you have a high opinion of your abilities. Cool; I have a high
opinion of my abilities too. The proof of both those puddings will be
in the eating. You think certain things are feasible, I think they
aren't; neither one of us has mathematical proof, nor yet a line of
working AI code to serve as experimental evidence, so right now all we
can say for sure is that your intuition differs from mine (and believe
me, mine was no more casually arrived at than yours).

> And then you go on to confidently declare, "Smarter yes, wiser and
> more benevolent no; it's a Godel problem, you can't do better than your
> axioms." Wow! Where'd you get the firm theoretical ground from to
> confidently declare this problem unsolvable? Didn't you just get through
> saying you didn't have one?

Of course, and neither have you (or if you have, there's no sign of it
in what you've published). If you can come up with proof that you're
right, great!

But what if you're wrong?

3. It doesn't have certain features you want.

Okay, I'll address these in more detail below.

> Let's review the Domain Protection according to the seven motivations listed
> in the original "Collective Volition" page. You may wish to refer to the
> original discussion:
>
> http://www.intelligence.org/friendly/collective-volition.html#motivations

Okay.

> 1. Defend humans, the future of humankind, and humane nature.
>
> * I don't see anything in the DP framework that covers any of those three
> things, except possibly "defend humans" - but even that would depend on the
> specific domains created, or the mechanism for creating domains. If all
> domains are wildernesses in which death is permanent, that is not "defend
> humans". Since you didn't specify the exact mechanism which determines which
> domains are created, I don't know what kind of domains will be created. Since
> I don't know what kind of domains will be created, I don't know what will
> happen to the people in them.

See my replies earlier in the thread for a suggested mechanism for
creating domains. Defending the future of humankind and humane nature
is the primary purpose of Domain Protection, and it is addressed by
creating domains in which they are protected: each domain's boundaries
are defined in state space, confining that domain to a region in which
humanity continues to exist.
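
To make that concrete, here's a purely illustrative sketch (my own
hypothetical Python, not anything specified in the DP proposal - the
names, the State representation and the example boundary predicate are
all assumptions for illustration): a domain boundary is just a
predicate over world-states, a hard constraint checked against every
predicted outcome rather than a goal to be traded off.

    # Hypothetical illustration only - not part of the DP proposal itself.
    # A domain boundary is modelled as a predicate over world-states; an
    # action is permitted only if its predicted outcome stays inside it.
    from dataclasses import dataclass
    from typing import Any, Callable, Dict

    State = Dict[str, Any]  # stand-in for however world-states are represented

    @dataclass
    class Domain:
        name: str
        boundary: Callable[[State], bool]  # True iff the state lies in the protected region

    def action_permitted(domain: Domain, predicted_state: State) -> bool:
        # The predicted result must keep the domain within its boundary.
        return domain.boundary(predicted_state)

    # Example: a domain whose boundary requires that living humans remain in it.
    human_domain = Domain(
        name="baseline humanity",
        boundary=lambda s: s.get("living_humans", 0) > 0,
    )

    # e.g. action_permitted(human_domain, {"living_humans": 0}) -> False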

> As for defending the 'future of humankind'
> (which in CEV is dealt with by trying to extrapolate the spectrum of probable
> futures and our reactions to them), it's not clear how any progress at all is
> made on using AI superintelligence to guard against dangers we did not know;
> you propose a fixed unalterable structure that you think might deal with a
> small set of dangers to humanity that you foresee, such as 'loss of
> diversity'.

You're missing the other point of diversity - it's the only way to
deal with dangers we didn't foresee. If you only have one domain, as
CEV proposes, then when an unforeseen danger comes up that kills that
domain (and I've already pointed at least one out that you didn't
foresee - who's to say there aren't others that neither of us has
foreseen?), it's goodnight forever. Having many domains gives the best
chance of at least some surviving.

> With "humane nature" the proposal contains nothing that would
> tend to solve the chicken-and-egg problem of humans who are not wise enough to
> upgrade themselves upgrading themselves to where they are wise enough to
> upgrade themselves.

Nor does it preclude this. If you think you have a solution, by all
means get 99 like-minded folk together and create a domain where
you're free to upgrade yourselves to your heart's content. Then if
you get it wrong, you've only killed yourselves and not all of
humanity.

> CEV is an attempt to get the best definition we can of
> that-which-we-wish-to-preserve-through-the-transition that isn't limited to
> our present-day abilities to define it

Whereas DP doesn't require us to gamble on any one such definition being correct.

> and returns deliberately spread-out
> answers in the case our possible decisions aren't strongly concentrated.

What if they're strongly concentrated in the wrong place? What if
they're not concentrated but decisions still have to be made?

> 2. Encapsulate Moral Growth
>
> * Domain Protection doesn't do this at all. Period.

Yep, it doesn't aspire to be a theory of encapsulating moral
growth. Nor does it preclude one; again, if you think you have a
solution to this, by all means create a domain where you can implement
your proposed solution.

> 3. Humankind should not spend the rest of eternity desperately wishing that
> the programmers had done something differently. "This seems obvious, until
> you realize that only the Singularity Institute has even tried to address this
> issue. I haven't seen a single other proposal for AI morality out there, not
> even a casual guess, that takes the possibility of Singularity Regret into
> account. Not one. Everyone has their brilliant idea for the Four Great Moral
> Principles That Are All We Need To Program Into AIs, and not one says, "But
> wait, what if I got the Four Great Moral Principles wrong?" They don't think
> of writing any escape clause, any emergency exit if the programmers made the
> wrong decision. They don't wonder if the original programmers of the AI might
> not be the wisest members of the human species; or if even the wisest
> human-level minds might flunk the test; or if humankind might outgrow the
> programmers' brilliantly insightful moral philosophy a few million years hence."
>
> * I didn't see *anywhere* in your Domain Protection page where you talked
> about a revocation mechanism and a framework for controlled transition to
> something else. Don't tell me it was a minor oversight.

Of course it wasn't an oversight; it's the core feature! Part of the
point of having multiple domains is that humanity does _not_ have to
trust me or you or anyone to get things right. If you think I'm wrong
about the way I want to live for the next billion years, go to a
domain where your way applies.

The point of DP is that you can turn yourself into a deranged
superintelligence in the name of a misguided theory of moral growth
and then go on to destroy everything of value in _your_ domain if you
want to - you just don't ever get to destroy _my_ domain, neither
tomorrow nor in a billion years, no matter what arguments you come up
with as to why you should be allowed to do so.

But you don't think your theory of moral growth will be misguided, and
you're not planning to become deranged or destroy everything of value?
Of course. How do you know your proposed upgrade program won't have
that result anyway? You don't. We have two choices:

1) Gamble the survival of all humanity on _one_ path being right...
and keep gambling until the dice go against us.

2) Set down in stone that some things are protected forever, no matter
what else goes wrong.

CEV proposes the first option, DP proposes the second.

> 4. Avoid hijacking the destiny of humankind.
>
> * You aren't proposing to define the domains yourself (as far as I can tell),
> so you aren't being a jerk. You could end up destroying the destiny of
> humankind but you don't appear to be trying to hijack it per se.

*nods* CEV and DP both meet this criterion; I'm inclined to agree with
a comment you made once, that having a shot at designing an AI that
wouldn't just turn everything in reach into grey goo requires
qualities of mind that you're unlikely to have without having risen
above that level.

> 5. Avoid creating a motive for modern-day humans to fight over the initial
> dynamic.

Right, DP meets this criterion far better than CEV does.

> * Along comes an AI project that wants to define the minimum age to move
> freely between domains as 13, instead of 18, in according with the laws of the
> Torah. How do the two of you settle your differences?

That's easy: they get to define a domain where 13-year-olds are free
to emigrate, and other people get to define one where you have to be
18 to be allowed to emigrate.

> Would you advise the
> al-Qaeda programmers to make as many decisions (irrevocable decisions!) as you
> wish to allocate to yourself?

I am proposing to make very few decisions myself - considerably fewer
and less consequential ones than you are, in fact. In DP, the al-Qaeda
guys can create a domain where the Koran is enforced to the letter or
whatever if that's what they want. Under CEV, because everyone is
forced down the same path, if al-Qaeda think they are outnumbered they
have to take a truck bomb to the Singularity Institute or see their
philosophy extinguished.

> 7. Help people.
>
> * DP suggests no Last Judge or equivalent, and no stated dependency on
> people's actual reactions or extrapolated judgments. There's no mechanism for
> revoking DP if its consequences turn out to be horrible.

What if the consequences of having a mechanism to revoke DP turn out
to be horrible?

But DP is compatible with having a Last Judge; if you can come up with
a coherent proposal for a mechanism by which the information provided
to the Judge would have significant causal dependence on something
other than the prejudices of the programmer, perhaps that should be
added.

> ...I wish people understood just how non-arbitrary the CEV proposal is. If
> you understand all the motivations here that must needs be satisfied, you will
> see how really difficult it is to come up with anything that works half as
> well, let alone better.

Oh, I understand your motivations, perhaps better than you do
yourself; they are similar to what mine were before I learned to add caution and
humility into the mix. Still, we probably have time for you to learn
more caution.

> Think meta, and keep thinking. Let me worry about what's technically
> impossible. Try to say what you care about, what you want as the consequence,
> not what means you think will achieve the consequence. Both judgments are
> fallible, but the latter is considerably more fallible.

As I remarked above, statements to the effect that other researchers
shouldn't worry their pretty little heads about technical things are
all the better for proof ^.~ But here's what I care about:

1. Safety

Above all else, to preserve humanity (not so much the protein and DNA
hardware technology, but our values and modes of experience). DP is
the best way I can think of to do that.

CEV fails here; I've already pointed out ways in which it's liable to
exterminate all sentient life, and there are probably others I haven't
thought of.

Here's another: doesn't CEV require the AI to unilaterally take over
the world (so that it can start doing what it thinks we ought to want,
rather than what we actually want)? If not, how are you proposing to
get it accepted? If so, have you considered the issues of feasibility
and safety involved?

2. Diversity

CEV proposes that every man, woman and child be forced down the same
path, whatever their feelings on the matter. That's _not_ a good
starting point for diversity.

3. Fairness

Apart from the little taking-over-the-world problem above, there's the
problem of forcing everyone down the same path again - which means
minority groups are faced with the prospect of having to fight you or
be extinguished.

"If all mankind minus one were of one opinion, mankind would be no
more justified in silencing that one person than he, if he had the
power, would be justified in silencing mankind." - John Stuart Mill

- Russell


