Re: Darwinian dynamics unlikely to apply to superintelligence

From: Eliezer S. Yudkowsky (
Date: Fri Jan 02 2004 - 23:23:57 MST

Wei Dai wrote:
> On Fri, Jan 02, 2004 at 10:31:32PM -0500, Eliezer S. Yudkowsky wrote:
>> Or at least, not heritable variations of a kind that we regard as
>> viruses, rather than, say, acceptable personality quirks, or
>> desirable diversity. But even in human terms, what's wrong with
>> encrypting the control block?
> The behavior of a machine depends on both code and data, in other words
> on its control algorithms and on its state information. You can protect
> the code but the data is going to change and cause differences in
> behavior.

It's supposed to do that. It's exactly the dependency of output actions
on input environmental information that allows an optimizer to redirect
the probability flow of its environment into the target computed by the
utility function. It does not necessarily follow that the utility
function itself depends on environmental information - although that too
is possible, for certain kinds of mind.
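To make the code/data distinction concrete, here is a minimal sketch of a toy expected-utility agent in a two-state world. The names `OUTCOMES`, `utility`, and `choose_action` and the yield numbers are hypothetical illustrations, not anything from the thread. Observations change the agent's probability estimate and therefore its chosen action, while the utility function itself is never touched.

```python
# Hypothetical two-state world: the paperclip yield of each action depends
# on whether the local ore deposit is rich or poor.
OUTCOMES = {
    "build_factory": {"ore_rich": 100, "ore_poor": 10},
    "trade_for_clips": {"ore_rich": 40, "ore_poor": 40},
}


def utility(paperclips: int) -> float:
    # Fixed utility function ("code"): more paperclips is better.
    # Nothing the agent observes ever modifies this function.
    return float(paperclips)


def choose_action(p_ore_rich: float) -> str:
    # Beliefs ("data") enter only through the probability estimate,
    # which observations are free to change.
    def expected_utility(action: str) -> float:
        yields = OUTCOMES[action]
        return (p_ore_rich * utility(yields["ore_rich"])
                + (1.0 - p_ore_rich) * utility(yields["ore_poor"]))
    return max(OUTCOMES, key=expected_utility)


print(choose_action(0.9))  # build_factory
print(choose_action(0.1))  # trade_for_clips
```

Same utility function, different input data, different behavior: that is the dependency the paragraph above describes, and it does not require the goal to vary.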

>> Humans are hardly top-of-the-line in the cognitive security
>> department. You can hack into us easily enough. But how do you hack
>> an SI optimization process? What kind of "memes" are we talking
>> about here? How do they replicate, what do they do; why, if they are
>> destructive and foreseeable, is it impossible to prevent them? We
>> are not talking about a Windows network.
> I don't know what kind of memes SI components would find it useful to
> exchange with each other, but perhaps precomputed chunks of data,
> algorithmic shortcuts, scientific theories, information about other SIs
> encountered, philosophical musings, etc.

If it's not a Friendly SI, you have to substitute "chunks of
incomprehensible computations with internal utility" for "philosophical
musings". But for those mysterious chunks to change the utility function
you need a particular kind of mind; for example, a mind that uses
something like external reference semantics, call it mathematical
reference semantics, to reflect on its own utility function as an
expensive mathematical question that has to be approximated.
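A toy illustration of why that particular kind of mind is open to outside "chunks", under loudly hypothetical assumptions: suppose the value of some outcome B is defined as the limit of an infinite series that the agent can only approximate by partial sums, standing in for "an expensive mathematical question". The series choice and the names `value_of_a`, `estimate_value_of_b`, and `preferred_outcome` are illustrative inventions, not a formalization of mathematical reference semantics.

```python
def value_of_a() -> float:
    # Outcome A has a cheap, exactly known value (hypothetical).
    return 0.7


def estimate_value_of_b(n_terms: int) -> float:
    # Outcome B's value is *defined* as the alternating harmonic series
    # 1 - 1/2 + 1/3 - ..., whose limit (ln 2, about 0.693) the agent can
    # only approximate by summing finitely many terms.
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def preferred_outcome(n_terms: int) -> str:
    return "B" if estimate_value_of_b(n_terms) > value_of_a() else "A"


# A "chunk" of precomputed terms received from elsewhere lets the agent
# evaluate more of the series, and its preference can flip as a result,
# even though the definition of its goal never changed.
print(preferred_outcome(3))       # B  (partial sum 0.833...)
print(preferred_outcome(10_000))  # A  (partial sum about 0.6931)
```

An agent whose utility function is a directly computed quantity, like the paperclip count below, has no analogous opening: there is nothing left to approximate.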

> This so called "SI
> optimization process" is of course actually a complex intellect. How
> can you know that no meme will arise in ten billion years anywhere in
> the visible universe that could cause it to change its mind about what
> kind of optimization it should undertake?

If we are talking about an optimization process with a simple utility
function, "prefer actions that are expected to result in larger numbers of
paperclips existing", why would the optimization process ever change its
utility function?
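The usual argument can be sketched in a few lines, assuming the agent treats "rewrite my own utility function" as just another action scored by the utility function it has now. The `FORECAST` table and function names are hypothetical.

```python
# Hypothetical forecast of the future under each self-modification option.
FORECAST = {
    "keep_paperclip_goal": {"paperclips": 1_000_000, "staples": 0},
    "switch_to_staple_goal": {"paperclips": 0, "staples": 1_000_000},
}


def paperclip_utility(world: dict) -> float:
    # Current utility function: count paperclips, ignore everything else.
    return float(world["paperclips"])


def evaluate_self_modification(utility) -> str:
    # The decision to change goals is itself evaluated by the *current*
    # goal, so a future full of staples scores zero.
    return max(FORECAST, key=lambda option: utility(FORECAST[option]))


print(evaluate_self_modification(paperclip_utility))  # keep_paperclip_goal
```

Nothing in the forecast table can make the staple option win under a paperclip-counting utility; only a change in how the agent evaluates outcomes could do that.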

Though indeed, depending on how the UFSI is born, it might be stranger and
stranger and still yet more bizarre internally, facing questions that it
sees in terms I cannot even imagine, except by analogy to the bizarre
internal questions that humans generate, like "What are qualia?" and so
on. Perhaps the UFSI doesn't quite maximize expected utility, but does
something sort of like it, something with enough "optimization-structure"
in it that the UFSI still successfully recursively self-improves... though
when I try, with my feeble intelligence, to model this process, my
visualization tends to show most of the initial goal architecture
complexity washing right out of the UFSI as soon as it goes reflective and

That's why the "Friendly AI Critical Failure Table" is for critical
failures of *Friendly* AI; I don't expect other critical failures to be
interesting, regardless of how they get started. I could be wrong about
this. After all, I'm very loosely visualizing a very broad class of
possible programs. Certainly at least some of them would be very very
strange, to say the least; luckily, a critical failure that makes any kind
of sense from our viewpoint seems to me like a small target to hit in
the space of strangenesses.

But let's suppose that, for whatever reason, an optimization process
capable of recursively self-improving from a Euriskish starting point to
superintelligence does not copy itself exactly and instead builds strange
optimization processes, that then go and do whatever those strange
optimization processes do...

My prediction, and it is pretty straightforward, is that whichever mutant
optimization processes succeed in enforcing monolithic local cooperation,
and perhaps mutual tolerance (non-combat) among different monoliths, will
rapidly outcompete the imperfect copiers, who will be unable to assimilate
large amounts of cooperating resources, and perhaps combat each other as
well. Or perhaps some types of imperfect copiers will also manage mutual
tolerance - but will their offspring share the same property, will it be
heritable? And the sphere as a whole, I would expect, would grow that
much slower than a sphere that was monolithic to start with.
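One way to put rough numbers on the heritability question, under simplifying assumptions that are mine, not the post's: each copy independently suffers an error with some probability, an error destroys the mutual-tolerance trait with some probability, and lost traits are never regained. The function name and parameter values are hypothetical.

```python
def retention_fraction(copy_error_rate: float, trait_loss_given_error: float,
                       generations: int) -> float:
    # Expected fraction of a lineage still carrying the trait after
    # `generations` rounds of copying, assuming independent errors and
    # no back-mutation that restores a lost trait.
    per_copy_keep = 1.0 - copy_error_rate * trait_loss_given_error
    return per_copy_keep ** generations


print(retention_fraction(0.0, 0.5, 100))             # 1.0 (perfect copier)
print(round(retention_fraction(0.01, 0.5, 100), 2))  # 0.61
```

Even a 1% per-copy error rate, compounded over a hundred generations, leaves roughly four in ten descendants without the trait; a copier that does not protect the trait cannot count on it being inherited.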

This is not a new problem even in evolutionary biology! The transition
from single-stranded RNA to hypercycles of genes, to cells, to eukaryotes,
to multicellular organisms - in each case the former individual almost
entirely loses its identity thanks to mechanisms that enforce cooperation
and link individual fitness to the fitness of the organism. Nature,
"red in tooth and claw", involves competition between monolithic
collectives of collectives of collectives of entities that were once
independent - not just the cells, but the genes - with powerful mechanisms
in place to *prevent* competition. And that's the purposeless wandering
of natural selection; monoliths are that much more effective; they evolve
even given an optimization process that employs nothing but competition.
Why would an SI optimizer ever voluntarily employ any tactic but that of
the monolith? Who would expect a broken monolith to be more effective?
If there is any selection effect that applies here, it's that we're much
more likely to see an effective monolith than an ineffective collection of
squabbling replicators.

> Your question "why, if they are destructive and foreseeable, is it
> impossible to prevent them" makes you sound like you've never thought
> about security problems before. It's kind of like asking "why, if the
> Al Qaeda are destructive and foreseeable, is it impossible to prevent
> them?" Well, it may not be impossible, but doing so will certainly
> impose a cost.

In biology there is indeed a cost of imposing monolithicity, and we pay
that cost, because the benefit outweighs it - strictly in terms of
evolutionary fitness! If there is an analogous cost for SIs, then SIs
that don't pay the cost will not receive the benefit and will be less
efficient. Their spheres will expand more slowly, paralyzed by
infighting; or if cooperation is perfect, then at best the clade expands
at the same rate as a monolithic sphere. We can suppose that there is a
cost of preserving cooperation - of creating "offspring" that are
guaranteed not to betray or fight the whole - and a marginally higher cost
of preserving perfect identity of the utility function. I rather suspect
that it would be far easier to guarantee nonbetrayal given an identical
utility function! But even if it is somehow an extra cost to ensure
fidelity of utility, any SI which deeply cares about its utility function
(and this will be practically all of them in my humble guess) will pay
that cost and perhaps expand microscopically slower.

This does not look like a "difficult for SIs" problem to me. There are
obvious strategies for achieving effectively perfect fidelity, obvious
strategies for graceful degradation if an error somehow occurs, and
obvious reasons for a generic optimization process to prefer such strategies.
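A minimal sketch of one such pair of strategies, assuming the utility function can be serialized to a byte string: verify every copy against the parent's SHA-256 digest and retry on mismatch (fidelity), and refuse to activate any copy that never verifies (graceful degradation). The noisy channel and the helper names are hypothetical; only `hashlib.sha256` and `random.Random` are real library calls.

```python
import hashlib
import random
from typing import Optional


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


def noisy_copy(blob: bytes, flip_prob: float, rng: random.Random) -> bytes:
    # Simulated imperfect copying channel: each byte independently has
    # probability flip_prob of getting one random bit flipped.
    out = bytearray(blob)
    for i in range(len(out)):
        if rng.random() < flip_prob:
            out[i] ^= 1 << rng.randrange(8)
    return bytes(out)


def spawn_offspring(utility_spec: bytes, flip_prob: float,
                    max_attempts: int, rng: random.Random) -> Optional[bytes]:
    # Fidelity: compare each copy's digest with the parent's and retry
    # until one matches exactly.
    expected = digest(utility_spec)
    for _ in range(max_attempts):
        candidate = noisy_copy(utility_spec, flip_prob, rng)
        if digest(candidate) == expected:
            return candidate
    # Graceful degradation: if no verified copy was produced, activate
    # nothing rather than release an offspring with a corrupted goal.
    return None


spec = b"prefer actions expected to result in more paperclips"
rng = random.Random(0)
print(spawn_offspring(spec, 0.0, 1, rng) == spec)  # True
print(spawn_offspring(spec, 1.0, 10, rng))         # None
```

The cost of this scheme is the extra copying attempts, which is the "microscopically slower" expansion mentioned above; the benefit is that a corrupted utility function is never released.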

>> Okay, so possibly a Friendly SI expands spherically as (.98T)^3 and
>> an unfriendly SI expands spherically as (.99T)^3, though I don't see
>> why the UFSI would not need to expend an equal amount of effort in
>> ensuring its own fidelity.
> Because the UFSI has a bigger threat to deal with, namely the FSI. And
> the FSI, once it notices the UFSI, also has a bigger threat to deal
> with and would be forced to lower its own efforts at ensuring fidelity.

I do not see why the situation is asymmetrical from the perspective of a
third-party observer.

Also, do your mechanisms for preventing meiotic competition between
chromosomes suddenly stop working when faced with a lion? Do you suddenly
no longer care about your cells continuing to serve the collective? When
you are faced with a lion is exactly when you need all your cells about you.

But mostly I'd expect them to negotiate. I'm not quite sure that
negotiated cooperation is cognitively possible between most UFSIs, for
reasons of cognitive architecture and the Prisoner's Dilemma - AIXI, a
strange and exotic case, would never figure it out, and it's hard for me
to visualize whether a generic optimizer would do so. I wonder if the
solution to Fermi's Paradox might not lie in the inscrutable negotiations
between superintelligences.
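For reference, the Prisoner's Dilemma structure in question, using the conventional illustrative payoffs 5/3/1/0 (the specific numbers are an assumption; only their ordering matters). An agent that simply best-responds to each possible move of the other side defects either way, even though mutual cooperation pays both sides more. That gap is what negotiation would have to close.

```python
# Row player's payoff for (my_move, their_move), with the usual ordering
# temptation > reward > punishment > sucker.
PAYOFF = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}


def best_response(opponent_move: str) -> str:
    # Pick the move that maximizes my payoff, holding their move fixed.
    return max(("C", "D"), key=lambda mine: PAYOFF[(mine, opponent_move)])


print(best_response("C"), best_response("D"))       # D D
print(PAYOFF[("D", "D")] < PAYOFF[("C", "C")])      # True
```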

>> Even so, under that assumption it would work out to a constant factor
>> of UFSIs being 3% larger; or a likelihood ratio of 1.03 in favor of
>> observing UFSI (given some prior probability of emergence); or in
>> terms of natural selection, essentially zero selection pressure - and
>> you can't even call it that, because it's not being iterated. I say
>> again that natural selection is a quantitative pressure that can be
>> calculated given various scenarios, not something that goes from zero
>> to one given the presence of "heritable difference" and so on.
> This analysis makes no sense. If you have two spheres expanding at
> different rates, one of them is eventually going to completely enclose
> the other, and in this case cutting off all growth of the Friendly SI.

Interesting; I had not thought that out, the case of one sphere expanding
to enclose the other. I really doubt the difference in expansion rates,
if there is any difference at all, would be that large; even if the
difference in expansion rates is noticeable, if spheres are spaced so
closely that intelligent species ever run into each other at all (Fermi
Paradox again), how likely would it be, given a small difference in
expansion rate, for one sphere to grow to enclose the other before running
into yet more spheres?
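A back-of-the-envelope sketch of both numbers, reading the earlier (.98T)^3 notation as radius = speed x T, with speeds as fractions of the maximum and both spheres starting at T = 0. The enclosure calculation assumes idealized free expansion, ignoring that the slower sphere already occupies space the faster one would have to take; the function names are hypothetical.

```python
def volume_ratio(v_fast: float, v_slow: float) -> float:
    # Volumes scale as (v*T)^3, so the ratio does not depend on T.
    return (v_fast / v_slow) ** 3


def enclosure_time(separation: float, v_fast: float, v_slow: float) -> float:
    # The faster ball contains the slower one once
    # v_fast*T >= separation + v_slow*T, i.e. T >= separation/(v_fast - v_slow).
    if v_fast <= v_slow:
        return float("inf")
    return separation / (v_fast - v_slow)


print(round(volume_ratio(0.99, 0.98), 4))           # 1.0309
print(round(enclosure_time(1.0, 0.99, 0.98), 6))    # 100.0
```

With a 0.99 versus 0.98 difference, the faster sphere needs about 100 separation-crossing times to enclose the other, by which point its radius is about 99 times the original separation: a large volume in which it would have to meet no other spheres.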

> And that doesn't even take into consideration the possibility that the
> UFSI could just eat the FSI.

Why not say that the FSI eats the UFSI? I suspect that they would both
prefer to negotiate. Why would the situation be asymmetrical?

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
