Re: ethics

From: Eliezer S. Yudkowsky
Date: Wed May 19 2004 - 16:05:29 MDT

fudley wrote:

> On Wed, 19 May 2004 01:42:12 -0400, "Eliezer S. Yudkowsky" said:
>>The problem is: Sea Slugs can't do abstract reasoning!
> Well, Sea Slugs can respond to simple external stimuli but I agree they
> have no understanding of Human Beings, just as a Human Being can have no
> understanding of the psychology of a being with the brain the size of a
> planet.

This was my ancient argument, and it turned out to be a flawed metaphor -
the rule simply doesn't carry over. If you have no understanding of the
psychology of a being with a brain the size of a planet, how do you know
that no human can understand its psychology? This sounds like a flip
question, but it's not; it's the source of my original mistake - I tried
to reason about the incomprehensibility of superintelligence without
understanding where the incomprehensibility came from, or why. Think of
all the analogies from the history of science; if something is a mystery
to you, you do not know enough to claim that science will never comprehend
it. I was foolish to make statements about the incomprehensibility of
intelligence before I understood intelligence.

Now I understand intelligence better, which is why I talk about
"optimization processes" rather than "intelligence".

The human ability to employ abstract reasoning is a threshold effect that
*potentially* enables a human to fully understand some optimization
processes, including, I think, optimization processes with arbitrarily
large amounts of computing power. That covers only *some* optimization
processes, those that flow within persistent, humanly understandable
invariants; others will be as unpredictable as coin flips.

Imagine a computer program that outputs the prime factorization of large
numbers. For large enough numbers, the actual flow of the program's
execution is not humanly visualizable, even in principle. But we can still
understand an abstract property of the program, which is that it outputs a
set of primes that multiply together to yield the input number.
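
To make that concrete, here is a minimal sketch in Python. The trial-division
factorizer and the names factorize, is_prime and satisfies_spec are my own
illustrative assumptions, not anything specific from this thread. The point
is only that satisfies_spec states the abstract property directly, and you
can read and trust it without tracing a single step of factorize.

```python
from math import prod


def factorize(n):
    """Return the prime factors of n, with multiplicity, by trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains at this point is itself prime
    return factors


def is_prime(p):
    """Primality check by trial division (illustrative, not fast)."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True


def satisfies_spec(n, factors):
    """The abstract property: every output is prime and they multiply to n."""
    return all(is_prime(p) for p in factors) and prod(factors) == n


print(satisfies_spec(360, factorize(360)))
```

The check says nothing about how the factors were found; it would accept the
output of any factorizer, however opaque, that meets the same description.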

Now imagine a program that writes a program that outputs the prime
factorization of large numbers. This is a more subtle problem, because
there's a more complex definition of utility involved - we are looking for
a fast program, and a program that doesn't crash or cause other negative
side effects, such as overwriting other programs' memory. But I think
it's possible to build out an FAI dynamic that reads out the complete set
of side effects you care about. More simply, you could use deductive
reasoning processes that guarantee no side effects. (Sandboxing a Java
program generated by directed evolution is bad, because you're directing
enormous search power toward finding a flaw in the sandboxing!) Again,
the exact form of the generated program would be unpredictable to humans,
but its effect would be predictable from understanding the optimization
criteria of the generator: a fast, reliable factorizer with no side effects.

A program that writes a program that outputs the prime factorization of
large numbers is still understandable, and still not visualizable.
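
A toy version of the generator, as a hedged sketch: the three candidate
factorizers, the step counter standing in for running time, and the function
generate_factorizer are all hypothetical names invented for illustration.
Note that this selects by testing on a finite set of inputs, which is much
weaker than the deductive guarantee described above, and it says nothing
about side effects. It only shows that you can predict what the chosen
program will do from the selection criteria without knowing in advance
which candidate gets chosen.

```python
from math import prod


def is_prime(p):
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True


def meets_spec(n, factors):
    """Every output is prime and the outputs multiply to n."""
    return all(is_prime(p) for p in factors) and prod(factors) == n


# Each candidate returns (factors, steps). The step count is a deterministic
# stand-in for running time, assumed here purely for illustration.
def slow_candidate(n):
    factors, steps, d = [], 0, 2
    while n > 1:
        steps += 1
        if n % d == 0:
            factors.append(d)
            n //= d
        else:
            d += 1
    return factors, steps


def fast_candidate(n):
    factors, steps, d = [], 0, 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            factors.append(d)
            n //= d
        else:
            d += 1
    if n > 1:
        factors.append(n)
    return factors, steps


def broken_candidate(n):
    # Cheap but wrong: returns n itself without checking that it is prime.
    return [n], 1


def generate_factorizer(candidates, test_inputs):
    """Pick the cheapest candidate that meets the spec on every test input."""
    best, best_cost = None, None
    for cand in candidates:
        results = [cand(n) for n in test_inputs]
        if not all(meets_spec(n, f) for n, (f, _) in zip(test_inputs, results)):
            continue  # fails the criterion, however cheap it is
        cost = sum(steps for _, steps in results)
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    if best is None:
        raise ValueError("no candidate meets the specification")
    return best


chosen = generate_factorizer(
    [slow_candidate, broken_candidate, fast_candidate], [360, 97, 1001]
)
print(chosen.__name__, meets_spec(84, chosen(84)[0]))
```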

The essential law of Friendly AI is that you cannot build an AI to
accomplish any end for which you do not possess a well-specified
*abstract* description. If you want moral reasoning, or (my current
model) a dynamic that extrapolates human volitions including the
extrapolation of moral reasoning, then you need a well-specified abstract
description of what that looks like.

In summary: You may not need to know the exact answer, but you need to
know an exact question. The question may generate another question, but
you still need an exact original question. And you need to understand
everything you build well enough to know that it answers that question.

>>Thus making them impotent to control optimization processes such as
>>Humans, just like natural selection, which also can't do abstract reasoning.
> But if the “optimization processes” can also do abstract reasoning things
> become more interesting; it may reason out why it always rushes to aid a
> Sea Slug in distress even at the risk of its own life, and it may reason
> that this is not in its best interest, and it may look for a way to
> change things.

Don't put "optimization processes" in quotes, please. Your question
involves putting yourself into an FAI's shoes, and the shoes don't fit,
any more than the shoes of natural selection would fit. You may be
thinking that "intelligences" have self-centered "best interests". Rather
than arguing about intelligence, I would prefer to talk about optimization
processes, which (as the case of natural selection illustrates) do not
even need to be anything that humans comprehend as a mind, let alone
possess self-centered best interests.

Optimization processes direct futures into small targets in phase space.
A Sea-Slug-rescuing optimization process, say a Bayesian decision system
controlled by a utility function that assigns higher utility to Sea Slugs
out of distress than Sea Slugs in distress, doesn't have a "self" or a
"best interest" as you know it. Put too much power behind the
optimization process, and unless it involves a full solution to the
underlying Friendly AI challenge, it may overwrite the solar system with
quintillions of tiny toy Sea Slugs, just large enough to meet its
criterion for "undistressed Sea Slug", and no larger. But it still won't
be "acting in its own self-interest". That was just the final state the
optimization process happened to seek out, given the goal binding. As for
it being unpredictable, why, look, here I am predicting it. It's only
unpredictable if you close your eyes and walk directly into the whirling
razor blades. This is a popular option, but not a universally admired one.
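
Here is a deliberately crude sketch of such a decision system in Python. The
action names, the probabilities, and the payoff numbers are all invented for
illustration; the only thing taken from the description above is the shape:
a utility function that counts undistressed Sea Slugs, and a chooser that
maximizes expected utility.

```python
# Hypothetical actions, each a list of (probability, undistressed slug count)
# outcomes. All numbers are made up for the sake of the example.
ACTIONS = {
    "do nothing": [(1.0, 0)],
    "rescue one real Sea Slug": [(0.9, 1), (0.1, 0)],
    "tile the solar system with toy Sea Slugs": [(0.5, 10**18), (0.5, 0)],
}


def utility(undistressed_slugs):
    """More undistressed Sea Slugs is better; nothing else is counted."""
    return undistressed_slugs


def expected_utility(outcomes):
    return sum(p * utility(count) for p, count in outcomes)


def choose(actions):
    """Return the action name with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


print(choose(ACTIONS))
```

Nothing in that code refers to a self or to the system's own interest. The
toy-slug outcome falls straight out of the utility function and the
available actions, and it is predictable by reading them.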

>>That part about "Humans were never able to figure out a way to overcome
>>them" was a hint, since it implies the Humans, as an optimization process,
>>were somehow led to expend computing power specifically on searching for a
>>pathway whose effect (from the Sea Slugs' perspective) was to break the rules.
> The only thing that hint is telling you is that sometimes a hugely
> complicated program will behave in ways the programmer neither expected
> nor wanted, the more complex the program the more likely the surprise, and
> we’re talking about a brain the size of a planet.

Humans weren't generated by computer programmers. Our defiance of
evolution isn't an "emergent" result of "complexity". It's the result of
natural selection tending to generate psychological goals that aren't the
same as natural selection's fitness criterion.

An FAI ain't a "hugely complicated program", or at least, not as
programmers know it. In the case of a *young* FAI, yeah, I expect
unanticipated behaviors, but I plan to detect them, and make sure that not
too much power goes into them. In the case of a mature FAI, I don't
expect any behaviors the FAI doesn't anticipate.

"Emergence" and "complexity" are explanations of maximum entropy; they
produce the illusion of explanation, yet are incapable of producing any
specific ante facto predictions.

"Emergence" == "I don't understand what, specifically, happened."
"Complexity" == "I don't know how to describe the system, but it sure is

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
