Re: ethics

From: Samantha Atkins
Date: Thu May 20 2004 - 00:19:30 MDT

On May 19, 2004, at 3:05 PM, Eliezer S. Yudkowsky wrote:

> fudley wrote:
>> On Wed, 19 May 2004 01:42:12 -0400, "Eliezer S. Yudkowsky" said:
>>> The problem is: Sea Slugs can't do abstract reasoning!
>> Well, Sea Slugs can respond to simple external stimuli but I agree they
>> have no understanding of Human Beings, just as a Human Being can have no
>> understanding of the psychology of a being with the brain the size of a
>> planet.
> This was my ancient argument, and it turned out to be a flawed
> metaphor - the rule simply doesn't carry over. If you have no
> understanding of the psychology of a being with the brain the size of
> a planet, how do you know that no human can understand its psychology?

Because of what we do know about the limits of human cognitive

> This sounds like a flip question, but it's not; it's the source of my
> original mistake - I tried to reason about the incomprehensibility of
> superintelligence without understanding where the incomprehensibility
> came from, or why. Think of all the analogies from the history of
> science; if something is a mystery to you, you do not know enough to
> claim that science will never comprehend it. I was foolish to make
> statements about the incomprehensibility of intelligence before I
> understood intelligence.

You have also made statements that I have not seen you repudiate about
the necessity of a sufficiently powerful intelligence to tackle various
sizes of problems. I have not yet seen an argument that makes me
believe that fully dependably limiting the choice space of a SAI to
preserve human existence is within the range of human mental abilities.
For that matter, I have no strong reason to believe the problem is
solvable in principle.

> Now I understand intelligence better, which is why I talk about
> "optimization processes" rather than "intelligence".

But doesn't this sort of talk only work if the supergoal remains
invariant and is not subject to reflection and possible "optimization"
itself? Is this a form of begging the question?

> The human ability to employ abstract reasoning is a threshold effect
> that *potentially* enables a human to fully understand some
> optimization processes, including, I think, optimization processes
> with arbitrarily large amounts of computing power.

Precisely. "I think", not "I know".

> That is only *some* optimization processes, processes that flow within
> persistent, humanly understandable invariants; others will be as
> unpredictable as coinflips.

Yes, good.

> Imagine a computer program that outputs the prime factorization of
> large numbers. For large enough numbers, the actual execution of the
> program flow is not humanly visualizable, even in principle. But we
> can still understand an abstract property of the program, which is
> that it outputs a set of primes that multiply together to yield the
> input number.

This is a grossly inadequate analogy to producing a super-intelligent

> Now imagine a program that writes a program that outputs the prime
> factorization of large numbers. This is a more subtle problem,
> because there's a more complex definition of utility involved - we are
> looking for a fast program, and a program that doesn't crash or cause
> other negative side effects, such as overwriting other programs'
> memory. But I think it's possible to build out an FAI dynamic that
> reads out the complete set of side effects you care about. More
> simply, you could use deductive reasoning processes that guarantee no
> side effects. (Sandboxing a Java program generated by directed
> evolution is bad, because you're directing enormous search power
> toward finding a flaw in the sandboxing!) Again, the exact form of
> the generated program would be unpredictable to humans, but its effect
> would be predictable from understanding the optimization criteria of
> the generator; a fast, reliable factorizer with no side effects.

The analogy again breaks down. You are not talking about creating
algorithms that create other algorithms within design constraints. You
are talking of creating algorithms that become capable of creating
algorithms for any purpose the algorithmic entity desires to any depth
and with the entity capable of full examination and modification of the
very roots of its "desires". Otherwise you are only creating
something that super-efficiently optimizes a few goals you begin with.
That you cannot predict its execution space means that this
optimization may well produce results that are not what you wished,
results that you did not think of beforehand and thus did not
sufficiently instruct the system to self-detect and avoid.

> A program that writes a program that outputs the prime factorization
> of large numbers is still understandable, and still not visualizable.

The same would be true if you evolved such a program btw.

> The essential law of Friendly AI is that you cannot build an AI to
> accomplish any end for which you do not possess a well-specified
> *abstract* description. If you want moral reasoning, or (my current
> model) a dynamic that extrapolates human volitions including the
> extrapolation of moral reasoning, then you need a well-specified
> abstract description of what that looks like.

Can the AI itself decide to build an AI for such an end? Are you
sure the finite space of "abstract" descriptions we can generate and
verify includes the solutions we seek? I'm not. I guess what you say
is true enough for some values of "abstract".

> In summary: You may not need to know the exact answer, but you need
> to know an exact question. The question may generate another
> question, but you still need an exact original question. And you need
> to understand everything you build well enough to know that it answers
> that question.

Not always possible. Sometimes, especially if the question is pressing
enough, you build the best you can that may be able to answer the
question even knowing that you cannot prove that it will.

>>> Thus making them impotent to control optimization processes such as
>>> Humans, just like natural selection, which also can't do abstract
>>> reasoning.
>> But if the “optimization processes” can also do abstract reasoning things
>> become more interesting; it may reason out why it always rushes to aid a
>> Sea Slug in distress even at the risk of its own life, and it may reason
>> that this is not in its best interest, and it may look for a way to
>> change things.
> Don't put "optimization processes" in quotes, please. Your question
> involves putting yourself into an FAI's shoes, and the shoes don't
> fit, any more than the shoes of natural selection would fit. You may
> be thinking that "intelligences" have self-centered "best interests".
> Rather than arguing about intelligence, I would prefer to talk about
> optimization processes, which (as the case of natural selection
> illustrates) do not even need to be anything that humans comprehend as
> a mind, let alone possess self-centered best interests.

All intelligences of the complexity we are concerned with to date do
have self-centered interests. It is an assertion that the FAI will be
an intelligence that does not. I suspect that either you can build
such an intelligence, which is not really all that intelligent because
it is thus constrained, or you can build a true mind with greater than
human intelligence that may decide to be friendly to humans but cannot
be coerced into being so. You can give a tendency to such a mind,
but no more. I don't believe anything less will be sufficient.

> Optimization processes direct futures into small targets in phase
> space. A Sea-Slug-rescuing optimization process, say a Bayesian
> decision system controlled by a utility function that assigns higher
> utility to Sea Slugs out of distress than Sea Slugs in distress,
> doesn't have a "self" or a "best interest" as you know it. Put too
> much power behind the optimization process, and unless it involves a
> full solution to the underlying Friendly AI challenge, it may
> overwrite the solar system with quintillions of tiny toy Sea Slugs,
> just large enough to meet its criterion for "undistressed Sea Slug",
> and no larger. But it still won't be "acting in its own
> self-interest". That was just the final state the optimization
> process happened to seek out, given the goal binding. As for it being
> unpredictable, why, look, here I am predicting it. It's only
> unpredictable if you close your eyes and walk directly into the
> whirling razor blades. This is a popular option, but not a
> universally admired one.

What you have described is imho precisely what is wrong with
substituting "optimization processes" for intelligence.

>>> That part about "Humans were never able to figure out a way to
>>> overcome them" was a hint, since it implies the Humans, as an
>>> optimization process, were somehow led to expend computing power
>>> specifically on searching for a pathway whose effect (from the Sea
>>> Slugs' perspective) was to break the rules.
>> The only thing that hint is telling you is that sometimes a hugely
>> complicated program will behave in ways the programmer neither expected
>> nor wanted, the more complex the program the more likely the surprise,
>> and we’re talking about a brain the size of a planet.
> Humans weren't generated by computer programmers. Our defiance of
> evolution isn't an "emergent" result of "complexity". It's the result
> of natural selection tending to generate psychological goals that
> aren't the same as natural selection's fitness criterion.

In other words it is a result of certain aspects of the complexities of
our context leading to an unanticipated result.

> An FAI ain't a "hugely complicated program", or at least, not as
> programmers know it. In the case of a *young* FAI, yeah, I expect
> unanticipated behaviors, but I plan to detect them, and make sure that
> not too much power goes into them. In the case of a mature FAI, I
> don't expect any behaviors the FAI doesn't anticipate.

You plan to detect them all, heh? I do wish you luck. I don't
believe the task is tractable for anything that would have sufficient
power to enable humans to survive Singularity.

> "Emergence" and "complexity" are explanations of maximum entropy; they
> produce the illusion of explanation, yet are incapable of producing
> any specific ante facto predictions.

I am no expert on complexity theory. But I don't believe this is

- samantha
