Re: SIAI's flawed friendliness analysis

From: Durant Schoon
Date: Wed May 14 2003 - 18:18:39 MDT

Hi Bill,


> The SIAI guidelines involve digging into the AI's
> reflective thought process and controlling the AI's
> thoughts, in order to ensure safety. My book says the
> only concern for AI learning and reasoning is to ensure
> they are accurate, and that the teachers of young AIs
> be well-adjusted people (subject to public monitoring
> and the same kind of screening used for people who
> control major weapons). Beyond that, the proper domain
> for ensuring AI safety is the AI's values rather than
> the AI's reflective thought processes.

Two words: "Value Hackers"

Let us try to understand why Eliezer chooses to focus on an "AI's
reflective thought processes" rather than on explicitly specifying an
"AI's values". Let's look at it this way: if you could, wouldn't you
rather develop an AI which could reason about *why* the values are the
way they are instead of just having the values carved in stone by a

This is *safer* for one very important reason:

The values are less likely corruptible, since the AI that actually
understands the sources for these values can reconstruct them from
basic principles/information-about-humans-and-the-world-in-general.

The ability to be able to re-derive these values in the face of a
changing external environment and an interior mindscape-under-
development is in fact *paramount* to the preservation of Friendliness.

As mentioned before, this all hinges on the ability to create an
AI in the first place that can understand "how and why values are
created" as well as what humans are, what itself is, and what the
world around us is. Furthermore, we are instructed by this insight
as to what the design of an AI should look like.

Remembering our goal is to build better brains, the whole notion of
the SeedAI bootstrap is to get an AI that builds a better AI. We must
then ask ourselves this question: Who should be in charge of
designating this new entity's values? The answer is "the smartest most
capable thinker who is the most skilled in these areas". At some point
that thinker is the AI. From the get-go we want the AI to be competent
at this task. If we cannot come up with a way to ensure this, we
should not attempt to build a mind in the first place (this is
Eliezer's view. I happen to agree with this. Ben Goertzel & Peter
Voss's opposing views have been noted previously as well(*)).

In summary, we need to build the scaffolding of *deep* understanding
of values and their derivations into the design of AI if we are to
have a chance at all. The movie 2001 is already an example in the popular
mind of what can happen when this is not done.

We cannot think of everything in advance. We must build a mind that
does the "right" thing no matter what happens.

Considering your suggestion that the AI *only* be concerned with
having accurate thoughts: I haven't read your book, so I don't know
your reasoning for this. I can imagine that it's an *easier* way to do
things. You don't have to worry about the hard problem of where values
come from, which ones are important and how to preserve the right ones
under self modification. Easier is not better, obviously, but I'm only
guessing here why you might hold "ensuring-accuracy" in higher esteem
than other goals of thinking (like preserving Friendliness).

(*) This may be the familiar topic on this list of "when" to devote
your efforts to Friendliness, not "if". This topic has already been
discussed exhaustively and I would say it comes down to how one
answers certain questions: "How cautious do you want to be?", "How
seriously do you consider that a danger could arise quickly, without a
chance to correct problems on a human time scale of thinking/action?"

> In my second and third points I described the lack of
> rigorous standards for certain terms in the SIAI
> Guidelines and for initial AI values. Those rigorous
> standards can only come from the AI's values. I think
> that in your AI model you feel the need to control how
> they are derived via the AI's reflective thought
> process. This is the wrong domain for addressing AI
> safety.

Just to reiterate, with SeedAI, the AI becomes the programmer, the
gatekeeper of modifications. We *want* the modifier of the AI's values
to be super intelligent, better than all humans at that task, to be
more trustworthy, to do the right thing better than any
best-intentioned human. Admittedly, this is a tall order, an order
Eliezer is trying to fill.

If you are worried about rigorous standards, perhaps Eliezer's proposal
of Wisdom Tournaments would address your concern. Before the first
line of code is ever written, I'm expecting Eliezer to expound upon
these points in sufficient detail.

> Clear and unambiguous initial values are elaborated
> in the learning process, forming connections via the
> AI's simulation model with many other values. Human
> babies love their mothers based on simple values about
> touch, warmth, milk, smiles and sounds (happy Mother's
> Day). But as the baby's mind learns, those simple
> values get connected to a rich set of values about the
> mother, via a simulation model of the mother and
> surroundings. This elaboration of simple values will
> happen in any truly intelligent AI.

Why will this elaboration happen? In other words, if you have a
design, it should not only convince us that the elaboration will
occur, but that it will be done in the right way and for the right
reasons. Compromising any one of those could have disastrous effects
for everyone.

> I think initial AI values should be for simple
> measures of human happiness. As the AI develops these
> will be elaborated into a model of long-term human
> happiness, and connected to many derived values about
> what makes humans happy generally and particularly.
> The subtle point is that this links AI values with
> human values, and enables AI values to evolve as human
> values evolve. We do see a gradual evolution of human
> values, and the singularity will accelerate it.

I think you have good intentions. I appreciate your concern for doing
the right thing and helping us all along on our individual goals to be
happy(**), but if the letter-of-the-law is upheld and there is
no-true-comprehension as to *why* the law is the way it is, we could
all end up invaded by nano-serotonin-reuptake-inhibiting-bliss-bots.

I think Eliezer takes this great idea you mention, of guiding the AI
to have human values and evolve with human values, one step
further. Not only does he propose that the AI have these human(e)
values, but he insists that the AI know *why* these values are good
ones, what good "human" values look like, and how to extrapolate them
properly (in the way that the smartest, most ethical human would),
should the need arise.

Additionally, we must consider the worst case, that we cannot control
rapid ascent when it occurs. In that scenario we want the AI to be
ver own guide, maintaining and extending-as-necessary ver morality
under rapid, heavy mental expansion/reconfiguration. Should we reach
that point, the situation will be out of human hands. No regulatory
guidelines will be able to help us. Everything we know and cherish
could depend on preparing for that possible instant.

(**) Slight irony implied but full, sincere appreciation bestowed. I
consider this slightly ironic, since I view happiness as a signal that
confirms a goal was achieved rather than a goal, in-and-of-itself.

> Morality has its roots in values, especially social
> values for shared interests. Complex moral systems
> are elaborations of such values via learning and
> reasoning. The right place to control an AI's moral
> system is in its values. All we can do for an AI's
> learning and reasoning is make sure they are accurate
> and efficient.

I'll agree that the values are critical linchpins as you suggest, but
please do not lose sight of the fact that these linchpins are part of
a greater machine with many interdependencies and exposure to an
external, possibly malevolent, world.

The statement: "All we can do for an AI's learning and reasoning is
make sure they are accurate and efficient" seems limiting to me, in
light of Eliezer's writings. If we can construct a mind that will
solve this most difficult of problems (extreme intellectual ascent
while preserving Friendliness) for us and forever, then we should aim
for nothing less. Indeed, not hitting this mark is a danger that
people on this list take quite seriously.

Thank you for your interest and for joining the discussion.

Durant Schoon
