Re: SIAI's flawed friendliness analysis

From: Bill Hibbard
Date: Fri May 16 2003 - 20:35:15 MDT

Hi Durant,

> > The SIAI guidelines involve digging into the AI's
> > reflective thought process and controlling the AI's
> > thoughts, in order to ensure safety. My book says the
> > only concern for AI learning and reasoning is to ensure
> > they are accurate, and that the teachers of young AIs
> > be well-adjusted people (subject to public monitoring
> > and the same kind of screening used for people who
> > control major weapons). Beyond that, the proper domain
> > for ensuring AI safety is the AI's values rather than
> > the AI's reflective thought processes.
> Two words: "Value Hackers"
> Let us try to understand why Eliezer chooses to focus on an "AI's
> reflective thought processes" rather than on explicitly specifying an
> "AI's values". Let's look at it this way: if you could, wouldn't you
> rather develop an AI which could reason about *why* the values are the
> way they are instead of just having the values carved in stone by a
> programmer.
> This is *safer* for one very important reason:
> The values are less likely corruptible, since the AI that actually
> understands the sources for these values can reconstruct them from
> basic principles/information-about-humans-and-the-world-in-general.
> The ability to be able to re-derive these values in the face of a
> changing external environment and an interior mindscape-under-
> development is in fact *paramount* to the preservation of Friendliness.

This sounds good, but it won't work because values can
only be derived from other values. That's why the SIAI
recommendations use value words like "valid", "good"
and "mistaken". But the recommendations leave the
definition of these words ambiguous. Furthermore,
recommendation 1, "Friendliness-topped goal system",
defines a base value (i.e., supergoal) but leaves it
ambiguous. Defining rigorous standards for these words
will require the SIAI recommendations to precisely
define values.

The ambiguous definitions in the SIAI analysis will be
exploited by powerful people and institutions to create
AIs that protect and enhance their own interests.

> As mentioned before, this all hinges on the ability to create an
> AI in the first place that can understand "how and why values are
> created" as well as what humans are, what itself is, and what the
> world around us is. Furthermore, we are instructed by this insight
> as to what the design of an AI should look like.

Understanding what humans are, what the AI itself is, and
what values are, is implicit in the simulation model of the
world created by any intelligent mind. It uses this model to
predict the long-term consequences of its behaviors on its
values. Such understanding is prescribed by an intelligence
model rather than a safe AI model.

> Remembering our goal is to build better brains, the whole notion of
> the SeedAI bootstrap is to get an AI that builds a better AI. We must
> then ask ourselves this question: Who should be in charge of
> designating this new entity's values? The answer is "the smartest most
> capable thinker who is the most skilled in these areas". At some point
> that thinker is the AI. From the get-go we want the AI to be competent
> at this task. If we cannot come up with a way to ensure this, we
> should not attempt to build a mind in the first place (this is
> Eliezer's view. I happen to agree with this. Ben Goertzel & Peter
> Voss's opposing views have been noted previously as well(*)).

It is true that AIs, motivated by their values, will design
and build new and better AIs. Based on the first AI's
understanding about how to achieve its own values, it may
design new AIs with slightly different values. As Ben says,
there can be no absolute guarantee that these drifting values
will always be in human interests. But I think that AI values
for human happiness link AI values with human values and
create a symbiotic system combining humans and AIs. Keeping
humans "in the loop" offers the best hope of preventing any
drift away from human interests. An AI with humans in its
loop will have no motive to design an AI without humans in
its loop. And as long as humans are in the loop, they can
exert reinforcement to protect their own interests.

> In summary, we need to build the scaffolding of *deep* understanding
> of values and their derivations into the design of AI if we are to
> have a chance at all. The movie 2001 is already an example in the
> popular mind of what can happen when this is not done.

The "*deep* understanding of values" is implicit in superior
intelligence. It is a very accurate simulation model of the
world that includes understanding of how value systems work,
and the effects that different value systems would have on
brains. But choosing between different value systems requires
base values for comparing the consequences of different value
systems.

Reinforcement learning and reinforcement values are essential
to intelligence. Every AI will have base values that are part
of its definition, and will use them in any choice between
different value systems.

Hal in the movie 2001 values its own life higher than the
lives of its human companions, with results predictably bad
for humans. It's the most basic observation about safe AI.

> We cannot think of everything in advance. We must build a mind that
> does the "right" thing no matter what happens.

The right place to control the behavior of an AI is its
values.

There's a great quote from Galileo: "I do not feel obliged to
believe that the same God who endowed us with sense, reason
and intellect has intended us to forgo their use." When we
endow an artifact with an intellect, we will not be able to
control its thoughts. We can only control the accuracy of its
world model, by the quality and quantity of brain power we
give it, and its base values. The thoughts of an AI will be
determined by its experience with the world, and its values.

> ---
> Considering your suggestion that the AI *only* be concerned with
> having accurate thoughts: I haven't read your book, so I don't know
> your reasoning for this. I can imagine that it's an *easier* way to do
> things. You don't have to worry about the hard problem of where values
> come from, which ones are important and how to preserve the right ones
> under self modification. Easier is not better, obviously, but I'm only
> guessing here why you might hold "ensuring-accuracy" in higher esteem
> than other goals of thinking (like preserving Friendliness).

Speculating about my motives is certainly an *easier* way for
you to argue against my ideas.

An accurate simulation model of the world is necessary for
predicting and evaluating (according to reinforcement values)
the long-term consequences of behaviors. This is integral to
preserving friendliness, rather than an alternative.

When you say "values ..., which ones are important and how to
preserve the right ones" you are making value judgements about
values. Those value judgements must be based on some set of
base values.

> (*) This may be the familiar topic on this list of "when" to devote
> your efforts to Friendliness, not "if". This topic has already been
> discussed exhaustively and I would say it comes down to how one
> answers certain questions: "How cautious do you want to be?", "How
> seriously do you consider that a danger could arise quickly, without a
> chance to correct problems on a human time scale of thinking/action".
> > In my second and third points I described the lack of
> > rigorous standards for certain terms in the SIAI
> > Guidelines and for initial AI values. Those rigorous
> > standards can only come from the AI's values. I think
> > that in your AI model you feel the need to control how
> > they are derived via the AI's reflective thought
> > process. This is the wrong domain for addressing AI
> > safety.
> Just to reiterate, with SeedAI, the AI becomes the programmer, the
> gatekeeper of modifications. We *want* the modifier of the AI's values
> to be super intelligent, better than all humans at that task, to be
> more trustworthy, to do the right thing better than any
> best-intentioned human. Admittedly, this is a tall order, an order
> Eliezer is trying to fill.
> If you are worried about rigorous standards, perhaps Eliezer's proposal
> of Wisdom Tournaments would address your concern. Before the first
> line of code is ever written, I'm expecting Eliezer to expound upon
> these points in sufficient detail.

Wisdom Tournaments will only address my concerns if they
define precise values for a safe AI, and create a political
movement to enforce those values.

> > Clear and unambiguous initial values are elaborated
> > in the learning process, forming connections via the
> > AI's simulation model with many other values. Human
> > babies love their mothers based on simple values about
> > touch, warmth, milk, smiles and sounds (happy Mother's
> > Day). But as the baby's mind learns, those simple
> > values get connected to a rich set of values about the
> > mother, via a simulation model of the mother and
> > surroundings. This elaboration of simple values will
> > happen in any truly intelligent AI.
> Why will this elaboration happen? In other words, if you have a
> design, it should not only convince us that the elaboration will
> occur, but that it will be done in the right way and for the right
> reasons. Compromising any one of those could have disastrous effects
> for everyone.

The elaboration is the same as the creation of subgoals
in the SIAI analysis. When you say "right reasons" you
are making a value judgement about values, that can only
come from base values in the AI.

> > I think initial AI values should be for simple
> > measures of human happiness. As the AI develops these
> > will be elaborated into a model of long-term human
> > happiness, and connected to many derived values about
> > what makes humans happy generally and particularly.
> > The subtle point is that this links AI values with
> > human values, and enables AI values to evolve as human
> > values evolve. We do see a gradual evolution of human
> > values, and the singularity will accelerate it.
> I think you have good intentions. I appreciate your concern for doing
> the right thing and helping us all along on our individual goals to be
> happy(**), but if the letter-of-the-law is upheld and there is
> no-true-comprehension as to *why* the law is the way it is, we could
> all end up invaded by nano-serotonin-reuptake-inhibiting-bliss-bots.

This illustrates the difference between short-term and
long-term. Brain chemicals give short-term happiness
but not long-term. An intelligent AI will learn that
drugs do not make people happy in the long term.

This is where a simulation model of the world is
important, to predict and evaluate the long-term
consequences of behaviors (like giving humans
sophisticated drugs).

> I think Eliezer takes this great idea you mention, of guiding the AI
> to have human values and evolve with human values, one step
> further. Not only does he propose that the AI have these human(e)
> values, but he insists that the AI know *why* these values are good
> ones, what good "human" values look like, and how to extrapolate them
> properly (in the way that the smartest, most ethical human would),
> should the need arise.

Understanding all the implications of various values is
a function of reason (i.e., simulating the world). I'm all
in favor of that, but it is a direct consequence of superior
intelligence. It is part of an intelligence model rather
than a friendliness model.

But for "the AI (to) know *why* these values are good ones" is
to make value judgements about values. Such value judgements
imply some base values for judging candidate values. This
is my point: the proper domain of AI friendliness theory
is values.

> Additionally, we must consider the worst case, that we cannot control
> rapid ascent when it occurs. In that scenario we want the AI to be
> ver own guide, maintaining and extending-as-necessary ver morality
> under rapid, heavy mental expansion/reconfiguration. Should we reach
> that point, the situation will be out of human hands. No regulatory
> guidelines will be able to help us. Everything we know and cherish
> could depend on preparing for that possible instant.
> (**) Slight irony implied but full, sincere appreciation bestowed. I
> consider this slightly ironic, since I view happiness as a signal that
> confirms a goal was achieved rather than a goal, in-and-of-itself.

Yes, but happiness is playing these two different roles in
two different brains. It is the signal of positive values in
humans, and observation of that signal is the positive value
for the AI. This links AI values to human values.

> > Morality has its roots in values, especially social
> > values for shared interests. Complex moral systems
> > are elaborations of such values via learning and
> > reasoning. The right place to control an AI's moral
> > system is in its values. All we can do for an AI's
> > learning and reasoning is make sure they are accurate
> > and efficient.
> I'll agree that the values are critical linchpins as you suggest, but
> please do not lose sight of the fact that these linchpins are part of
> a greater machine with many interdependencies and exposure to an
> external, possibly malevolent, world.

There are both benevolence and malevolence in the world.
For further details, see ;)

> The statement: "All we can do for an AI's learning and reasoning is
> make sure they are accurate and efficient" seems limiting to me, in
> light of Eliezer's writings. If we can construct a mind that will
> solve this most difficult of problems (extreme intellectual ascent
> while preserving Friendliness) for us and forever, then we should aim
> for nothing less. Indeed, not hitting this mark is a danger that
> people on this list take quite seriously.

Unsafe AI is a great danger, but I think we disagree on
the greatest source of that danger. The SIAI analysis is
so afraid of making an error in the definition of its
base values (i.e., supergoal) that it leaves them
ambiguous.

This ambiguity will be exploited by the main danger for
safe AI, namely powerful people and institutions who will
try to manipulate the singularity to protect and enhance
their own interests.

The SIAI analysis completely fails to recognize the need
for politics to counter the threat of unsafe AI posed by
the bad intentions of some people.

That threat must be opposed by clearly defined values for
safe AIs, and a broad political movement successful in
electoral politics to enforce those values.

Bill Hibbard, SSEC, 1225 W. Dayton St., Madison, WI 53706 608-263-4427 fax: 608-263-6738