Re: SIAI's flawed friendliness analysis

From: Bill Hibbard
Date: Sun May 11 2003 - 17:14:37 MDT

On Sat, 10 May 2003, Eliezer S. Yudkowsky wrote:

> Bill Hibbard wrote:
> > This critique refers to the following documents:
> >
> > CFAI:
> >
> > 1. The SIAI analysis fails to recognize the importance of
> > the political process in creating safe AI.
> >
> > This is a fundamental error in the SIAI analysis. CFAI 4.2.1
> > says "If an effort to get Congress to enforce any set of
> > regulations were launched, I would expect the final set of
> > regulations adopted to be completely unworkable." It further
> > says that government regulation of AI is unnecessary because
> > "The existing force tending to ensure Friendliness is that
> > the most advanced projects will have the brightest AI
> > researchers, who are most likely to be able to handle the
> > problems of Friendly AI." History vividly teaches the danger
> > of trusting the good intentions of individuals.
> ...and, of course, the good intentions and competence of governments.

Absolutely. I never claimed that AI safety is a sure thing.
But without a broad political movement for safe AI and its
success in elective democratic government, unsafe AI is a
sure thing.

> . . .
> Your political recommendations appear to be based on an extremely
> different model of AI. Specifically:
> 1) "AIs" are just very powerful tools that amplify the short-term goals
> of their users, like any other technology.

I never said "short-term" - you are putting words into my mouth.
A key property of intelligence is understanding the long-term
effects of behavior on satisfying values (goals).

> 2) AIs have power proportional to the computing resources invested in
> them, and everyone has access to pretty much the same theoretical model
> and class of AI.

AI power does depend on computing resources and efficiency of
algorithms. Important algorithms have proved impossible to keep
secret for any length of time. Whether or not this continues in
the future, essential algorithms for intelligence will not be
secret from powerful organizations.

> 3) There is no seed AI, no rapid recursive self-improvement, no hard
> takeoff, no "first" AI. AIs are just new forces in existing society,
> coming into play a bit at a time, as everyone's AI technology improves at
> roughly the same rate.

Where did this come from? I am very clear in my book about the
importance of proper training for young AIs, and the issues
involved in AI evolution.

> 4) Anyone can make an AI that does anything. AI morality is an easy
> problem with fully specifiable arbitrary solutions that are reliable and
> humanly comprehensible.

I never said building safe AI is easy.

> 5) Government workers can look at an AI design and tell what the AI's
> morality does and whether it's safe.

We certainly expect government workers to regulate nuclear energy
designs and operation to ensure their safety. And because of
their doubts about safety, people in the US have decided through
their democratic political process to stop building new nuclear
energy plants.

Nowhere do I claim that regulation of safe AI will be simple.
But if we don't have government workers implementing regulation
of AI under democratic political control, then we will have
unsafe AIs.

> 6) There are variables whose different values correlate to socially
> important differences in outcomes, such that government workers can
> understand the variables and their correlation to the outcomes, and such
> that society expects to have a conflict of interest with individuals or
> organizations as to the values of those variables, with the value to
> society of this conflict of interest exceeding the value to society of the
> outcome differentials that depend on the greater competence of those
> individuals or organizations. Otherwise there's nothing worth voting on.

There will be organizations with motives to build AIs
with values that will correlate with important social
differences in outcome. AIs with values to maximize
profits may end up empowering their owners at everyone
else's expense. AIs with values for military victory
may end up killing lots of people.

> I disagree with all six points, due to a different model of AI.

I think we do have different models of AI. I think an AI
is an information process that has some values that it
tries to satisfy (positive values) and avoid (negative
values). It does this via reinforcement learning and a
simulation model of the world that it uses to solve the
credit assignment problem (i.e., to understand the long
term consequences of its behaviors on its values). Of
course, actually doing this in general circumstances is
very difficult, requiring pattern recognition to greatly
reduce the volume of sensory information, and the
equivalent to human conscious thought to reflect on
situations and find analogies.
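
A toy sketch of such an agent follows. Everything in it is an
assumption made for illustration: the five-state line world, the
hand-written simulate() function standing in for the simulation
model, and the discount GAMMA. It shows only the narrow point that
planning through a model lets a reward received late be credited to
the earlier behaviors that led to it.

```python
# Hypothetical toy world: states 0..4 on a line; only reaching state 4 is rewarded.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)   # step left, step right
GAMMA = 0.9          # discount applied to delayed reward (assumed value)


def simulate(state, action):
    """The agent's simulation model: predicts (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL and state != GOAL else 0.0
    return next_state, reward


def plan_values(sweeps=50):
    """Propagate the delayed reward back through the model (value iteration)."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(GOAL):  # the goal state is terminal, so its value stays 0
            values[s] = max(r + GAMMA * values[s2]
                            for s2, r in (simulate(s, a) for a in ACTIONS))
    return values


def greedy_action(state, values):
    """Pick the action whose simulated long-term outcome is best."""
    def score(action):
        next_state, reward = simulate(state, action)
        return reward + GAMMA * values[next_state]
    return max(ACTIONS, key=score)


values = plan_values()
policy = [greedy_action(s, values) for s in range(GOAL)]
```

Here the value of each state shrinks with its distance from the
rewarded state, and the greedy policy steps right from every
non-goal state, even though no step before the last one is rewarded.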

The SIAI guidelines involve digging into the AI's
reflective thought process and controlling the AI's
thoughts, in order to ensure safety. My book says the
only concern for AI learning and reasoning is to ensure
they are accurate, and that the teachers of young AIs
be well-adjusted people (subject to public monitoring
and the same kind of screening used for people who
control major weapons). Beyond that, the proper domain
for ensuring AI safety is the AI's values rather than
the AI's reflective thought processes.

In my second and third points I described the lack of
rigorous standards for certain terms in the SIAI
Guidelines and for initial AI values. Those rigorous
standards can only come from the AI's values. I think
that in your AI model you feel the need to control how
they are derived via the AI's reflective thought
process. This is the wrong domain for addressing AI safety.

Clear and unambiguous initial values are elaborated
in the learning process, forming connections via the
AI's simulation model with many other values. Human
babies love their mothers based on simple values about
touch, warmth, milk, smiles and sounds (happy Mother's
Day). But as the baby's mind learns, those simple
values get connected to a rich set of values about the
mother, via a simulation model of the mother and
surroundings. This elaboration of simple values will
happen in any truly intelligent AI.

I think initial AI values should be for simple
measures of human happiness. As the AI develops these
will be elaborated into a model of long-term human
happiness, and connected to many derived values about
what makes humans happy generally and particularly.
The subtle point is that this links AI values with
human values, and enables AI values to evolve as human
values evolve. We do see a gradual evolution of human
values, and the singularity will accelerate it.

Morality has its roots in values, especially social
values for shared interests. Complex moral systems
are elaborations of such values via learning and
reasoning. The right place to control an AI's moral
system is in its values. All we can do for an AI's
learning and reasoning is make sure they are accurate
and efficient.

> . . .
> > 3. CFAI defines "friendliness" in a way that can only
> > be determined by an AI after it has developed super-
> > intelligence, and fails to define rigorous standards
> > for the values that guide its learning until it reaches
> > super-intelligence
> >
> > The actual definition of "friendliness" in CFAI 3.4.4
> > requires the AI to know most humans sufficiently well
> > to decompose their minds into "panhuman", "gaussian" and
> > "personality" layers, and to "converge to normative
> > altruism" based on collective content of the "panhuman"
> > and "gaussian" layers. This will require the development
> > of super-intelligence over a large amount of learning.
> > The definition of friendliness values to reinforce that
> > learning is left to "programmers". As in the previous
> > point, this will allow wealthy organizations to define
> > initial learning values for their AIs as they like.
> I don't believe a young Friendly AI should be meddling in the real world
> at all. If for some reason this becomes necessary, it might as well do
> what the programmer says, maybe with its own humane veto. I'd trust a
> programmer more than I'd trust an infant Friendly AI, because regardless
> of its long-term purpose, during infancy the FAI is likely to have neither
> a better approximation to humaneness, nor a better understanding of the
> real world.

I agree that young AIs should have limited access to senses
and actions. But in order to "converge to normative altruism"
based on collective content of the "panhuman" and "gaussian"
layers as described in CFAI 3.4.4, the AI is going to need
access to large numbers of humans.

> . . .
> > 4. The CFAI analysis is based on a Bayesian reasoning
> > model of intelligence, which is not a sufficient model
> > for producing intelligence.
> >
> > While Bayesian reasoning has an important role in
> > intelligence, it is not sufficient. Sensory experience
> > and reinforcement learning are fundamental to
> > intelligence. Just as symbols must be grounded in
> > sensory experience, reasoning must be grounded in
> > learning and emerges from it because of the need to
> > solve the credit assignment problem, as discussed at:
> >
> Non-Bayesian? I don't think you're going to find much backing on this
> one. If you've really discovered a non-Bayesian form of reasoning, write
> it up and collect your everlasting fame. Personally I consider such a
> thing almost exactly analogous to a perpetual motion machine. Except that
> a perpetual motion machine is merely physically impossible, while
> "non-Bayesian reasoning" appears to be mathematically impossible. Though
> of course I could be wrong.

I never said "Non-Bayesian", although I find Pei's and Ben's
examples of Non-Bayesian logic in their systems interesting.

What I really meant by my fourth point is that because your
model of intelligence is incomplete, there are things in your
model of friendliness that really belong in your model of
intelligence. For example, recommendation 5 from GUIDELINES 3
"requires that the AI model the causal process that led to
the AI's creation and that the AI use its existing cognitive
complexity (or programmer assistance) to make judgements
about the validity or invalidity of factors in that causal
process." Any sufficiently intelligent brain will have a
simulation model of the world that includes the events that
led to its creation, and will make value judgements about
those events. The failure to do so would be a failure of
intelligence rather than a failure of safety.

I think this confusion between model of intelligence and
model of safety leads to the difficulty of finding rigorous
standards for terms described in my second point, and the
difficulty of finding initial values described in my third

> Reinforcement learning emerges from Bayesian reasoning, not the other way
> around. Sensory experience likewise.
> For more about Bayesian reasoning, see:
> Reinforcement, specifically, emerges in a Bayesian decision system:

This describes a Bayesian mechanism for reinforcement learning,
but does not show that reinforcement learning emerges from
Bayesian reasoning. In fact, learning precedes reasoning in
brain evolution. Reasoning (i.e., a simulation model of the
world) evolved to solve the credit assignment problem of

Bill Hibbard, SSEC, 1225 W. Dayton St., Madison, WI 53706 608-263-4427 fax: 608-263-6738
