SIAI's flawed friendliness analysis

From: Bill Hibbard
Date: Fri May 09 2003 - 16:51:08 MDT

       Critique of the SIAI Guidelines on Friendly AI
                  Bill Hibbard 9 May 2003

This critique refers to the following documents:


1. The SIAI analysis fails to recognize the importance of
the political process in creating safe AI.

This is a fundamental error in the SIAI analysis. CFAI 4.2.1
says "If an effort to get Congress to enforce any set of
regulations were launched, I would expect the final set of
regulations adopted to be completely unworkable." It further
says that government regulation of AI is unnecessary because
"The existing force tending to ensure Friendliness is that
the most advanced projects will have the brightest AI
researchers, who are most likely to be able to handle the
problems of Friendly AI." History vividly teaches the danger
of trusting the good intentions of individuals.

The singularity will completely change power relations in
human society. People and institutions that currently have
great power and wealth will know this, and will try to
manipulate the singularity to protect and enhance their
positions. The public generally protects its own interests
against the narrow interests of such powerful people and
institutions via widespread political movements and the
actions of democratically elected government. Such
political action has never been more important than it
will be in the singularity.

The reinforcement learning values of the largest (and hence
most dangerous) AIs will be defined by the corporations and
governments that build them, not the AI researchers working
for those organizations. Those organizations will give their
AIs values that reflect the organizations' values: profits in
the case of corporations, and political and military power
in the case of governments. Only a strong public movement
driving government regulation will be able to coerce these
organizations to design AI values to protect the interests
of all humans. This government regulation must include an
agency to monitor AI development and enforce regulations.

The breakthrough ideas for achieving AI will come from
individual researchers, many of whom will want their AI to
serve the broad human interest. But their breakthrough ideas
will become known to wealthy organizations. Their research
will either be in the public domain, done for hire by wealthy
organizations, or will be sold to such organizations.
Breakthrough research may simply be seized by governments and
the researchers prohibited from publishing, as was done for
research on effective cryptography during the 1970s. The most
powerful AIs won't exist on the $5,000 computers on
researchers' desktops, but on the $5,000,000,000 computers
owned by wealthy organizations. The dangerous AIs will be the
ones capable of developing close personal relations with huge
numbers of people. Such AIs will be operated by wealthy
organizations, not individuals.

Individuals working toward the singularity may resist
regulation as interference with their research, as was
evident in the SL4 discussion of testimony before
Congressman Brad Sherman's committee. But such regulation
will be necessary to coerce the wealthy organizations
that will own the most powerful AIs. These will be much
like the regulations that restrain powerful organizations
from building dangerous products (cars, household
chemicals, etc.), polluting the environment, and abusing

2. The design recommendations in GUIDELINES 3 fail to
define rigorous standards for "changes to supergoal
content" in recommendation 3, for "valid" and "good" in
recommendation 4, for "programmers' intentions" in
recommendation 5, and for "mistaken" in recommendation 7.

These recommendations are about the AI learning its own
supergoal. But even digging into corresponding sections
of CFAI and FEATURES fails to find rigorous standards for
defining critical terms in these recommendations.
Determination of their meanings is left to "programmers"
or the AI itself. Without rigorous standards for these
terms, wealthy organizations constructing AIs will be
free to define them in any way that serves their purposes
and hence to construct AIs that serve their narrow
interests rather than the general public interest.

3. CFAI defines "friendliness" in a way that can only
be determined by an AI after it has developed super-
intelligence, and fails to define rigorous standards
for the values that guide its learning until it reaches

The actual definition of "friendliness" in CFAI 3.4.4
requires the AI to know most humans sufficiently well
to decompose their minds into "panhuman", "gaussian" and
"personality" layers, and to "converge to normative
altruism" based on collective content of the "panhuman"
and "gaussian" layers. This will require the development
of super-intelligence over a large amount of learning.
The definition of friendliness values to reinforce that
learning is left to "programmers". As in the previous
point, this will allow wealthy organizations to define
initial learning values for their AIs as they like.

4. The CFAI analysis is based on a Bayesian reasoning
model of intelligence, which is not a sufficient model
for producing intelligence.

While Bayesian reasoning has an important role in
intelligence, it is not sufficient. Sensory experience
and reinforcement learning are fundamental to
intelligence. Just as symbols must be grounded in
sensory experience, reasoning must be grounded in
learning and emerges from it because of the need to
solve the credit assignment problem, as discussed at:

Effective and general reinforcement learning requires
simulation models of the world, and sets of competing
agents. Furthermore, intelligence requires a general
ability to extract patterns from sense data and
internal information. An analysis of safe AI should be
based on a sufficient model of intelligence.

I offer an alternative analysis of producing safe AI in
my book at

Bill Hibbard, SSEC, 1225 W. Dayton St., Madison, WI 53706 608-263-4427 fax: 608-263-6738
