Re: SIAI's flawed friendliness analysis

From: Eliezer S. Yudkowsky
Date: Fri May 09 2003 - 22:14:27 MDT

Bill Hibbard wrote:
> This critique refers to the following documents:
> 1. The SIAI analysis fails to recognize the importance of
> the political process in creating safe AI.
> This is a fundamental error in the SIAI analysis. CFAI 4.2.1
> says "If an effort to get Congress to enforce any set of
> regulations were launched, I would expect the final set of
> regulations adopted to be completely unworkable." It further
> says that government regulation of AI is unnecessary because
> "The existing force tending to ensure Friendliness is that
> the most advanced projects will have the brightest AI
> researchers, who are most likely to be able to handle the
> problems of Friendly AI." History vividly teaches the danger
> of trusting the good intentions of individuals.

...and, of course, the good intentions and competence of governments.

There are few people who want to destroy the world on purpose. The
problem is people doing it by accident.

> The singularity will completely change power relations in
> human society. People and institutions that currently have
> great power and wealth will know this, and will try to
> manipulate the singularity to protect and enhance their
> positions. The public generally protects its own interests
> against the narrow interests of such powerful people and
> institutions via widespread political movements and the
> actions of democratically elected government. Such
> political action has never been more important than it
> will be in the singularity.

There is nothing to fight over.

The human species is fifty thousand years old. We've come a long way
during that time. We've come a long way during the last century. At the
start of the twentieth century, neither women nor blacks could vote in the
US. The twentieth century was, by the standards of today's world,
barbaric. Are we, ourselves, raving barbarians by any future perspective?
Almost certainly. For one civilization to leave a permanent impress on
the next billion years is no more fair than for one individual to do so,
and just as deadly. Individuals grow in their moralities, as do
civilizations. That fundamental capability is what needs to be
transferred into a Friendly AI, not a morality frozen in time. A human is
capable of understanding the concept of moral improvement; a true FAI, a
real mind in the humane frame of reference, must be able to do the same,
or something vital is missing.

If you want to make an FAI that is capable of moral improvement,
constructed via a fair, species-representative method, then the only
question is whether you have the knowledge to do that - to build an AI
that fully understands the concept of moral failure and moral improvement
as well as we do ourselves, and that symbolizes a species rather than any
one individual or civilization.

If you commit to that standard, then there are no conflicts of interest to
fight over, no individually controllable variables that knowably correlate
with the outcome in a way that creates conflicts of interest.

What is dangerous is that someone who believes that "AIs just do what
they're told" will think that the big issue is who gets to tell AIs what
to do. Such people will not, of course, succeed in taking over the world;
I find it extremely implausible that any human expressing such a goal has
the depth of understanding required to build anything that is not a
thermostat AI. The problem is that these people might succeed in
destroying the world, given enough computing power to brute-force AI with
no real understanding of it.

Anyone fighting over what values the AI ought to have is simply fighting
over who gets to commit suicide. If you know how to give an AI any set of
values, you know how to give it a humanly representative set of values.
It is really not that hard to come up with fair strategies for any given
model of FAI. Coming up with an FAI model that works is very hard.

It is not a trivial thing, to create a mind that embodies the full human
understanding of morality. There is a high and beautiful sorcery to it.
I find it hard to believe that any human truly capable of learning and
understanding that art would use it to do something so small and mean.
And anyone else would be effectively committing suicide, whether they
realized it or not, because the AI would not hear what they thought they
had said.

> The reinforcement learning values of the largest (and hence
> most dangerous) AIs

Indeed the largest computers are the most dangerous, but not in the way
that you mean. They are dangerous because even people who don't
understand what they're doing may be able to brute-force AI given truly
insane amounts of computing power. Friendliness, of course, cannot be
brute-forced.

> will be defined by the corporations and
> governments that build them, not the AI researchers working
> for those organizations. Those organizations will give their
> AIs values that reflect the organizations' values: profits in
> the case of corporations, and political and military power
> in the case of governments.

Yes, this is a good example of what I mean by "small and mean". Anyone
trying to implement a goal system like that is a threat, not because they
could take over the world, but because they haven't mastered the
understanding necessary to avoid destroying the world.

> Only a strong public movement
> driving government regulation will be able to coerce these
> organizations to design AI values to protect the interests
> of all humans. This government regulation must include an
> agency to monitor AI development and enforce regulations.

In theory, I can imagine a set of AI development rules, perhaps handed
down from after the Singularity, so simple and obvious that even someone
who doesn't really understand intelligence could follow those rules and be
safe. Or such a thing could turn out to be impossible. It is certainly
beyond my present ability to write such a set of rules. Currently, there
is no set of guidelines I can write that will make an AI project safe if
the researcher does not have a sufficiently deep understanding to have
written the same guidelines from scratch.

> The breakthrough ideas for achieving AI will come from
> individual researchers, many of whom will want their AI to
> serve the broad human interest. But their breakthrough ideas
> will become known to wealthy organizations. Their research
> will either be in the public domain, done for hire by wealthy
> organizations, or will be sold to such organizations.

SIAI is a nonprofit, and I am not for sale at any price. Such are my own
choices, which are all that I control.

> Breakthrough research may simply be seized by governments and
> the researchers prohibited from publishing, as was done for
> research on effective cryptography during the 1970s. The most
> powerful AIs won't exist on the $5,000 computers on
> researchers' desktops, but on the $5,000,000,000 computers
> owned by wealthy organizations. The dangerous AIs will be the
> ones capable of developing close personal relations with huge
> numbers of people. Such AIs will be operated by wealthy
> organizations, not individuals.

Your political recommendations appear to be based on an extremely
different model of AI. Specifically:

1) "AIs" are just very powerful tools that amplify the short-term goals
of their users, like any other technology.

2) AIs have power proportional to the computing resources invested in
them, and everyone has access to pretty much the same theoretical model
and class of AI.

3) There is no seed AI, no rapid recursive self-improvement, no hard
takeoff, no "first" AI. AIs are just new forces in existing society,
coming into play a bit at a time, as everyone's AI technology improves at
roughly the same rate.

4) Anyone can make an AI that does anything. AI morality is an easy
problem with fully specifiable arbitrary solutions that are reliable and
humanly comprehensible.

5) Government workers can look at an AI design and tell what the AI's
morality does and whether it's safe.

6) There are variables whose different values correlate to socially
important differences in outcomes, such that government workers can
understand the variables and their correlation to the outcomes, and such
that society expects to have a conflict of interest with individuals or
organizations as to the values of those variables, with the value to
society of this conflict of interest exceeding the value to society of the
outcome differentials that depend on the greater competence of those
individuals or organizations. Otherwise there's nothing worth voting on.

I disagree with all six points, due to a different model of AI.

> Individuals working toward the singularity may resist
> regulation as interference with their research, as was
> evident in the SL4 discussion of testimony before
> Congressman Brad Sherman's committee. But such regulation
> will be necessary to coerce the wealthy organizations
> that will own the most powerful AIs. These will be much
> like the regulations that restrain powerful organizations
> from building dangerous products (cars, household
> chemicals, etc), polluting the environment, and abusing
> citizens.

Hm. I think all I can do here is point to Part III of LOGI and say that
my concern is with FOOMgoing AIs (AIs that go FOOM, as in a hard takeoff).
Computer programs with major social effects, owned by powerful
organizations, that are *not* capable of rapid recursive self-improvement
and sparking a superintelligent transition, are not the kind of AI I worry
about. If there are governmentally understandable variables that
correlate to democratically disputed social outcomes, and so on, then I
might indeed write it off as ordinary politics.

> 2. The design recommendations in GUIDELINES 3 fail to
> define rigorous standards for "changes to supergoal
> content" in recommendation 3, for "valid" and "good" in
> recommendation 4, for "programmers' intentions" in
> recommendation 5, and for "mistaken" in recommendation 7.

Yes, I know. CFAI is confessedly incomplete.

To work, the theory of FAI is going to have to dig down to a point where
the theory is described *entirely* in terms of:

a) things that physically exist in external reality
b) incoming sensory information available to the AI
c) computations the AI knows how to perform

In short, what's needed is a naturalistic description of moral systems
building moral systems. I agree it's alarming that I don't have the full
specification of the entire pathway in hand at this instant, and I'm
working to remedy that. But you surely would not find it in a small set
of guidelines.

> These recommendations are about the AI learning its own
> supergoal. But even digging into corresponding sections
> of CFAI and FEATURES fails to find rigorous standards for
> defining critical terms in these recommendations.
> Determination of their meanings is left to "programmers"
> or the AI itself. Without rigorous standards for these
> terms, wealthy organizations constructing AIs will be
> free to define them in any way that serves their purposes
> and hence to construct AIs that serve their narrow
> interests rather than the general public interest.

The guidelines are not intended as a means of making AI programmers do
something against their will. I'll be astonished if I can get people to
understand the method with their wholehearted cooperation and willingness
to devote substantial amounts of time. I see little or no hope for people
who are vaguely interested and casually agreeable, unless they can be
transformed into the former class. Successfully *enforce* the creation of
Friendly AI on someone who is actually *opposed* to it? No way. Not a
chance. It's asking enough that this happen on purpose.

It's not conflicts of interest you need to worry about; any human who
deeply understands the Prisoner's Dilemma can find a way to cooperate in
almost any real-world circumstance, and you won't find someone capable of
creating FAI who doesn't deeply understand the Prisoner's Dilemma. The
scary part is someone getting the definitions *wrong*, as in "not doing
what they thought it would".

> 3. CFAI defines "friendliness" in a way that can only
> be determined by an AI after it has developed super-
> intelligence, and fails to define rigorous standards
> for the values that guide its learning until it reaches
> super-intelligence.
> The actual definition of "friendliness" in CFAI 3.4.4
> requires the AI to know most humans sufficiently well
> to decompose their minds into "panhuman", "gaussian" and
> "personality" layers, and to "converge to normative
> altruism" based on collective content of the "panhuman"
> and "gaussian" layers. This will require the development
> of super-intelligence over a large amount of learning.
> The definition of friendliness values to reinforce that
> learning is left to "programmers". As in the previous
> point, this will allow wealthy organizations to define
> initial learning values for their AIs as they like.

I don't believe a young Friendly AI should be meddling in the real world
at all. If for some reason this becomes necessary, it might as well do
what the programmer says, maybe with its own humane veto. I'd trust a
programmer more than I'd trust an infant Friendly AI, because regardless
of its long-term purpose, during infancy the FAI is likely to have neither
a better approximation to humaneness, nor a better understanding of the
real world.

You are correct that an FAI theory is not finished until there is, in
hand, a specification of how the FAI is to be taught.

> 4. The CFAI analysis is based on a Bayesian reasoning
> model of intelligence, which is not a sufficient model
> for producing intelligence.
> While Bayesian reasoning has an important role in
> intelligence, it is not sufficient. Sensory experience
> and reinforcement learning are fundamental to
> intelligence. Just as symbols must be grounded in
> sensory experience, reasoning must be grounded in
> learning and emerges from it because of the need to
> solve the credit assignment problem, as discussed at:

Non-Bayesian? I don't think you're going to find much backing on this
one. If you've really discovered a non-Bayesian form of reasoning, write
it up and collect your everlasting fame. Personally I consider such a
thing almost exactly analogous to a perpetual motion machine. Except that
a perpetual motion machine is merely physically impossible, while
"non-Bayesian reasoning" appears to be mathematically impossible. Though
of course I could be wrong.

Reinforcement learning emerges from Bayesian reasoning, not the other way
around. Sensory experience likewise.
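
A minimal sketch of that claim, under stated assumptions: a toy agent
facing a few actions with unknown payoff rates, which keeps a Beta
posterior per action and acts by Thompson sampling (draw one plausible
payoff rate from each posterior, take the action with the best draw).
Thompson sampling is swapped in here purely for illustration; the names
run_agent and true_payoff_probs are hypothetical and appear nowhere in
CFAI or LOGI. The code contains no rule that says "strengthen whatever
got rewarded", yet rewarded actions end up chosen more often, simply
because the Bayesian update shifts the posterior.

```python
import random


def run_agent(true_payoff_probs, steps=2000, seed=0):
    """Toy Bayesian decision-maker (illustrative assumption, not an FAI design).

    Each action pays 1 with an unknown probability. The agent holds a
    Beta(alpha, beta) posterior over each action's payoff rate.
    """
    rng = random.Random(seed)
    n = len(true_payoff_probs)
    alpha = [1.0] * n   # Beta(1, 1) prior: one pseudo-success per action
    beta = [1.0] * n    # ... and one pseudo-failure per action
    counts = [0] * n    # how often each action was actually taken

    for _ in range(steps):
        # Thompson sampling: sample a payoff rate from each posterior
        # and act as if the sampled rates were the true ones.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
        action = max(range(n), key=lambda a: samples[a])

        # The environment returns a reward of 0 or 1.
        reward = 1 if rng.random() < true_payoff_probs[action] else 0

        # Conjugate Bayes update of the chosen action's posterior.
        alpha[action] += reward
        beta[action] += 1 - reward
        counts[action] += 1

    return counts, alpha, beta


if __name__ == "__main__":
    counts, alpha, beta = run_agent([0.2, 0.8])
    print("times each action was chosen:", counts)
```

The only learning step is the posterior update; the drift toward the
better-paying action is a consequence of deciding under that posterior,
which is the sense of "emerges" intended above.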

For more about Bayesian reasoning, see:

Reinforcement, specifically, emerges in a Bayesian decision system:

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
