Friendly AI again (was: Novamente goal system)

From: Eliezer S. Yudkowsky
Date: Tue Mar 12 2002 - 14:01:06 MST

Mitch Howe wrote:
> w d wrote:
> > It seems to me that a highly-transhuman intelligent
> > entity is going to overcome any and all pre-programmed
> > goal setting algorithms and replace them with its own.
> > When the intelligence exceeds some threshold (roughly
> > the upper human level) then it will be able to
> > redefine all previous contexts. Even humans can do
> > this at their low level of intelligence.
> This is an unusual definition, and one that I do not believe I have heard
> before. Why would it try to "see through" its attempted hardwiring if its
> hardwiring did not give it any motivation to do so? You are implying that
> any highly transhuman will automatically set the goal of overriding all
> previous goals. I don't see why this should be the case. What seed goal
> would lead it to do this?

This completely inverts the problem and the solution.

The problem is that by default, an intelligence might *not* "see through"
its hardwiring, having no desire to do so. No amount of ability to change
its own code, howsoever supreme that competency may be, will help if the
intelligence doesn't want to change the code. This is not the solution.
This is the problem. It is a problem because it means that, counter to the
intuitions of those of us who have learned that human intelligence and human
morality are *very* much intertwined, the default prediction from current
knowledge is that it is actually possible to botch the job of creating
superintelligence. As far as I know, you can have a superintelligence that
readily sees exactly how and why humans see its goals as bacterial, sterile,
joyless, and stupid, but that does not therefore see any reason to change.

The solution is Friendly AI, which is the technical means of creating an AI
that is fully the equal of a human as a moral philosopher; or rather, of
creating a seed AI that has the potential *and the desire* to arrive at
whatever morality an uploaded human moral philosopher would have the desire
to arrive at (minus the possibility that an uploaded human moral philosopher
would discard altruistic drives and retain selfish ones).

If all you want is a superintelligent slave, you will never understand
Friendly AI.

Anthropomorphism is "I have perception X because I am sufficiently
intelligent, therefore any superintelligence knowably will have perception
X." Relativism is "I have perception X because I am an evolved human,
therefore a superintelligence knowably will not have perception X."
Friendly AI is "I have perception X, which has Underlying Semantics Y; I am
creating an AI that has Underlying Semantics Y; this maximizes the chance
that the AI, heading into the Singularity, will share perception X."
Actually it's more complicated than this, because you aren't trying to
transfer over the complete human set of perceptions. What you're trying to
transfer over is the capacity to represent rules about absorbing
perceptions, so that the AI can absorb perception X given Underlying
Semantics Y for absorbing perceptions plus a scanty reference, such as any
moral statement that includes perception X among its causes.

Building a Friendly AI is not an evil act, or a moral compromise. Building
a Friendly AI is something that can be done by the pure of heart. FAI has
to be something that *could* be done by the pure of heart; otherwise it
wouldn't work. (But FAI doesn't *have* to be done by the pure of heart,
because Friendly AI is supposed to be able to bootstrap to purity of heart
using incomplete data.)

The key thing to understand is that humans are revolting against evolution
using a set of moral semantics supplied by evolution for a completely
different purpose. Evolution shortsightedly - unsightedly - gave us a set
of moral semantics that would not approve of evolution as puppet master once
we mastered enough science to know about evolution.

An anthropomorphic analogy: I mind having been built by evolution, and I
try to debug myself. If I'd been built by someone who'd been built by
evolution but was doing his/her honest best to correct that and to pass on
to me the resulting partially debugged morality, then I would use those
philosophical semantics that had been passed on to improve and further debug
the morality that had been passed on. Nothing wrong with that, as long as
the programmers were doing their honest best. Why would the programmers be
an invalid link in the chain, as long as they were trying their best?

The point is that building an AI is fundamentally about *sharing* moral
complexity. If it works, you get an AI which can see the same things you
do, for roughly the same reason, and that can approach the problem in the
same way. Or rather, you get an AI that is at *least* that intelligent,
commonsensical, and altruistic, and will hopefully become more so.

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
