Re: Military Friendly AI

From: Eliezer S. Yudkowsky
Date: Thu Jun 27 2002 - 16:44:31 MDT

Samantha Atkins wrote:
> Eliezer S. Yudkowsky wrote:
>> As I understand "rationality", association, intuition,
>> pattern-recognition, et cetera, are extensions of rationality just as
>> much as verbal logic. If our culture thinks otherwise it's because
>> humans have accumulated more irrationality-correctors in verbal
>> declarative form than intuitive form and hence associate rationality
>> with logic. From a mind-in-general's perspective these are different
>> forms of rational intelligence, not rational and irrational
>> intelligence. Anyway...
> I don't think it is cultural. Or perhaps the problem is that "rational"
> and "irrational" are too fuzzy to be of much use in clarifying the
> points. In any case I don't see why an SI would be any less inclined to
> have its choices of possible solution paths be influenced by previous
> taken paths that had "worked" in other similar contexts than we are.

I'm not sure what you mean by this. Why would a grownup AI trust unreliable
past information until it had been verified? Why would a grownup AI go on
valuing a skill routine that fulfilled a subgoal whose real value it had
come to doubt? Calling the validity of a cause into question should
almost always call into question the validity of that cause's effects.
Given the limited complexity of its infancy I would expect an SI to be able
to model this part of its past pretty much in toto. I just don't see a bias
like that as persisting once the AI knows enough to contradict the original
cause. This isn't a military thing. This isn't even a Friendly AI thing!
Washing the biases out is a *very* general part of growing up.

>> An AI can learn the programmer's mistakes in verbal form, associational
>> form, recognized patterns, et cetera. The critical issue is whether,
>> when the AI grows up, the AI will be able to correct those mistakes.
> So, you are expecting it to decide that killing people was a "mistake"
> and drop it from future problem solving?

Assuming it *was* a mistake, then yes.

> It might or might not kill its own programmers. The danger is whether it
> considers killing itself to be a long-term viable way of dealing with
> problems. If it carries this beyond the point where humans can influence
> its programming we have a problem as far as I can see.

We do indeed have a problem. The problem is not "a violent SI". The
problem is "an SI that can't correct moral errors made by its programmers",
which is far more dangerous.

>> Despite an immense amount of science fiction dealing with this topic, I
>> honestly don't think that an *infrahuman* AI erroneously deciding to
>> solve problems by killing people is all that much of a risk, both in
>> terms of the stakes being relatively low, and in terms of it really not
>> being all that likely to happen as a cognitive error. Because of its
>> plot value, it happens much more often in science fiction than it would
>> in reality. (You have been trained to associate to this error as a
>> perceived possibility at a much higher rate than its probable
>> real-world incidence.) I suppose if you had a really bad disagreement
>> with a working combat AI you might be in substantially more trouble
>> than if you had a disagreement with a seed AI in a basement lab, but
>> that's at the infrahuman level - meaning, not Singularity-serious. A
>> disagreement with a transhuman AI is pretty much equally serious
>> whether the AI is in direct command of a tank unit or sealed in a lab
>> on the Moon; intelligence is what counts.
> Well gee, that is a great relief! But you haven't really convincingly
> stated why this is not a possibility. If the combat AI is capable of
> evolving (is a Seed) then we have a problem, no? The switching to
> "infrahuman" does not shed light on the worry.

What I'm saying is that you don't get Singularity tragedies - as opposed to
ordinary military tragedies ("Friendly fire") - unless you're dealing with a
transhuman AI. And if you're dealing with a transhuman AI then it is
probably not relevant whether the AI is in immediate command of a tank unit;
the AI must be Friendly.

> If you assume it was morally acceptable to kill people earlier and train
> the AI that this is so then how will you later train it that this isn't
> so, assuming it hasn't already gone beyond being influenced by your
> attempts at training?

What you do is teach the AI in such a way that final moral decisions are not
represented as correct in themselves, but as correct conclusions from
premises which, under improved intelligence, can be used to reach the
actually valid conclusion. To give a simple example, the AI is told to
defend the United States under the premises that all sentient life is
equally valid, but that the military effectiveness of the US is the
safeguard of international stability and that contributing to it saves lives
in the long run - because this is what the AI researcher believes and it's
why s/he works for the military.
It doesn't quite matter whether the US is really the world's police as
long as the AI gets its mitts on the premise that all sentient life is
equally valid.
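One way to picture the structure being described - purely a toy sketch, with
made-up names, assuming a simple truth-maintenance-style dependency graph,
not any real system - is a store in which each conclusion carries the
premises that justify it, so that doubting a premise automatically casts
doubt on everything derived from it:

```python
# Toy sketch: goals are stored not as free-standing imperatives but as
# conclusions tagged with their supporting premises. Calling a cause
# into question calls its effects into question. All names here are
# illustrative assumptions.

class BeliefNet:
    def __init__(self):
        self.premises = {}       # premise name -> currently believed valid?
        self.conclusions = {}    # conclusion name -> list of supporting premises

    def add_premise(self, name):
        self.premises[name] = True

    def derive(self, conclusion, supports):
        # Record the conclusion together with the premises it rests on.
        self.conclusions[conclusion] = list(supports)

    def doubt(self, premise):
        # Questioning a cause; its effects no longer stand on their own.
        self.premises[premise] = False

    def still_valid(self, conclusion):
        # A conclusion survives only while every supporting premise does.
        return all(self.premises[p] for p in self.conclusions[conclusion])


net = BeliefNet()
net.add_premise("all sentient life is equally valid")
net.add_premise("US military strength safeguards stability")
net.derive("defend the United States",
           ["all sentient life is equally valid",
            "US military strength safeguards stability"])

net.doubt("US military strength safeguards stability")
print(net.still_valid("defend the United States"))  # False
```

The point of the shape, on this reading, is that the terminal instruction
("defend the United States") is never the atom; when a premise is revised,
the conclusion must be re-derived from whatever premises survive.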

>> AI mistakenly self-modifies this system in a catastrophically wrong
>> way. I really don't see how that class of mistake pops out from an AI
>> learning wrong but coherent and not humanly unusual rules for when to
>> kill someone. If the AI starts questioning the moral theory and the
>> researcher starts offering a load of rationalizations which lead into
>> dark places, then yes, there would be a chance of structural damage and
>> the possibility of catastrophic failure of Friendliness.
> Ah. If the researcher says one thing at one time about violence and then
> tries to turn it around and remove the violence options then isn't that
> an inherent contradiction likely to lead to "a chance of structural
> damage..."?

No, it's called a programmer-assisted recovery from a programmer error, and
should be simple and routine. I expect the programmers will routinely
contradict themselves without being aware of it. This is not a problem
unique to Friendly AI.

> If it is wrong when the AI "grows up" then it was wrong to
> require it of the AI when it was young. I doubt the AI will miss the
> contradiction.

Of course not. The point is that the researcher was being honest earlier,
and later (a) changed his/her mind, or (b) was contradicted by the grownup
AI reconsidering the moral question at a higher level of intelligence.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:39 MDT