Re: Military Friendly AI

From: Samantha Atkins (samantha@objectent.com)
Date: Thu Jun 27 2002 - 15:48:50 MDT


Eliezer S. Yudkowsky wrote:
> Ben Goertzel wrote:
> >> To summarize the summary, the main danger to Friendliness of military
> >> AI is that the commanders might want a docile tool and therefore
> >> cripple moral development. As far as I can tell, there's no inherent
> >> danger to Friendliness in an AI going into combat, like it or not.
> >
> > In my view, the main danger to Friendliness of military AI is that
> > the AI
> > may get used to the idea that killing people for the right cause is not
> > such a bad thing...
>
> Yes, this is the obvious thing to worry about.
>
> > Your arguments for why Friendliness is consistent with military AI are
> > based on your theory of a Friendly goal system as a fully logical,
> > rational thing.
>
> As I understand "rationality", association, intuition,
> pattern-recognition, et cetera, are extensions of rationality just as
> much as verbal logic. If our culture thinks otherwise it's because
> humans have accumulated more irrationality-correctors in verbal
> declarative form than intuitive form and hence associate rationality
> with logic. From a mind-in-general's perspective these are different
> forms of rational intelligence, not rational and irrational
> intelligence. Anyway...
>

I don't think it is cultural. Or perhaps the problem is that
"rational" and "irrational" are too fuzzy to be of much use in
clarifying the points. In any case, I don't see why an SI would
be any less inclined than we are to have its choices of possible
solution paths influenced by previously taken paths that had
"worked" in other, similar contexts.
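
To make the worry concrete, here is a toy sketch in Python
(purely illustrative; the class and the scoring scheme are my own
invention, not Novamente's design or anyone else's actual
architecture) of what "associations guiding the choice of
solution paths" could amount to:

  from collections import defaultdict

  class AssociativeSolver:
      """Toy solver whose candidate ranking is biased by past successes."""

      def __init__(self):
          # past_success[context][action] = how often this action "worked"
          # when the situation looked like this
          self.past_success = defaultdict(lambda: defaultdict(int))

      def record_outcome(self, context, action, worked):
          if worked:
              self.past_success[context][action] += 1

      def rank_candidates(self, context, candidates):
          # Options with a stronger history of success in similar contexts
          # come to mind first -- the associative bias under discussion.
          return sorted(candidates,
                        key=lambda action: self.past_success[context][action],
                        reverse=True)

  solver = AssociativeSolver()
  solver.record_outcome("hostile standoff", "apply force", worked=True)
  solver.record_outcome("hostile standoff", "negotiate", worked=False)

  print(solver.rank_candidates("hostile standoff",
                               ["negotiate", "withdraw", "apply force"]))
  # ['apply force', 'negotiate', 'withdraw'] -- the learned history,
  # not any explicit moral reasoning, decides what gets considered first.

Nothing in that ranking step is "irrational"; the bias comes
entirely from what the system has been doing, and rewarded for,
in the past. That is exactly the point about a military training
history.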

> An AI can learn the programmer's mistakes in verbal form, associational
> form, recognized patterns, et cetera. The critical issue is whether,
> when the AI grows up, the AI will be able to correct those mistakes.
>

So, you are expecting it to decide that killing people was a
"mistake" and drop it from future problem solving?

> > This means that its logical reasoning is going to be *guided* by
> > associations that occur to it based on the sorts of things it's been
> > doing, and thinking about, in the past...
> >
> > Thus, an AI that's been involved heavily in military matters, is
> > going to
> > be more likely to think of violent solutions to problems, because its
> > pool of associations will push it that way
>
> And its memories, its concepts, its problem-solving skills, and so on.
> But this is only a structural error if the AI attempts to kill its own
> programmers to solve a problem. I suppose that's a possibility, but it
> really sounds more like the kind of thing that (a) happens in science
> fiction but not real life (as opposed to things which happen in science
> fiction and real life), and (b) sounds like a failure of infrahuman AI,
> which is probably not a Singularity matter. Besides, I would expect a
> combat AI to be extensively trained in how to avoid killing friendly
> combatants.
>

It might or might not kill its own programmers. The danger is
whether it comes to consider killing, in itself, a long-term
viable way of dealing with problems. If it carries this beyond
the point where humans can influence its programming, we have a
problem as far as I can see.

> > I don't want an AGI whose experience and orientation incline it to
> > associations involving killing large numbers of humans!
>
> Despite an immense amount of science fiction dealing with this topic, I
> honestly don't think that an *infrahuman* AI erroneously deciding to
> solve problems by killing people is all that much of a risk, both in
> terms of the stakes being relatively low, and in terms of it really not
> being all that likely to happen as a cognitive error. Because of its
> plot value, it happens much more often in science fiction than it would
> in reality. (You have been trained to associate to this error as a
> perceived possibility at a much higher rate than its probable real-world
> incidence.) I suppose if you had a really bad disagreement with a
> working combat AI you might be in substantially more trouble than if you
> had a disagreement with a seed AI in a basement lab, but that's at the
> infrahuman level - meaning, not Singularity-serious. A disagreement
> with a transhuman AI is pretty much equally serious whether the AI is in
> direct command of a tank unit or sealed in a lab on the Moon;
> intelligence is what counts.
>

Well gee, that is a great relief! But you haven't really
convincingly stated why this is not a possibility. If the
combat AI is capable of evolving (is a seed AI), then we have a
problem, no? Switching to "infrahuman" does not shed light on
the worry.

> > You may say that *your* AGI is gonna be so totally rational that it will
> > always make the right decisions regardless of the pool of associations
> > that its experience provides to it.... But this does not reassure me
> > adequately. What if you're wrong, and your AI turns out, like the human
> > mind or Novamente, to allow associations to guide the course of its
> > reasoning sometimes?
>
> Then the AI, when it's young, will kill a bunch of people it didn't
> really have to. But that moral risk is inherent in joining the army or
> working on any military project. The Singularity risk is if the AI's

And that is unacceptable.

> training trashes the part of the Friendship system that would be
> responsible for fixing the learned error when the AI grows up, or if the

If you assume it was morally acceptable to kill people earlier,
and train the AI that this is so, then how will you later train
it that it isn't so, assuming it hasn't already gone beyond
being influenced by your attempts at training?

> AI mistakenly self-modifies this system in a catastrophically wrong
> way. I really don't see how that class of mistake pops out from an AI
> learning wrong but coherent and not humanly unusual rules for when to
> kill someone. If the AI starts questioning the moral theory and the
> researcher starts offering a load of rationalizations which lead into
> dark places, then yes, there would be a chance of structural damage and
> the possibility of catastrophic failure of Friendliness.
>

Ah. If the researcher says one thing at one time about violence
and then tries to turn around and remove the violence options,
isn't that an inherent contradiction likely to lead to "a
chance of structural damage..."? If it is wrong when the AI
"grows up", then it was wrong to require it of the AI when it
was young. I doubt the AI will miss the contradiction.

- samantha


