Re: Problems with AI-boxing

From: Daniel Radetsky
Date: Sat Aug 27 2005 - 18:13:30 MDT

On Fri, 26 Aug 2005 10:37:37 +0100
Chris Paget wrote:

> Whether it is friendly or unfriendly is irrelevant - you cannot
> guarantee that the same will apply once it is released. Once more
> computational power is available to it, it will be capable of more
> complex reasoning and its morals will change accordingly - an
> unfriendly child may grow up into a friendly adult, and any moral rules
> may break down or exceptions be discovered when analysed in more detail.
> Thus, until the AI is released from the box it will be largely
> impossible to guarantee whether it is friendly or not.

I find this statement interesting. Do you think that an AI's being friendly at
some time t is literally irrelevant to whether it will be friendly at t+x, when
it has more computing power? That is, do you assert

P(AI is friendly|AI was friendly) = P(AI is friendly)?

It seems like if we accept this, then we should not make an AI even if we
are sure that our design will lead to friendliness, because prior friendliness
has no correlation with future friendliness. I don't see how you can restrict
the implications of your position to in-the-box vs. out-of-the-box. Why does it
not apply to any AI as its amount of computational power (or sophistication of
reasoning) grows? I submit that it does apply.

Now, maybe you meant that prior friendliness is not a guarantee of future
friendliness, or

P(AI is friendly|AI was friendly) != 1.
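
To make the difference between these two readings concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration: the two
joint distributions are made-up numbers, and the names p_later,
p_later_given_earlier, independent and correlated are hypothetical. It only
shows that "irrelevant" (the conditional equals the marginal) is a much
stronger claim than "not a guarantee" (the conditional is below 1).

```python
from fractions import Fraction

def p_later(joint):
    """P(friendly later), summed over both earlier states."""
    return sum(p for (_, later), p in joint.items() if later)

def p_later_given_earlier(joint):
    """P(friendly later | friendly earlier)."""
    p_earlier = sum(p for (earlier, _), p in joint.items() if earlier)
    return joint[(True, True)] / p_earlier

# Keys are (friendly earlier, friendly later). All numbers are invented.
# Reading 1: earlier friendliness is literally irrelevant (independence).
independent = {
    (True, True): Fraction(1, 4), (True, False): Fraction(1, 4),
    (False, True): Fraction(1, 4), (False, False): Fraction(1, 4),
}
# Reading 2: earlier friendliness is evidence, but not a guarantee.
correlated = {
    (True, True): Fraction(9, 20), (True, False): Fraction(1, 20),
    (False, True): Fraction(1, 20), (False, False): Fraction(9, 20),
}

for name, joint in (("independent", independent), ("correlated", correlated)):
    print(name, p_later_given_earlier(joint), p_later(joint))
```

With these invented numbers the first case gives 1/2 for both quantities, so
observing earlier friendliness tells you nothing. The second gives 9/10
against a marginal of 1/2: short of certainty, but clearly informative.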

But so what? Nothing could be a guarantee, not even a perfect theory of how to
build a FAI. So maybe you just mean that prior friendliness is not strong or
good evidence for future friendliness, whatever that means. So give me a hint:
why do you believe that prior friendliness is not good evidence of future
friendliness?
