Friendliness not an Add-on

From: Marcello Mathias Herreshoff (
Date: Sat Feb 18 2006 - 19:38:59 MST

It has been suggested (by Ben Goertzel and others) that it might be practical
to add a Friendliness module in a controlling role to a non-verifiable AI.
Unfortunately, this is a far pricklier problem than it appears to be.

For example, suppose that the AI is creating an improved version of itself.
It needs to write a Friendly piece of code.

The "Friendliness module" dispatches its wishes to the "AI proper", which uses
evolutionary programming and/or other similarly messy techniques to write the
next version of the code. It would be unsafe for the "AI proper" to execute
the code it just wrote directly, since it might have made a mistake. However,
we assumed the Friendliness module has a controlling role, so it gets to
decide whether or not to accept the new piece of code.
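A minimal sketch of this control loop, assuming candidates can be modeled as plain functions. Every name here (`propose_candidate`, `friendliness_check`, `gatekeeper_loop`) is hypothetical, and the "Friendliness" property is a toy stand-in ("never decreases its input"). Note that the loop itself is trivial. All of the difficulty sits inside the accept/reject check, which this toy only spot-checks on a finite sample.

```python
import random
from typing import Callable, Optional

# Hypothetical model: a piece of proposed code is just a function int -> int.
Candidate = Callable[[int], int]


def propose_candidate(rng: random.Random) -> Candidate:
    """Hypothetical stand-in for the "AI proper": a messy, unverified generator.
    Here it just picks a random offset; a real system might use evolutionary search."""
    k = rng.randint(-3, 3)
    return lambda x: x + k


def friendliness_check(candidate: Candidate) -> bool:
    """Hypothetical stand-in for the Friendliness module's accept/reject decision.
    The toy property ("never decreases its input") is tested only on a finite
    sample of inputs, which is a spot check, not a proof."""
    return all(candidate(x) >= x for x in range(-10, 11))


def gatekeeper_loop(seed: int, max_tries: int = 100) -> Optional[Candidate]:
    """The AI proper proposes; only the Friendliness module may accept."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = propose_candidate(rng)
        # The proposal is never executed for real unless the check accepts it.
        if friendliness_check(candidate):
            return candidate
    return None
```

For example, `gatekeeper_loop(seed=0)` returns the first sampled candidate that passes the spot check. The argument that follows is about why no version of `friendliness_check` can be made exact for arbitrary code.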

But Rice's theorem states that no checker can decide, for every possible
program, whether that program has a given non-trivial property of its
behavior. (Friendliness is certainly non-trivial in this sense.)
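The standard proof of Rice's theorem is a reduction from the halting problem, and it can be sketched in Python. This sketch assumes that a candidate which never produces any output counts as Friendly and that some unfriendly candidate exists. `is_friendly` is a hypothetical perfect checker, and the point of the construction is that it cannot exist: if it did, `halts` would decide the halting problem.

```python
from typing import Callable

# A candidate piece of code, modeled as int -> int.
Candidate = Callable[[int], int]
# A program whose halting we want to decide, modeled as a zero-argument callable.
Program = Callable[[], None]


def make_probe(program: Program, unfriendly: Candidate) -> Candidate:
    """Build a candidate that first runs `program`, then behaves like `unfriendly`.

    If `program` never halts, the probe never produces output, which we ASSUME
    counts as Friendly. If `program` halts, the probe behaves exactly like the
    unfriendly candidate. So the probe is Friendly iff `program` does not halt.
    """
    def probe(x: int) -> int:
        program()  # may run forever
        return unfriendly(x)
    return probe


def halts(program: Program,
          is_friendly: Callable[[Candidate], bool],
          unfriendly: Candidate) -> bool:
    """Decide halting using a hypothetical perfect Friendliness checker.

    `is_friendly` is assumed to always return True or False for any candidate.
    Since halting is undecidable, no such checker can exist.
    """
    return not is_friendly(make_probe(program, unfriendly))
```

The same construction works for any non-trivial behavioral property, which is why Friendliness gets no special exemption.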

To put it less formally, we'd be giving our Friendliness module the use
of a genie which is somewhat unreliable and whose reliability in any
particular decision is, for all intents and purposes, difficult to check.

Knowledge of the AI proper's decision is not sufficient here. The Friendliness
module would also need the reasoning behind the decision in order to verify
it. However, if the AI proper has a non-verifiable architecture, this
knowledge may only exist in a crippled form, or, in extreme cases like
evolutionary programming, not at all. The only rescue from this mess is to
make the "Friendliness module" smarter than the so-called AI itself.
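A toy analogy (not from the original post) for why the reasoning matters: checking the claim "n is composite" is a single division when a witness divisor is supplied, but without it the checker has to redo the search on its own. The function name `check_composite_claim` is hypothetical, and trial division is chosen only for simplicity.

```python
from typing import Optional


def check_composite_claim(n: int, witness: Optional[int]) -> bool:
    """Check the claim 'n is composite'.

    With a witness (a claimed nontrivial divisor), checking is one modulo
    operation; a bad witness is rejected even if the claim happens to be true.
    Without a witness, the checker must re-derive the answer from scratch.
    """
    if witness is not None:
        return 1 < witness < n and n % witness == 0
    # No reasoning supplied: redo the whole search (trial division).
    d = 2
    while d * d <= n:
        if n % d == 0:
            return True
        d += 1
    return False
```

The fallback branch is the analogue of making the checker do all the work itself.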

At this point, it probably wouldn't take too much more work to ditch the "AI
proper" and just run the Friendliness module.

-=+Marcello Mathias Herreshoff
