>2) Friendly AI can be created which cannot (to a very high degree
> of certainty) deviate from Friendliness.

> My position is as follows:
> However, I don't believe #2 is very likely at all. The ultimate goal as I
> understand it is to create a super intelligent being (AI/SI) which can
> reprogram itself and has free will. How the hell can anyone believe that
> we could actually manage to permanently install ANY trait into such a
> being? If it decided, for any reason, to change any aspect of itself there
> is no way we can prevent it. We aren't intelligent enough to understand
> what an SI truly is much less directly create one or fully understand
> one. We don't even really understand ourselves for that matter. Given
> such facts, how is it possible for anyone to believe that we are smart
> enough to directly control any aspect of an evolved SI for which we only
> understand the seed?

Good. I had the same objection. See the "When Subgoals Attack" thread:

Basically, if you design the SI to prioritize being Friendly and also
to prioritize remaining Friendly (and do it right), your AI will apply
all of ver mental effort toward never drifting away. So even if neither
you nor I nor Eliezer has any idea how to keep the system in check, the
AI *verself* should do the task. That's a powerful thought. Let it sink in.

> Further, there is no way for us to verify that any created AI/SI is
> actually friendly.

You're probably right. But again we can (or have to, depending on how
you see it) rely on the AI to do the reliability measurements. If we get
it wrong, we're probably toast. If we get it right, we totally win.

> If we are lucky, super intelligence itself will give rise to
> friendliness. But that is the best we can hope for.

I disagree. Following Eli's line of reasoning, we can and should try to
do much better.

Durant Schoon

