>In a recent post on Overcoming Bias, Eliezer said that Nick Bostrom
>suggested the same setup years ago (see
> ). In that
>description, sane-enough-AI-in-the-box is called Oracle AI, which is
>used to help with building a theory of actual Friendly AI. Here is a
>relevant passage:
>> Nick Bostrom, however, once asked whether it would make sense to
>> build an Oracle AI, one that only answered questions, and ask it our
>> questions about Friendly AI. I explained some of the theoretical
>> reasons why this would be just as difficult as building a Friendly
>> AI: The Oracle AI still needs an internal goal system to allocate
>> computing resources efficiently, and it has to have a goal of
>> answering questions and updating your mind, so it's not harmless
>> unless it knows what side effects shouldn't happen. It also needs
>> to implement or interpret a full meta-ethics before it can answer
>> our questions about Friendly AI. So the Oracle AI is not
>> necessarily any simpler, theoretically, than a Friendly AI.

 Eliezer, are you sure the concept that you are criticizing is the
same concept that Nick advocates?
 I can imagine an Oracle AI which is as hard to implement as you suggest.
But if an Oracle AI is designed with a more limited goal of answering a
subset of the questions you have in mind, such as giving "yes, no, or unknown"
answers to questions submitted in a computer programming language, then
it's hard to see how it would need a full meta-ethics. I expect there is
a range of possible Oracle AIs between those extremes, with different
benefits and risks. I doubt that anyone has analyzed enough of those
possibilities in enough depth to be confident about whether the
easier-to-implement ones would be powerful enough to help avoid
mistakes in designing a Friendly AI.

