Obedient AI?

From: Peter C. McCluskey (pcm@rahul.net)
Date: Thu Nov 25 2004 - 10:15:37 MST

 I'd like to suggest an alternative to Friendly AI / CV that involves an
AI programmed to answer questions, combined with a human institution that
is designed to control what questions the AI may be asked.
 The AI would be designed with a supergoal of answering any questions put
to it as accurately as it can within a set of resource constraints that
is given to it for each question. The AI would not try to discern the
intent behind a question beyond the extent to which that intent gets encoded
in the question.
 The human institution would serve to limit the questions and resource
limits to those that are safe. For example, it would ensure that the AI
doesn't attempt to turn the planet into computronium by verifying that
whoever asks the question pays for all the resources used in calculating
the answer.
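
 A minimal sketch of this arrangement in Python, to make the division of
labor concrete. Everything here is an illustrative assumption: the names
Question, Institution and answer_within_budget, the per-unit price, the
per-question cap, and the idea of measuring resources in abstract units.
The "AI" is only a stand-in that walks through successively better
candidate answers until its budget runs out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Question:
    text: str
    asker: str
    resource_budget: int  # abstract compute units the AI may spend (assumed unit)


@dataclass
class Institution:
    """Hypothetical gatekeeper: approves a question only if its budget is
    within a per-question cap and the asker can pay for every unit."""
    price_per_unit: int
    max_budget: int
    balances: Dict[str, int] = field(default_factory=dict)

    def deposit(self, asker: str, amount: int) -> None:
        self.balances[asker] = self.balances.get(asker, 0) + amount

    def approve(self, q: Question) -> bool:
        # Reject budgets that are empty or above the institution's cap.
        if q.resource_budget <= 0 or q.resource_budget > self.max_budget:
            return False
        cost = q.resource_budget * self.price_per_unit
        if self.balances.get(q.asker, 0) < cost:
            return False
        self.balances[q.asker] -= cost  # the asker pays up front
        return True


def answer_within_budget(
    candidates: Iterable[object], budget: int
) -> Tuple[Optional[object], int]:
    """Stand-in for the AI: take successively better candidate answers,
    charging one unit each, and stop once the budget is spent."""
    best = None
    used = 0
    for candidate in candidates:
        if used >= budget:
            break
        best = candidate
        used += 1
    return best, used
```

 In this sketch the answering function never looks at who is asking or why;
every judgment about whether a question and its budget are safe sits with
the institution, which is the split proposed above.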

 What could go wrong?

 Concepts such as "resource constraints" and "accurately" could be programmed
into the AI in a buggy way. But we appear to understand these concepts better
than we understand concepts such as friendliness, so I claim the risks are
lower. Also, the consequences of a bug might be less severe than the consequences
of a bug in code for friendliness, volition, etc.
 The human institution could misuse its power for undesirable goals such as
world conquest. But that is a normal kind of problem we have faced with a
number of prior technological changes, and we have a lot of experience
surviving such problems.
 The AI could be inadequate to deal with an unfriendly AI. If we see signs
that this will be a problem (we will be able to get some predictions from
the obedient AI that will help determine this), we should be able to use
the obedient AI to help us design another type of AI, such as a friendly AI,
with the power necessary to deal with unfriendly AI. I suspect this two-step
process of producing the friendly AI would be a good deal less error-prone
than trying to directly design the final AI, because at the hardest stage we
would have an AI to help us. This conclusion does depend somewhat on
assumptions about how fast a friendly AI would take off. I envision the
obedient AI being improved by its developers repeatedly asking it what
changes should be made to it in order to increase the accuracy of the AI's
answers or the range of questions it can answer, and then applying those
changes. If the first AIs take months or years to go from an IQ of 50 to
an IQ of 500, then the slowdown caused by having humans in the loop need
not be important. There are apparently some people on this list who expect
this takeoff to happen in less than a day. I've been ignoring those strange
ideas because they seemed to have little relevance to how we should act,
but if I read some serious arguments that there's a significant chance of
a fast takeoff and that that is the deciding issue on which to choose
between obedient AI and CV, I will argue against the fast takeoff.
 It could be hard to program a supergoal so that it contains two unrelated
attributes such as accuracy and resource constraints. But Pei Wang's NARS
appears to come close enough to doing this that I doubt it is a serious
problem.
 Have I missed anything?

Peter McCluskey          | Please check out my new blog:
www.bayesianinvestor.com | http://www.bayesianinvestor.com/blog/
