Re: AI Boxing:

From: Vladimir Nesov (
Date: Tue Jun 03 2008 - 08:53:27 MDT

On Tue, Jun 3, 2008 at 6:00 PM, Randall Randall
<> wrote:
> The assertion that there is no such combination of words is equivalent
> to the assertion that the human brain is perfectly secure. Given that
> more complex systems have more vulnerabilities (all else equal) and
> that brains were evolved rather than designed, it seems to me to be
> wildly implausible that there are no possible exploits for the brain.
> It would not surprise me to learn that there were exploits which
> required only seconds to perform verbally. It would, however,
> surprise me to learn that Eliezer had discovered one; the space of
> possibilities is large, and there's no reason to think that a human
> could reason their way to such a thing.

Without a quantitative assessment, there is no reason to think that a
superintelligent AI in the box will be able to execute such an exploit
and hope to win. Even if an exploit exists, there are two obvious
prerequisites for applying it: (1) detailed knowledge of the state of
the gatekeeper's brain and environment, and (2) the availability of an
action that will achieve the required effect with high probability.
Extraordinary beliefs require extraordinary evidence. For the AI to act
on a plan for breaking out, it must be reasonably sure that the plan
will work, and to be sure, it must obtain enough information about why
it will work, which may be unavailable. This information can't
magically appear in its mind without adequate sensors and actuators.
In our case, the AI has extensive information about humans in general
(presumably from studying an Internet dump), but only a little
text-only information about the gatekeeper, and a similarly limited
means of acting on him. The problem with AI is that it will presumably
be much more efficient at extracting knowledge from evidence, but it
still can't overcome fundamental information-theoretic limits, which in
this case may well limit its ability to influence the gatekeeper.
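
The information-theoretic point can be made concrete with a deliberately
crude toy model. Everything in it is a hypothetical assumption, not
something proposed in this thread: the gatekeeper is in one of N equally
likely "states", exploit i works only in state i, and the text channel
reveals at most k bits about the state before the AI commits to one
exploit. Under that model the AI's best possible success probability is
at most 2^k / N, however cleverly it processes the evidence, because k
bits allow at most 2^k distinct observations and each is worth at most
1/N. The helper names below (optimal_success, channel_bound) are
illustrative only.

```python
from collections import defaultdict
from fractions import Fraction


def optimal_success(n_states, observe):
    """Success probability of the best possible exploit-choosing strategy.

    Hypothetical toy model: the gatekeeper is in one of n_states equally
    likely states, and exploit i works only in state i. The AI sees
    observe(state) through its text channel, then picks one exploit.
    """
    # joint[y][s] = P(observation y and state s) under a uniform prior.
    joint = defaultdict(lambda: defaultdict(Fraction))
    for s in range(n_states):
        joint[observe(s)][s] += Fraction(1, n_states)
    # For each observation the Bayes-optimal choice is the most probable
    # state; summing those maxima gives the overall success probability.
    return sum(max(col.values()) for col in joint.values())


def channel_bound(n_states, bits):
    """Upper bound: at most 2**bits observations, each worth <= 1/n_states."""
    return Fraction(min(2 ** bits, n_states), n_states)


if __name__ == "__main__":
    n = 1024  # hypothetical number of distinguishable gatekeeper states
    for bits in (0, 3, 6, 10):
        # The channel reveals exactly `bits` bits: the low bits of the state.
        p = optimal_success(n, lambda s, b=bits: s % (2 ** b))
        print(bits, p, channel_bound(n, bits))
```

With N = 1024, the optimal success probability for 0, 3, 6 and 10 bits
is 1/1024, 1/128, 1/16 and 1, meeting the bound exactly. A smarter
reasoner only helps it reach the bound; extra bits of evidence are what
raise it.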

Vladimir Nesov
