Re: Problems with AI-boxing

From: Eliezer S. Yudkowsky
Date: Fri Aug 26 2005 - 19:14:17 MDT

Phil Goetz wrote:
> --- Jeff Medina wrote:
>>Having the power to let the AI out directly or through a human
>>intermediary are functionally indistinguishable scenarios. If a UFAI
>>can convince any given human to let it out with nothing but words, it
>>can convince any given human to convince any other given human to let
>>it out using nothing but words.
> Are they? Sounds like Eliezer needs to conduct another
> round of experiments... he must persuade someone to persuade
> someone else to let the AI out. :)

I estimate that to be beyond my ability (and agree with Wilson that these
scenarios are functionally distinguishable). I suppose I'd try if someone
offered me a large enough bet, but it'd have to be large enough to make up for
the small probability of success.

I remind everyone that Eliezer is *not* a smarter-than-human AI. So far as
I'm concerned, I made my point with the first AI-Box Experiment. Humans
cannot reliably estimate what they cannot be convinced of. If there are
humans who really will never let the AI out, and who somehow also know that
this is true of themselves, still we on the outside cannot distinguish them
from people who only believe themselves unconvinceable. The very first AI win
showed the whole strategy unreliable, too unreliable for existential risk.

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
