Re: Problems with AI-boxing

From: Jeff Medina
Date: Fri Aug 26 2005 - 19:57:41 MDT

Michael Wilson wrote, "I'm not sure that this follows. A human trying
to convince another human only has the benefit of whatever
arguments/knowledge/tricks the AI can give them in advance."

Certainly. And you and Eliezer are confident a superintelligent AI
couldn't arm a human with enough arguments, knowledge, and tricks
(never mind IA, which is of course also a possibility) to convince
another human to let it out of the box?

The indistinguishability refers to the point reiterated by Eliezer:
"Humans cannot reliably estimate what they cannot be convinced of.
[...T]he whole strategy [is] unreliable, too unreliable for
existential risk."

I assert that humans cannot reliably estimate whether another human
can avoid being convinced of something he is skeptical of, given the
convincer is another human with unspecified privileged knowledge and
persuasive skill.

I also point out that the human-to-human discussion should be severely
guarded (let the human gatekeeper be in an undisclosed distant
location, for example), to minimize the chance that the human who
spoke with the AI tortures the gatekeeper, administers truth serum to
get the gatekeeper to reveal how to unbox the AI, etc., on the
possibility that the AI convinced the human its release was important
enough to warrant such extreme measures.

If the assertion in the paragraph before last holds, the 'firewall'
suggestion is no distinguishably safer than direct contact with the
AI. I think this is clear enough, so perhaps what I meant by
'functionally indistinguishable' was not.


Jeff Medina
Community Director
Singularity Institute for Artificial Intelligence
Relationships & Community Fellow
Institute for Ethics & Emerging Technologies
School of Philosophy, Birkbeck, University of London

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:52 MDT