From: Randall Randall (randall@randallsquared.com)
Date: Tue Aug 02 2005 - 19:34:16 MDT
On Aug 2, 2005, at 8:54 PM, Daniel Radetsky wrote:
> On Tue, 2 Aug 2005 20:03:43 -0400
> Randall Randall <randall@randallsquared.com> wrote:
>
>> Just to be clear, I do not know of any [exploits]. My best guess about
>> Eliezer's box-break involves reasoning with the jailer about what
>> future jailers will do, combined with a carrot and stick.
>
> For those joining the debate late, the issue of whether or not the AI
> could talk its way out of a box is a separate issue from the existence
> of exploits.
This is not entirely accurate. "Exploits" is shorthand for
"things we didn't think of or protect against, which could
be used to escape from containment". Psychological tricks
are one kind of possible exploit, but a very mundane
example, and hence likely to lull us into thinking that we
can protect against exploits in general.
I *believe* that Eliezer's point is merely that anything
you don't think of or don't protect against can ruin your
day, not that there is necessarily a specific physical
loophole.
In any case, the general thing to take away, in my opinion,
is that since anything you don't think of might in principle
result in failure to contain, plugging any number of specific
leaks or classes thereof is not going to reduce the potential
problem space -- only a humanly comprehensible theory of
everything could do that even in principle, and we don't seem
likely to get one.
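As a loose analogy (a sketch of my own, with made-up channel
names, not anything from the actual experiment), containment
by plugging specific leaks works like a blacklist, and a
blacklist only stops the escape routes its author already
imagined:

    # Hypothetical containment filter: it blocks only the
    # escape channels we have already thought of.
    BLOCKED_CHANNELS = {"network", "speech", "file_write"}

    def is_allowed(channel: str) -> bool:
        # Enumerating known-bad channels protects only
        # against the threats we foresaw writing the list.
        return channel not in BLOCKED_CHANNELS

    # An unanticipated channel sails straight through:
    print(is_allowed("timing_side_channel"))  # True -- an "exploit"

Adding more entries shrinks the list of *known* leaks, but
the space of channels nobody listed stays as large as ever.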
--
Randall Randall <randall@randallsquared.com>
"Are you hungry? I haven't eaten since later this afternoon."
  -- Aaron, _Primer_