From: Randall Randall (randall@randallsquared.com)
Date: Tue Aug 02 2005 - 18:03:43 MDT
On Aug 2, 2005, at 6:34 PM, Daniel Radetsky wrote:
> On Tue, 2 Aug 2005 08:07:04 -0400
> Randall Randall <randall@randallsquared.com> wrote:
>> In summary, even if there were an apparently complete
>> model of physics, and we could work out the consequences
>> of any given action to an indefinite precision and time,
>> we would still not have a guarantee of no exploits.
>
> But we don't have to guarantee no exploits for it to be
> unjustified to suppose there are any.
In the context of containment of superhuman intelligence(s),
it is justified to suppose that there may be exploits unless
we have actual evidence strongly suggesting that there are
none. When I program, having finished a unit of code, I have
no evidence that there are any bugs in what I've written.
Still, prudence indicates that I should *assume* that there
are bugs in my code, and write tests to catch them if they
exist. If no bugs exist, I have not harmed myself greatly
by assuming they do. If bugs do exist and I assume they
do not, I will fail.
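To make the analogy concrete, here is a minimal sketch in
Python (the function parse_price and its test cases are
invented purely for illustration): the tests encode the
assumption that bugs exist, and cost little if it turns
out none do.

  import unittest

  def parse_price(text):
      """Hypothetical unit of code: parse a price string like '$1,234.56'."""
      return float(text.replace("$", "").replace(",", ""))

  class TestParsePrice(unittest.TestCase):
      # No evidence yet that any of these cases are broken;
      # prudence says test them anyway.
      def test_plain(self):
          self.assertEqual(parse_price("$5.00"), 5.0)

      def test_thousands_separator(self):
          self.assertEqual(parse_price("$1,234.56"), 1234.56)

      def test_surrounding_whitespace(self):
          self.assertEqual(parse_price(" $5.00 "), 5.0)

  if __name__ == "__main__":
      unittest.main()
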
All I am arguing is that prudence indicates that we should
assume that exploits exist until we have some reason to
believe they do not, such as a theory of physics that
appears both complete and tractable. I do not expect to
see such a theory before superhuman intelligence arrives
(if such intelligence is possible), and so I think we can
"safely" assume that exploits exist, an assumption which,
if false, will at least not harm us.
Given this, safeguards must be built into the superhuman
intelligence directly. This, of course, is the Party
position, the common wisdom, on this list. This is why
(I believe) people on this list keep insisting that some
exploit may exist: assuming that none exist predetermines
failure in the case that some do, while the reverse
assumption does not.
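To spell that asymmetry out, here is a toy decision matrix
sketched in Python (the outcome labels and their rankings
are my own illustration, nothing more): whichever way the
world turns out, assuming exploits exist never does worse
than assuming they do not.

  # Rows: our assumption; columns: the actual state of the
  # world. The outcomes and their ranks are illustrative.
  OUTCOMES = {
      ("assume_exploits", "exploits_exist"): "prepared",
      ("assume_exploits", "no_exploits"):    "some wasted effort",
      ("assume_none",     "exploits_exist"): "failure",
      ("assume_none",     "no_exploits"):    "fine, by luck",
  }

  # Crude ranking from worst (0) to best (2).
  RANK = {"failure": 0, "some wasted effort": 1,
          "fine, by luck": 1, "prepared": 2}

  def weakly_dominates(a, b):
      """True if assumption a does at least as well as b in every state."""
      states = ("exploits_exist", "no_exploits")
      return all(RANK[OUTCOMES[(a, s)]] >= RANK[OUTCOMES[(b, s)]]
                 for s in states)

  print(weakly_dominates("assume_exploits", "assume_none"))  # True
  print(weakly_dominates("assume_none", "assume_exploits"))  # False
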
Now, it may or may not be *probable*, with what we know
today, that exploits do exist, but just as a pack of
dogs cannot reason about a helicopter they have never
encountered, I don't think we have any way to reason
about the probability of exploits within those parts of
physics we have not yet nailed down.
>> given that we know of areas that are incomplete, and
>> given that we know that it's possible to surprise us even
>> within ostensibly complete areas, it should be obvious
>> that exploits must be assumed to exist for intelligences
>> which dwarf our own (if that's possible).
>
> If you know of an "exploit" like the ones under discussion,
> I'd love to hear about it. I don't think you have any good
> reason to suppose reasonable exploits exist.
Just to be clear, I do not know of any. My best guess
about Eliezer's box-break involves reasoning with the
jailer about what future jailers will do, combined with
a carrot and stick. That assumes that turning off the
AI permanently isn't an option, which is a detail I don't
actually remember. I don't suppose I'll know for a long
time, if ever, whether my guess is close to correct.
--
Randall Randall <randall@randallsquared.com>
Property law should use #'EQ , not #'EQUAL .