From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Thu Jul 21 2005 - 11:23:42 MDT
Michael Vassar correctly wrote:
>
> c) "magic" has to be accounted for. How many things can you do that a dog
> would simply NEVER think of?
Daniel Radetsky wrote:
> "Ben Goertzel" <ben@goertzel.org> wrote:
>
>> How about the argument that every supposedly final and correct theory of
>> physics we humans have come up with, has turned out to be drastically
>> wrong....
>
> This provides an infinitesimal degree of support to the claim that the real
> final and correct theory would permit magic...
Only if 'magic' is interpreted as drawing mystic circles and waving your
hands. 'Magic' takes on a different meaning here - it means, simply, anything
you didn't think of. Not just anything a human would simply NEVER think of,
but also anything YOU didn't think of - the former being a subset of the
latter and therefore also 'magic'. The point of the AI-Box Experiments is
that I can do 'magic' in the latter sense relative to some people who firmly
stated that NOTHING could possibly persuade them to let an AI out of the box.
Obviously, being human, I did nothing that was *strongly* magical.
The problem of magic is the problem of a very large search space, in a case
where we not only lack the brainpower to search each element of the space, we
may lack the brainpower to properly delineate the search space. The AI is not
limited to breaking out via any particular method you consider. Neither you
nor the AI has enough computing power to consider the *entire* search space,
but the AI may search much more efficiently than you do (not to mention a lot
faster). Thus, your inability to think of a way out yourself is only slight
evidence that the AI will be unable to think of a way out. Similarly, the
conviction of certain people that no possible mind, even a transhuman, would
be able to persuade them to let the AI out of a box was not strong evidence
that I would be unable to persuade them to let the AI out of the box.
Creativity has the appearance of magic when, even after the fact, you don't
know how it worked. The AI-Box Experiment is a lesson in the tendency of
sufficiently large search spaces to contain magic. That is why I refuse to
publish transcripts. Get used to the existence of magic.
The argument which Ben Goertzel cites is that, since physics has changed over
the last few generations, we should anticipate that we have stated the search
space incorrectly when we consider all physical means by which the AI might
break out of the box. This does not mean that the AI *has* to go outside
known physics to break out, because there might also be an escape route in
known physics that you did not think of. Consider OpenBSD, the most secure OS
you can obtain on the open market. OpenBSD is constantly changing as people
discover new bugs and fix them. Our fundamental physics is in less flux than
OpenBSD, though arguably over history fundamental physics has gone through
more total change than OpenBSD. I don't know how to break into an OpenBSD
box, since I am not a security expert, just as you are not a physicist. I
anticipate that even some people whose job title says "system administrator"
wouldn't be able to break into a patched OpenBSD box. I have no idea whether
NSA spooks can break into an OpenBSD box, but my money would be on 'yes'. The
fact that OpenBSD is often discovering new bugs is a strong argument that
additional security flaws exist. Of course there are *presently* no security
flaws known to the open-source community, or they would be fixed, just as there
are presently no known predictive breakdowns in modern physics. But we expect
there are more bugs in OpenBSD and we expect there are more bugs in our model
of physics.
Even though I can think of absolutely no way to break into an OpenBSD box
using only an Internet connection, I anticipate that an AI could break into an
OpenBSD box using only an Internet connection, using 'magic'. Maybe this
would involve going outside the search space that I or NSA spooks think about;
though probably there are answers within the search space too. Who can guess
which would be more efficient? Going outside the search space involves some
extra trouble in any particular case, but building the infrastructure for
magic might save the AI effort in the long run. From our perspective the
results are the same whether the loophole is creative (comprehensible in
retrospect) or magic (inexplicable even after we witness the results).
> For example, we might say that the AI might be able to manipulate its
> hardware in such a way as to create electromagnetic fields (or some such
> thing) and "break the box." Given the AI's degree of flexibility in the
> hardware, there is some chance that it will succeed. If the chance is
> vanishingly small, we say that this exploit in the box's solidity is not
> reasonably exploitable.
>
> So, I want to know why you believe there is an exploit which is reasonably
> exploitable.
If you consider any *single* element of a large search space, the probability
remains infinitesimal that *that single element* is an escape route. It is
the probability of the whole search space that is the problem. If I consider
a single bitstring targeted at a single TCP/IP port, its probability of
breaking the OpenBSD box is very low. If an AI sends out that exact bitstring
then the probability is still very low, presuming there are no free variables
to manipulate, such as the time of attack. Similarly, the probability that
classical magic - drawing mystic circles - will work remains low, even if it
is an AI drawing the circles. But if the AI can send arbitrarily
formed bitstrings to any port, then the probability of a working exploit
existing is high, and the probability of a seed AI being able to find at least
one such exploit, I also estimate to be high.
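To make the arithmetic concrete, here is a toy calculation (a sketch only: the
per-bitstring probability, the size of the searchable space, and the
independence between candidate bitstrings are all invented simplifications,
not estimates about real OpenBSD exploits):

    import math

    # Invented numbers, for illustration only.
    p_single = 1e-12         # chance that one particular bitstring is an exploit
    num_candidates = 1e16    # rough size of the space an attacker can search

    # P(at least one exploit exists in the space), treating candidates as
    # independent; computed in log space to avoid floating-point trouble.
    log_p_no_exploit = num_candidates * math.log1p(-p_single)
    p_at_least_one = 1.0 - math.exp(log_p_no_exploit)

    print(p_single)        # 1e-12: any *single* element is almost surely harmless
    print(p_at_least_one)  # ~1.0: the space as a whole almost surely contains one

The point is only the shape of the numbers: an infinitesimal per-element
probability is entirely compatible with a near-certain whole-space probability.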
When you cite particular physical means of breaking a box and their apparent
implausibility to you, you are simply saying that some particular bitstring
probably does not obtain root on an OpenBSD box. What of it? How many things
can you do that a dog would simply NEVER think of?
Daniel Radetsky wrote:
>
> I can't, but I submit that no one on this list has any basis to assess the
> probability either. So if I claim that the probability is infinitesimal,
> then your only basis for disagreement is pure paranoia, which I feel
> comfortable dismissing.
That's not how rationality works. If you don't know the answer you are not
free to pick a particular answer and demand that someone disprove it. It is
analogous to finding a blank spot on your map of the world and rejoicing, not
because you have new knowledge to discover, but because you can draw whatever
you want to be there. And once you have drawn your dragon or your comfortable
absence of dragons, you become committed to your ignorance, and all knowledge
is your enemy, for it might wipe away that comfortable blank spot on the
map, over which you drew... You have not made this error too greatly, but
there have been others on SL4 committed to defending their comfortable
ignorance. There is no freedom in the way of cutting through to the correct
answer. It is a dance, not a walk. On each step of that dance your foot must
come down in exactly the right spot, neither to the left nor to the right. If
you say that the probability of this very large search space containing an
exploit is 'infinitesimal', you must give reason for it. If I say that the
probability is 'certain', I must give reason for it. You cannot hole up with
your preferred answer and wait for someone to provide positive disproof; that
may comfort you but it is not how truthseeking works.
When there is a blank spot on the map our best guess is that it is "similar"
to past experience. The art consists of a detailed understanding of what it
means to be "similar". Similarity of fundamental laws takes precedence over
similarity of surface characteristics. Many would-be flyers failed before the
Wright Flyer flew, but if you could make a physical prediction of exactly when
the other flyers would hit the ground, you could use the same quantitative
model of aerodynamics to predict the Wright Flyer would fly. So we should
assume the blank spot on the map follows the same rules as the territory we
know, interpreted at the most fundamental level possible. Does this mean we
assume that the blank spot on the map obeys known physics exactly? Yes and
no. If we have any particular question of physics in which an exact,
quantitative prediction is desired, then we have to assume that the prediction
of present physics is the point of maximum probability. If you have a
computer containing a superintelligent AI, and you throw it off a roof 78.4
meters high, and you want to know the computer's downward velocity when it
hits the ground, the best *quantitative* guess is 39.2 meters/second. If the
computer does not hit the ground due to 'magic', i.e., some action performed
by an intelligence that searches a space we cannot search as well ourselves,
nor correctly formulate, then we have no idea where it will go or how fast it will
be moving. Hence the prediction of modern physics is by far the best *exact*
guess. That is one sense in which we presume the blank spot on the map
resembles known territory. But we are not committed to the absurd statement
that we expect every one of our physical generalizations to prove correct in
every possible experiment in the future, even though at any particular point
any particular generalization is our best exact guess. This is no more
paradoxical than my simultaneous expectation that any specific ticket will not
win the lottery and that some ticket will win the lottery. My beliefs are
probabilistic, so that any large number of individual statements can have a
high probability, yet their conjunction a low probability.
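(As a check on the arithmetic: the 39.2 meters/second figure is ordinary
Galilean free fall, assuming g = 9.8 m/s^2 and no air resistance.)

    import math

    g = 9.8         # m/s^2, assumed gravitational acceleration
    height = 78.4   # m, the roof height in the example

    impact_speed = math.sqrt(2 * g * height)   # v = sqrt(2*g*h)
    fall_time = math.sqrt(2 * height / g)      # t = sqrt(2*h/g)

    print(impact_speed)   # 39.2 m/s, the "best quantitative guess" above
    print(fall_time)      # 4.0 s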
It is an interesting question how *exactly* to formulate the generalization,
'Every past model of physics except the current one has already proven
incorrect, so I estimate a low probability that the current model is the final
one'. Or how to note the facts that present physics has persisted over a
subjective time consistent with past generalizations that were eventually
disproven (i.e. not an unusually long time), or that there are known problems
in the modern theory (e.g. reconciling quantum mechanics and general
relativity). This literally "meta-physical" generalization yields no specific
predictions, so it can't override physics in any specific case.
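One candidate formalization - offered purely as an illustration, not as
something this post proposes - is Laplace's rule of succession, which
deliberately ignores the caveats just mentioned (elapsed subjective time,
known open problems):

    def p_current_model_overturned(num_overturned_so_far: int) -> float:
        """Laplace's rule of succession: after n overturned 'final' models,
        estimate the probability that the current one gets overturned too."""
        n = num_overturned_so_far
        return (n + 1) / (n + 2)

    # e.g. five overturned predecessor models -> ~0.86 chance the current
    # model is not final either (an illustrative count, not a real tally).
    print(p_current_model_overturned(5))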
But in the case where we have a superintelligent AI, we may need to think
about a cognitive system that systematically searches a large space for *any*
useful breakdown in our physical model (or anything we didn't think of that
can be done within modern physics). It's sort of like a man falling out of a
plane. How does a dog that knows physics predict the unfolding of a
parachute? The answer is that the dog cannot calculate when the parachute
will hit the ground, but the dog would be wise to allocate less probability
mass than usual to the proposition that the man hits the ground at the time
predicted by Galileo. The dog may be justly confident that if the man waves
his hands and chants "Booga booga", or if the man straps anvils to his feet,
the man will still hit the ground. But for the man to actually, reliably
hit the ground requires the truth of the enormous conjunction, "No matter
*what* the man does he will still hit the ground."
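The same lottery arithmetic applies to that conjunction (again with invented
numbers, and treating the man's options as independent, which is a
simplification):

    # Suppose the dog gives each individual claim "the man does X and still
    # hits the ground on schedule" a probability of 0.999, and the man has a
    # million qualitatively different things he might try.
    p_each_claim = 0.999
    num_options = 1_000_000

    # Probability of the conjunction "no matter WHAT the man does, he still
    # hits the ground on schedule", under the independence assumption.
    p_conjunction = p_each_claim ** num_options
    print(p_conjunction)   # ~0: each claim is likely, their conjunction is not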
To guess that physics might break down *somewhere*, or that known physics
might contain some way to break out of the box, presumes that the blank spot
is similar to known territory; but the presumption takes place at a higher
level. It is a generalization about intelligence, goals, creativity, and what
happens when a higher intelligence encounters a space blank to us. This last
is generalized mostly from past human civilizations contrasted with the human
civilizations that came after them, because if we compared chimpanzees or
lizards to humans we
would have to conclude that the answer was just pure incomprehensible magic.
But since the degree by which the human future outsmarted the human past is
impressive enough to rule out AI-boxing as a good idea, there is no need to
appeal to strong magic.
If you consider in your security model a branch describing the existence of a
hostile mind that is smarter than you are, you must assume that this branch of
your security model is a total loss. How many things can you do that a dog
would simply NEVER think of?
It may still make sense to try to take precautions against hostile
transhumans, because this is a likely failure mode even of the branches of
your security model that don't explicitly expect it. A hope in hell is better
than no hope, and the precautions may make sense in any case - they may be
particularly useful against nascent pre-transhumans. But if a branch of your
security model involves an unknown probability of creating a hostile
transhuman, you have to assume that this is an unknown probability of total
loss, rather than rely on your dog to invent countermeasures.
The problem with the AI-Box paradigm is that it assumes that the existence of
a hostile transhuman is a manageable problem and makes this the foundation of
the strategy. Typically you assume in your security model that if the
terrorists smuggle a nuke into New York City, set it up in the UN Building,
escape to a safe distance, take the trigger out of their pockets, and put their
finger on the trigger, well, you've sorta lost at that point. Stop them if
you can, any way you can, but your security model is supposed to rely on
stopping the nuclear weapon EARLIER. Maybe in Hollywood the hero crashes in
through the door at this point, but that's not what security experts assume.
Countries don't allow known terrorists through customs carrying nuclear
weapons on the theory that, hey, the hero can always shoot them if they look
like they're going to pull the trigger. Allowing the existence of a hostile
transhuman is just plain STUPID, end of story.
--
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence