From: Eliezer S. Yudkowsky (firstname.lastname@example.org)
Date: Sat Jan 10 2004 - 03:38:13 MST
Ben Goertzel wrote:
>> Ben Goertzel wrote:
>>> It might be significantly easier to engineer an AI with a 20% or 1%
>>> (say) chance of being Friendly, than to engineer one with a 99.99%
>>> chance of being Friendly. If this is the case, then the
>>> broad-physical-dispersal approach that I suggested makes sense.
>> 1) I doubt that it is "significantly easier". To get a 1% chance
>> you must solve 99% of the problem, as 'twere. It is no different
>> from trying to build a machine with a 1% chance of being an internal
>> combustion engine, a program with a 1% chance of being a spreadsheet,
>> or a document with a 1% chance of being well-formed XML.
> Actually, the analogies you're making are quite poor, because internal
> combustion engines and spreadsheets and XML documents are not complex
> self-organizing systems.
An internal combustion engine is a complex dynamic system. The gasoline
flows, each molecule bumping into other molecules, its fluidity determined
by details of the electromagnetic interaction between molecules relative
to the prevailing temperature, yet the fluid dynamics as a whole can be
excellently simplified to equations that describe quite different
quantities; when electricity flows through the spark plugs it obeys
Maxwell's Equations, and when the gas mixed with oxygen explodes the
explosion ripples out in obedience to differential equations.
Having seen, from other fields beyond AI, what it means to "understand" a
system, and having recently finally understood a thing or two about AI for
the first time, I now realize that "complex dynamic system" means "I do
not understand which particular dynamics are involved".
"Emergence" is sort of like the way that fluid dynamics can be usefully
simplified to high-level equations that are unlike the deep kinetic and
electromagnetic equations, except that in Artificial Intelligence the word
"emergent" means "without understanding either level".
"Self-organizing"? Well, now there's a wonderful magic wand of a word, to
be used whenever something mysterious happens, and indeed
"self-organizing" seems to make a fairly good synonym for "mysterious".
If I were ever to use the word, while still aspiring to my usual standards
for well-definedness, I would pick some more rigorous criterion of usage,
like, say, "self-organizing" referring to a multiplicity of locally
centered optimization pressures interacting to create an optimization
pressure on some higher-level property of the system, cf. "emergence"
above. Or perhaps "self-organizing" could also be extended to cases where
an external optimization pressure, like natural selection, builds a system
where distributed local properties and their interactions create the
systemic behaviors that were subject to the external optimization
pressure. But here I am holding myself to much higher standards than most
people do when they use so marvelous and poetic a word as
"self-organizing". Mostly, IMO, it is used much like "phlogiston" in an
earlier era. Where does the organization come from? It's self-organizing!
> With Friendly AI, we're talking about
> creating an initial condition, letting some dynamics run (interactively
> with the environment, which includes us), and then continually nudging
> the dynamics to keep them running in a Friendly direction. This is a
> very different --- and plausibly, much less deterministic -- process
> than building a relatively static machine like a car engine or a
It sounds to me like a description that applies quite well to a car
engine. A car engine has initial conditions, check. Dynamics, check.
Nudging, check. The difference is not in determinism, the difference is
whether the designer is mystified by what is allegedly going on. A car
engine is nothing to sneer at; it is a work of art built by people who
understood the dynamics.
>> 2) Ignoring (1), and supposing someone built an AI with a 1% real
>> chance of being Friendly, I exceedingly doubt its maker would have
>> the skill to calculate that as a quantitative probability.
> Of course, that's true. But it's also true that, if someone built an
> AI with a 99.999% chance of being Friendly, its maker is not likely to
> have the skill to calculate that as a quantitative probability.
Yes, this is covered in the part where I said:
>> To correctly calculate that a poorly assembled program (one
>> representing the limit of its maker's skill) has a 1% chance of being
>> Friendly - even to within an order of magnitude! - requires a skill
>> level considerably, no, enormously higher than that required to build a
>> program with a 99.99% chance of being Friendly.
> Making quantitative predictions about this kind of system is next to
> impossible, because the dynamic evolution of the system is going to
> depend on its environment -- on human interactions with it, and so
> forth. So to make a rigorous probability estimate you'd have to set
> various quantitative bounds on various environmental conditions, human
> behaviors, etc. Very tricky ... not just a math problem, for sure...
> (and the math problems involved are formidable enough even without
> these environmental-modeling considerations!)
Yes, I said as much in my post.
>> 3) So we are not talking about a quantitative calculation that a
>> program will be Friendly, but rather an application of the Principle
>> of Indifference to surface outcomes. The maker just doesn't really
>> know whether the program will be Friendly or not, and so pulls a
>> probability out of his ass.
> There are a lot of intermediate cases between a fully rigorous
> quantitative calculation, and a purely nonrigorous "ass number."
Not as many as one might think. See below.
> After all, should you ever come up with a design that you think will
> ensure Friendliness, you're not likely to have a fully rigorous
> mathematical proof that it will do so ... there will be a certain
> amount of informal reasoning required to follow your arguments.
One of the subtle damages done by our humanly inherent political frame of
mind is that, as do reporters or media, we tend to categorize things as
"provable" or "not provable". Since nothing is "provable" enough to
satisfy Leon Kass and the Precautionary Principle - since the political
battle is unwinnable, or at least unwinnable through any amount of proof -
well, why bother trying to prove things at all? But if one is to clear
away all political mud, it becomes apparent that there are many possible
standards of rational evidence to apply. In Friendly AI work I intend to
apply a certain rule which says that I need a particular kind of strict
specific expectation of a positive result before I, as a programmer, take
any action - with any piece of code, initial conditions and rules for
"complex emergent dynamics", and so on, being considered as a special case
of programmer action. The strictness to which I refer, the required
grounds for holding the expectation of success, is something that runs
skew to the impossible standard of the rhetorical trick known as the
Precautionary Principle, so Leon Kass would not be satisfied. But so
what? Just because Leon Kass cannot be satisfied by any standard of
proof, does not mean that I will commit suicide by relaxing my standard of
proof.
There's a difference between an engineering expectation that a
well-defined system accomplishes a well-defined result, which has not yet
been qualified as a mathematical proof; and holding an excited expectation
of success without even a clear criterion of success. These are quite
different levels of "informality".
>> 4) Extremely extensive research shows that "probabilities" which
>> people pull out of their asses (as opposed to being able to calculate
>> them quantitatively) are not calibrated, that is, they bear
>> essentially no relation to reality.
> Sure, but your statement doesn't hold when applied to teams of
> scientists making careful estimates of the probabilities of events based on a
> combination of science, math and intuition. In this case, estimates
> are still imperfect -- and errors happen -- but things are not as bad
> as you're alleging.
Unless you are familiar with the literature here, my reply is: "Sorry,
Ben, yours is a widespread opinion and in no sense an obviously stupid
one, but it has been shown to be wrong."
The most famous work in this area is with interviewers predicting student
success and doctors making probabilistic clinical judgments, but these are
just the most famous examples; the work has been applied to many domains.
(I, alas, am only acquainted with the most famous studies, since this is
not my primary field, but I know it's a robust result.) Note that
physicians not only have access to medical statistics and
well-understood underlying models, but also extensive clinical experience
with frequent feedback on cases much like the one being immediately
considered. So are doctors making clinical judgments of probability
well-calibrated? No, not at all. See Robyn Dawes on "The robust beauty
of improper linear models", in "Judgment under uncertainty", as well as a
good many other papers nearby, but Dawes goes into the greatest depth on
how tenaciously people hold on to their illusions of predictability
(though this too is a robust result).
In particular, Dawes explains how, even after supposed experts are shown
that (a) the variables of interest are far less predictable than they
thought and that (b) experts can be defeated by an improper linear model
using *randomly chosen weights* with the correct sign, people still hold
out endless hopes of the superiority of "clinical judgment". You can show
people the evidence and they still won't believe. They want to believe
the variables are predictable. They want to believe the experts can
predict them. They presume endlessly that something must have been wrong
with this study and that study, even as the studies keep piling up.
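As a rough illustration of the improper-linear-model idea described above, here is a minimal Python sketch on synthetic data. The number of predictors, the "true" weights, the noise level, and the names `pearson` and `simulate` are all assumptions invented for this example. It does not reproduce any of Dawes's datasets, and no human judge is simulated. It only shows why weights with random magnitudes but the correct signs still track the outcome.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_cases=2000, seed=0):
    rng = random.Random(seed)
    # Hypothetical "true" weights; the improper model is told only their signs.
    true_weights = [0.5, -0.3, 0.4, -0.2]
    signs = [1 if w > 0 else -1 for w in true_weights]
    # Improper model: randomly chosen magnitudes, correct signs.
    random_weights = [s * rng.uniform(0.1, 1.0) for s in signs]

    # Synthetic standardized predictors and a noisy outcome.
    predictors = [
        [rng.gauss(0.0, 1.0) for _ in true_weights] for _ in range(n_cases)
    ]
    outcomes = [
        sum(w * x for w, x in zip(true_weights, row)) + rng.gauss(0.0, 1.0)
        for row in predictors
    ]

    def score(weights):
        return [sum(w * x for w, x in zip(weights, row)) for row in predictors]

    return {
        "true_weights": pearson(score(true_weights), outcomes),
        "random_signed_weights": pearson(score(random_weights), outcomes),
    }


result = simulate()
```

In this toy setup the randomly weighted score stays positively correlated with the outcome because every predictor is at least pushed in the right direction. Whether a given expert does better or worse than that is an empirical question the sketch does not address.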
Dawes, in one of his books, narrates how after describing one study of
clinical judgment, someone in the audience said, "Well, you should have
tested the judgment of [Famous Dr. X]", where, although Dawes couldn't say
so at the time, Dr. X was in fact one of the experts tested.
So, Ben, I'll believe that scientists making careful guesses of
probabilities based on their equivalent of clinical judgment - in any case
where "intuition" is part of the mix along with science and math - are in
fact well-calibrated, when I see a study proving it. Because it goes
against everything I have heard from any study of human judgment.
Humans fail at the task of probabilistic clinical judgment, performing
more poorly than simple linear models, even in cases where the
fundamentals are known and the expert has years of experience dealing with
the same problem as the one presently at hand. Not experience with
"simpler cases" or "test problems", experience with immediate feedback on
many samples in the same context as the present problem. Calibration gets
worse, and overconfidence gets worse, as the difficulty of the problem
increases. So now the field of AI, with no grasp on the fundamentals, and
no sample of cases from the same context, will offer a "clinical judgment"
of the probability that an AI will be Friendly? In a word: No.
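For readers unfamiliar with the term, "calibration" can be checked mechanically: group judgments by the probability the judge stated, then compare that number with how often the event actually happened. The sketch below is one minimal way to do that in Python. The function name `calibration_table` and the toy data are invented for illustration and are not taken from any study.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Map each stated probability to the observed frequency of the event.

    `forecasts` is an iterable of (stated_probability, event_happened) pairs.
    """
    outcomes_by_stated = defaultdict(list)
    for stated, happened in forecasts:
        outcomes_by_stated[stated].append(1 if happened else 0)
    return {
        stated: sum(hits) / len(hits)
        for stated, hits in sorted(outcomes_by_stated.items())
    }


# Invented toy data: ten judgments stated at 90% of which six came true,
# and ten stated at 60% of which five came true.
toy_forecasts = (
    [(0.9, True)] * 6 + [(0.9, False)] * 4
    + [(0.6, True)] * 5 + [(0.6, False)] * 5
)

table = calibration_table(toy_forecasts)
# Positive gap = overconfidence: stated probability exceeds observed frequency.
gaps = {stated: stated - observed for stated, observed in table.items()}
```

A well-calibrated judge would show gaps near zero in every bucket. Note that this check needs a sample of resolved cases from the same context, which is exactly what is unavailable for a first-of-its-kind AI.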
This doesn't mean that building a Friendly AI is impossible. It means
that if it can be done, it will be done when someone looks over the design
and says: "I believe that I understand every single thing that this
design is supposed to accomplish; I believe that there is not one place
where I am closing my eyes and hoping; I believe that I am no longer
confused. This design has massive overkill directed toward solving the
problem, on a level I couldn't even have imagined when I was a neophyte.
I know that availability is not the same as probability, but if I were
naive enough to evaluate the availability, then I have difficulty in
imagining how this design could fail at all, let alone how it could fail
catastrophically. And I know that emotional certainty is not the same as
calibrated confidence, but if I were naive enough to see the problem
emotionally, then I would say that what I see would inspire most people to
have the emotional reaction they name '99.9% confidence'."
If someone looks at an AI, and they think they understand all the
fundamentals, and the AI is such as to inspire the emotional reaction that
people usually name "70% confidence", then the AI has maybe 1 chance in
100 of working. And if someone looks at the AI and they don't really know
whether it will work, but they're sort of hoping, then they're committing
Now you can believe that you understand a system and yet be wrong, as Leon
Kass is bound to point out. But so what? How strong the statement "I
believe that I finally understand" is as evidence depends on how lightly
the person utters it, how given they are to wishful thinking, whether the
person understands what it means to understand something. It can be
wrong, yes. But to go ahead when you have such an infinitesimal
understanding that even you, as an overconfident human, do not perceive
one hell of an impressive understanding; well, that is suicide.
P(actual-understanding|impressed-with-own-understanding) may be low. But
P(impressed|actual-understanding) is damned high, when you consider how
impressive an actual understanding would be.
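The asymmetry between those two conditional probabilities can be made concrete with Bayes' theorem. The following Python sketch uses three numbers that are pure assumptions chosen for illustration (a 1% prior of actual understanding, a 99% chance of being impressed given understanding, and a 30% chance of being impressed without it). Nothing in the argument depends on these particular values.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    joint_h = prior * p_evidence_given_h
    joint_not_h = (1.0 - prior) * p_evidence_given_not_h
    return joint_h / (joint_h + joint_not_h)


# All three numbers are made up for illustration only.
P_UNDERSTANDS = 0.01               # prior: the designer actually understands
P_IMPRESSED_IF_UNDERSTANDS = 0.99  # real understanding is impressive
P_IMPRESSED_IF_NOT = 0.30          # overconfidence without understanding

p_understands_given_impressed = posterior(
    P_UNDERSTANDS, P_IMPRESSED_IF_UNDERSTANDS, P_IMPRESSED_IF_NOT
)
p_understands_given_unimpressed = posterior(
    P_UNDERSTANDS,
    1.0 - P_IMPRESSED_IF_UNDERSTANDS,
    1.0 - P_IMPRESSED_IF_NOT,
)
```

Under these assumed inputs, being impressed with one's own understanding leaves the posterior at roughly 3%, which is low. Not being impressed drives it well under 0.1%, which is the point: the absence of the impression is far stronger evidence than its presence.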
>> 6) And finally, of course, the probabilities are not independent!
>> If the best AI you can make isn't good enough, a million copies of it
>> don't have independent chances of success.
> This depends a great deal on the nature of the AI.
Not really. My rejoinder to this would sound a lot more impressive if we
agreed on more details in the theory of optimization processes, but in
English, my rejoinder reads:
If anything kills you it should be an unknown unknown. There is
absolutely no excuse for being killed by a known unknown internal to the AI.
Suppose you have a Russian roulette gun with a "randomized" (omitting
Jaynes's cogent objections to the notion of randomization; perhaps it was
quantum split) barrel containing one bullet. You put the gun to your head
and fire. If you were to split yourself into a thousand (local, not
Everett) copies, and play Russian roulette twenty times in succession,
around twenty-six of you would be expected to survive. But why are you
playing Russian roulette to begin with? If you are walking alone in the
desert, and you see a *premanufactured* gun that delivers a million
dollars into your lap on an empty barrel, or kills you on one chance out
of six, then - it depends on your utilities - you might decide to risk
your life on a play. But if you are *building* the gun, you have no
excuse. You should design the gun not to do *anything* if it comes up on
a loaded chamber. You have that privilege, the privilege of not killing
yourself, whenever you deal with a visible known unknown. You do not have
to build an AI which, depending on the value of the known unknown, is
either Friendly or unFriendly, because you can instead build an AI which
is either Friendly or stops. You can always stop, and in fact, whenever
you are *uncertain* you can halt temporarily, you don't need to wait for
it to become a catastrophe, and this is why I've switched from the concept
of "measure utilities and always pick the most desirable" to "pick the
most desirable action, unless the entropy in the decision system is too
high, in which case the optimization process itself suspends". You don't
need to shoot yourself; this is a basic precept of FAI development. You
are creating something, designing a process flow, and if you ever see a
branch of the process flow that depends on a known unknown and goes into
either "Friendly" or "unFriendly", you black out the part of the flowchart
that reads "unFriendly" and put in "process halts". One of the stunning
realizations I've had recently is that you can always do this.
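One possible reading of "pick the most desirable action, unless the entropy in the decision system is too high" can be sketched in a few lines of Python. This is only an interpretation, not a specification of any actual design: it assumes the "decision system" can be summarized as a probability distribution over which candidate action is the most desirable, and it measures uncertainty as the Shannon entropy of that distribution. The names `choose_or_halt`, `belief_best`, and `max_entropy_bits` are hypothetical.

```python
import math

HALT = "halt"


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def choose_or_halt(belief_best, max_entropy_bits):
    """Return the most probable 'best' action, or HALT if too uncertain.

    `belief_best` maps each candidate action to the (assumed) probability
    that it is the most desirable one. The probabilities must sum to 1.
    """
    total = sum(belief_best.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    if shannon_entropy(belief_best.values()) > max_entropy_bits:
        # The "unFriendly" branch of the flowchart is replaced by a stop.
        return HALT
    return max(belief_best, key=belief_best.get)


confident = choose_or_halt({"act_a": 0.97, "act_b": 0.03}, max_entropy_bits=0.5)
uncertain = choose_or_halt({"act_a": 0.5, "act_b": 0.5}, max_entropy_bits=0.5)
```

Here `confident` resolves to `"act_a"` and `uncertain` resolves to the halt sentinel. The threshold value is arbitrary in this sketch. Choosing it, and defining the distribution in the first place, is where the real work would lie.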
There is no excuse for being killed by a known unknown. If you are going
to die with even the slightest dignity, so that your epitaph reads
"stupidity" rather than "suicide", you should be killed by an unknown
unknown, a basic design flaw, a misunderstanding of the underlying
philosophy. There is no excuse for being killed by a flowchart that does
what you thought it would. If you knew, you should have made a different
flowchart. Now why would a basic design flaw be an independent
probability for a million copies of the best AI you can make?
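The difference between independent and shared failure modes is easy to see numerically. The Python sketch below first reproduces the Russian roulette arithmetic from above (independent spins), then compares a million copies under two deliberately extreme assumptions: fully independent 1% chances, and a single shared design-level fact that decides every copy together. The function names and the perfect-correlation model are assumptions made for illustration.

```python
def expected_survivors(copies, rounds, chambers=6, bullets=1):
    """Expected survivors when every copy spins and fires independently."""
    p_survive_round = (chambers - bullets) / chambers
    return copies * p_survive_round ** rounds


def p_at_least_one_success(p_single, copies, shared_flaw):
    """Chance that at least one copy succeeds.

    shared_flaw=False: each copy succeeds independently with p_single.
    shared_flaw=True: one design-level fact decides all copies together
    (a deliberately extreme model of perfect correlation).
    """
    if shared_flaw:
        return p_single
    return 1.0 - (1.0 - p_single) ** copies


# A thousand copies, twenty rounds each, one bullet in six chambers.
roulette = expected_survivors(copies=1000, rounds=20)

# A million copies of a design with a "1%" chance of working.
independent = p_at_least_one_success(0.01, 1_000_000, shared_flaw=False)
correlated = p_at_least_one_success(0.01, 1_000_000, shared_flaw=True)
```

The roulette figure comes out at about 26, matching the text. With independent chances, a million copies make at least one success nearly certain. With a shared design flaw, a million copies buy nothing, and the chance stays at the original 1%.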
> If you're creating an AI that is a dynamical system, and you're
> building an initial condition and letting it evolve in a way that is
> driven partially by environmental influences, then if you run many
> copies of it independently
> a) of course, the dynamics of the different instances are not
> probabilistically independent
> b) nevertheless, they may be VERY far from identical, and may come to a
> wide variety of different outcomes
> My dreamed idea (which wasn't so serious, by the way!) did not rely
> upon the assumption of complete probabilistic independence between
> multiple evolving instances of the AI. It did rely upon the idea of a
> self-modifying AI as a pragmatically-nondeterministic,
> environmentally-coupled system rather than a strongly-deterministic
> system like an auto engine.
An auto engine contains much entropy and so is about as "nondeterministic"
as any other physical system of roughly the same temperature (depending on
molecular degrees of freedom, but, whatever). What makes an auto engine
"strongly deterministic", it seems, is that we understand how an auto
engine works and so it reliably performs the function that we wish of it,
which is all that we actually care about, the molecular degrees of freedom
dropping out of the equation as irrelevant to our goal processes. It's
not how much entropy is physically in the system, it's how much entropy we
care about, and how much entropy there is in that one all-important bit,
"success or failure"; what makes an auto engine "strongly deterministic",
is not being maintained at zero kelvin, but that, from a functional
perspective, it works reliably because it was designed with understanding.
>> What's wrong with this picture:
>> a) Confusing plausibility with frequency; b) Assigning something
>> called a "probability" in the absence of a theory powerful enough to
>> calculate it quantitatively; c) Treating highly correlated
>> probabilities as independent; d) Applying the Principle of
>> Indifference to surface outcomes rather than elementary
>> interchangeable events; and e) Attempting to trade off not knowing
>> how to solve a problem for confessing a "1%" probability of success.
> Sorry, I am definitely not guilty of errors a, c or d
> As for error b, I don't think it's bad to call an unknown quantity a
> probability just because I currently lack the evidence to calculate the
> value of the quantity. b is not an error.
I think it's okay to use the word "probability", as in, "this variable has
a probability but I don't know what it is", but not to use words like "20%
probability" if they are based on subjective impressions rather than
quantitative calculation or samples from a defined context. "20% sure"
would be better, but the number "20%" in this case measures not
probability but psychological support, and it cannot be manipulated
mathematically like a probability.
> As for e, there may be some problems for which there are no
> guaranteed-successful solutions, only solutions that have a reasonable
> probability of success. You seem highly certain that Friendly AI does
> not lie in this class, but you have given no evidence in favor of your
If the FAI runs on reliable hardware, there should be no legitimate source
of uncertainty from quantum branching or thermal vibrations, which are the
legitimate sources of irreducible randomness in physical processes.
Another alternative is that we are not dealing with irreducible randomness
but with confusion on the part of the designer, who is 20% sure that the
AI will work right, which is not at all the same as an AI with a 20%
probability of working. There is no excuse for being killed by a known
unknown internal to the AI.
I can think offhand of three classes of known unknowns which are calculably
probabilistic but not accessible to the AI, black boxes with definite
contents, but I am not going to say what they are; if you want to make a
case for the Friendliness or unFriendliness of the AI depending on such a
variable you will have to make the case for yourself. It sounds too
exotic to me. The only reason the scenario *has any intuitive appeal*
(despite the conditions for fulfillment being so exotic) is that people
are 20% sure their system will work, so it seems plausible that the system
might have a 20% probability of working, and yet these are fundamentally
different and inconvertible quantities.
>> And if you're wondering why I'm so down on this, it's because it
>> seems to me like yet another excuse for not knowing how to build a
>> Friendly AI.
> Actually, I *do* know how to build a Friendly AI.... ;-)
Heh. This statement caused much excitement on the #sl4 IRC channel. I
had to explain that this meant that you now thought you knew how to build
an AI, and that the word "Friendly" was there for luck.
Personally, I do *not* yet know how to build a Friendly process. I rather
doubt that you could define Friendliness. Last time I checked you were
still claiming that you knew it to be impossible to resolve the basic
theoretical confusions without a chimp-level mind to experiment on.
Please don't use the word "Friendly" so lightly. There are plenty of
available alternatives, like "moral AI" or for that matter "friendly AI".
Part of the reason I went to the trouble of capitalizing it was so that
people who mean, e.g., "I want to build an AI and I hope it will be a nice
person" could say "I want to build a friendly AI". "I know how to build a
Friendly AI" is one hell of a strong claim over and above the weaker claim
that you merely know how to build a living mind from scratch.
> [I wasn't so sure a year ago, but recent simplifications in the
> Novamente design (removing some of the harder-to-control components and
> replacing them with probabilistic-inference-based alternatives) have
> made me more confident.]
> But I can't *guarantee* this AI will be Friendly no matter what; all I
> can give are reasonable intuitive arguments why it will be Friendly.
> The probability of the Friendliness outcome is not easy to calculate,
> as you've pointed out so loquaciously.
I am not asking for a guarantee. But what you call "intuitive argument"
seems to me philosophicalish thinking, driven by wishes and hopes, not
binding on Nature. Last time I checked, you said that you didn't think
the fundamental problems of FAI were solvable without a chimp-level mind
to experiment on (very clever excuse, that), so obviously you have not
already solved the fundamental problems. The standard to which I hold is
not a mathematical proof, but a technical argument - not an intuitive
philosophicalish argument, a technical argument - which includes technical
definitions of everything that you wish to achieve, and a walkthrough of
the processes showing that they produce behaviors which achieve those
definitions; the sort of technical argument that would be given by someone
who had solved all the foundational problems and taken the entire problem
out of the domain of philosophy. When I have that, I will then be willing
to say at last: "I do know how to build a Friendly AI." And perhaps Leon
Kass will not consider it proof, but at least I will no longer consider it
> And then I wonder whether my "reasonable intuitive arguments" --
> and *all* human arguments, whether we consider them rigorous or not;
> even our best mathematical proofs -- kinda fall apart and reveal
> limitations when we get into the domain of vastly transhuman
> intelligence.
"Reasonable intuitive arguments" have been exhaustively demonstrated to
fall apart on guessing why Aunt Mabel is coughing, let alone in the domain
of vastly transhuman intelligence.
There is a huge spectrum between a mathematical proof and a "reasonable
intuitive argument". (I assume a "reasonable intuitive argument" applied
to AI is somewhere between a doctor's "informed clinical judgment" minus
the well-understood fundamentals and any past experience, and the wishful
thinking of Greek philosophers.) What is needed is something at the high
end of the reliability spectrum, a specific, purely technical definition
of all desirable properties of the outcome and a purely technical
walkthrough showing that the employed processes work to this end. Note
that I say "walkthrough showing that", not "reasons why" or "arguments
for". History shows that reasons why and arguments for are not binding on
Nature, but once a problem has been understood, an engineering walkthrough
> So I tend to apply (a mix of rigorous and intuitive)
> probabilistic thinking when thinking about transhuman AI's that are
> just a bit smarter than humans ... and then rely on ignorance-based
> Principle of Indifference type thinking, when thinking about transhuman
> AI's that are vastly, vastly smarter than any of us.
Is there even one example from the history of science where Nature has not
replied to this sort of argument with, "So what?" Einstein, maybe, but he
had math in which to phrase his intuitions.
Here is the lesson of my own experience in this area: Ignorance does not
work! If you do not know, you are screwed no matter how clever you are
about working around your ignorance. When you have fundamental problems
you must solve them. If you do not solve them it is like asking a Greek
philosopher to guess that the nature of matter is quantum physics, you're
just screwed. It doesn't matter how clever you are, you're just screwed.
I do not know everything about Friendly AI - there are N fundamental
problems of which I have only solved M - but whenever I solve a problem X,
it immediately becomes apparent that had I tried to build a Friendly AI
without first solving problem X, I would have just been screwed. Repeat
this experience a few times and you start to see that the only virtue of
ignorance is that, being ignorant of how to solve the problem, one is also
ignorant of the impossibility of success without knowledge.
I am speaking of "virtue" from a political standpoint, of course;
ignorance makes a very convenient argument, especially when speaking to an
audience that can be counted on to nod along sympathetically when you confess
to not knowing. The humble confession of ignorance fuzzes out every
counterargument; why, you may be certain the Earth goes around the Sun,
good sir, but how can any of us really know anything? I am willing to
concede it is possible, perhaps, but who really knows? Your inexplicable
confidence in heliocentrism, dear sir, can only be the mark of
irrationality; reasonable folk are willing to admit when they do not know...
It is much easier to convince people that no one really knows, and hence,
why not hope? than to propound the implausible, arrogant, elitist
proposition that YOU have understood what others do not. And yet assuming
success and looking backward, anyone who builds a Friendly AI will need to
have understood one hell of a lot of stuff. One who humbly admits to a
lack of understanding may, perhaps, deserve our thanks for their honesty,
time and funding to think and learn and grow their understanding further;
but they will not build a Friendly AI, honest ignorance is just not good
enough for that.
--
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence