Re: Dispersing AIs throughout the universe

From: Eliezer S. Yudkowsky
Date: Thu Jan 08 2004 - 19:55:08 MST

Ben Goertzel wrote:
> Eliezer,
> It might be significantly easier to engineer an AI with a 20% or 1%
> (say) chance of being Friendly, than to engineer one with a 99.99%
> chance of being Friendly. If this is the case, then the
> broad-physical-dispersal approach that I suggested makes sense.

1) I doubt that it is "significantly easier". To get a 1% chance you
must solve 99% of the problem, as 'twere. It is no different from trying
to build a machine with a 1% chance of being an internal combustion
engine, a program with a 1% chance of being a spreadsheet, or a document
with a 1% chance of being well-formed XML.

2) Ignoring (1), and supposing someone built an AI with a 1% real chance
of being Friendly, I exceedingly doubt its maker would have the skill to
calculate that as a quantitative probability. To go over a program
specification and correctly, quantitatively calculate the probability that
it meets a certain criterion requires knowing exactly what that criterion
is, which variables the process flow critically depends on, and how those
variables contribute to the final probability, and so on. To correctly
calculate that a poorly assembled program (one representing the limit of
its maker's skill) has a 1% chance of being Friendly - even to within an
order of magnitude! - requires a skill level considerably, no, enormously
higher than that required to build a program with a 99.99% chance of being
Friendly; you must have reduced the entire problem to math, know the exact
criterion of success, have tracked the dependency of success on every one
of the variables, and be capable of performing this calculation as
quantitative math for poorly assembled programs. NASA successfully
designed, built, and launched multiple space shuttles on multiple
successful missions, but their attempt to calculate a quantitative
probability of mission failure was statistically laughable and
demonstrably incorrect. If someone were to build, at the limit of their
skill, a program with a 1% real chance of being Friendly, and an observer
correctly calculated this probability, the observer would have to be a god.

3) So we are not talking about a quantitative calculation that a program
will be Friendly, but rather an application of the Principle of
Indifference to surface outcomes. The maker just doesn't really know
whether the program will be Friendly or not, and so pulls a probability
out of his ass. This reminds me of the story about when Australia was
starting its national lottery, and the television crews interviewed a man
in the street, asking him what he thought his chances were of winning.
"50/50", he said, "either I win or I don't."

4) Extremely extensive research shows that "probabilities" which people
pull out of their asses (as opposed to being able to calculate them
quantitatively) are not calibrated, that is, they bear essentially no
relation to reality. People use the term "99% probable" as a sort of
emotional ejaculation indicating that they believe in something really
strongly. There is no relation to actual probabilities. On empirical
tests, the surprise rate for 98% confidence intervals (where the person
gives an upper bound at 99% confidence and a lower bound at 99%
confidence) ranges between 30% and 60%, where a calibrated judge would be
surprised about 2% of the time; the most common number I have seen is 45%.
This incredible overconfidence and enormously poor calibration get worse
as the difficulty of the problem increases. The problem, of course, is
that "probabilities" that people pull out of their asses are just not
related to reality in any way, nor do most people
realize that probabilities need to be calibrated. For example, someone
who says that the chance of an AI undergoing a hard takeoff is "a million
to one" is implying that he could be asked a million questions of equal
difficulty and expect to get at most one of them wrong. In reality, if
you cannot do the calculation and get a quantitative probability, you DO
NOT KNOW the probability and that is all there is to it. Research also
shows that this is one of the "resistant delusions" - most people,
confronted with the research that shows that making up probabilities
doesn't work, go on making up probabilities; being told about the research
fails to have the emotional impact that would be needed to overcome the
fun of making up probabilities. This is why I am trying to nag everyone
in the transhumanist community into reading Tversky and Kahneman's
"Judgment under uncertainty."
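
To make the calibration test concrete, here is a minimal Python sketch of
how a surprise rate is scored. The function name and the numbers are made
up for illustration; they are not data from the research cited above.

```python
def surprise_rate(intervals, truths):
    """Fraction of true values that fall outside the stated (low, high) bounds."""
    if len(intervals) != len(truths) or not truths:
        raise ValueError("need one interval per true value, and at least one")
    misses = sum(1 for (low, high), truth in zip(intervals, truths)
                 if not (low <= truth <= high))
    return misses / len(truths)


# Made-up illustration data: four stated 98% intervals and the true answers.
stated = [(10, 20), (100, 200), (0, 5), (50, 60)]
actual = [15, 250, 7, 55]

rate = surprise_rate(stated, actual)   # two of the four truths fall outside
print(f"surprise rate: {rate:.0%} (a calibrated 98% interval implies about 2%)")
```

A judge whose 98% intervals are calibrated should see a rate near 0.02 over
many questions; a rate of 0.30 to 0.60 is what the tests actually report.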

5) Plausibility is not the same as frequency. If you evaluate the
evidence (hopefully Bayesian evidence) back and forth and wind up by
estimating a 95% probability that "2 + 2 = 4", it doesn't mean you think
"2 + 2 = 4" on 19 out of 20 occasions.

6) And finally, of course, the probabilities are not independent! If the
best AI you can make isn't good enough, a million copies of it don't have
independent chances of success.
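
The arithmetic behind point 6 can be sketched in a few lines of Python.
This is only an illustration under a stated assumption: the second
function models the extreme case where every copy shares one design and
so all copies succeed or fail together. Both function names are
hypothetical, and the 1% figure is the one from the quoted message.

```python
def naive_independent(p_single, copies):
    """Chance that at least one copy succeeds IF the copies were independent."""
    return 1.0 - (1.0 - p_single) ** copies


def shared_design(p_single, copies):
    """Assumed model: all copies share one design, so they succeed or fail together."""
    return p_single


# Compare the two models as the number of copies grows.
for n in (1, 100, 1_000_000):
    print(n, round(naive_independent(0.01, n), 4), shared_design(0.01, n))
```

Under the independence assumption the chance climbs toward certainty as
copies are added; under the shared-design assumption it never moves from
the single-copy figure, however many copies are dispersed.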

So to sum up, what we have is something like this. A person - let us
suppose for the sake of tradition that it is an Australian man - buys a
lottery ticket. On the surface, the odds seem like they should be 50/50,
since either he wins or he doesn't. It seems, though, that a lot of
people think it's really unlikely that he'll win the lottery, and so he
concedes that the probability might be a little lower. Checking how
strongly he feels about it, he finds that he wants to believe but is
afraid of being proven wrong, a state of mind that he describes with the
phrase "20% probability", which allows a satisfying hope of winning, while
still being low enough to ward off most disappointment in the event of
failure. However, the person comes up with a clever plan. Suppose that
he puts the lottery ticket under a hat. Then, once he knows the winning
number, he'll pull the hat off the lottery ticket. If the ticket doesn't
win the first time, he'll put the hat back on, and pull it off again.
Without knowing the winning numbers (at the present time), he calculates
in advance that on each occasion his chance of seeing the winning number
under the hat is 20%. So if he does this just 20 times, his chance of
winning is 99%!

What's wrong with this picture:

a) Confusing plausibility with frequency;
b) Assigning something called a "probability" in the absence of a theory
powerful enough to calculate it quantitatively;
c) Treating highly correlated probabilities as independent;
d) Applying the Principle of Indifference to surface outcomes rather than
elementary interchangeable events; and
e) Attempting to trade off not knowing how to solve a problem for
confessing a "1%" probability of success.

And if you're wondering why I'm so down on this, it's because it seems to
me like yet another excuse for not knowing how to build a Friendly AI.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
