Re: Overconfidence and meta-rationality

From: Eliezer S. Yudkowsky
Date: Wed Mar 09 2005 - 16:40:30 MST

Robin Hanson wrote:
> You don't seem very interested in the formal analysis here. You know,
> math, theorems and all that.

You did not ask.

Helpful references:

Robin's paper "Are Disagreements Honest?"

which builds on Aumann's Agreement Theorem:
(not a good intro for the bewildered, maybe someone can find a better intro)

> The whole point of such analysis is to
> identify which assumptions matter for what conclusions. And as far as I
> can tell your only argument which gets at the heart of the relevant
> assumptions is your claim that those who make relatively more errors
> can't see this fact while those who make relatively fewer errors can see
> this fact.

I don't think this argument (which you do concede as a factual premise?
or was our agreement only that people who make relatively fewer errors
do so in part because they are relatively better at estimating their
probability of error on specific problems?) is what touches on the

Anyway, let's talk math.

First, a couple of general principles that apply to discussions in which
someone invokes math:

1) An argument from pure math, if it turns out to be wrong, must have
an error in one or more premises or purportedly deductive steps. If the
deductive steps are all correct, this is a special kind of rigor which
Ben Goertzel gave as his definition of the word "technical"; personally
I would label this class of argument "logical", reserving "technical"
for hypotheses that sharply concentrate their probability mass. (A la
"A Technical Explanation of Technical Explanation".)

Pure math is a fragile thing. An argument that is pure math except for
one nonmathematical step is not pure math. The chain of reasoning in
"Are Disagreements Honest?" is not pure math. The modesty argument uses
Aumann's Agreement Theorem and AAT's extensions as plugins, but the
modesty argument itself is not formal from start to finish. I know of
no *formal* extension of Aumann's Agreement Theorem such that its
premises are plausibly applicable to humans. I also expect that I know
less than a hundredth as much about AAT's extensions as you do. But if
I am correct that there is no formal human extension of AAT, you cannot
tell me: "If you claim the theorem is wrong, then it is your
responsibility to identify which of the deductive steps or empirical
premises is wrong." The modesty argument has not yet been formalized to
that level. It's still a modesty *argument* not a modesty *theorem*.

Might the modesty argument readily formalize to a modesty theorem with a
bit more work? Later I will argue that this seems unlikely because the
modesty argument has a different character from Aumann's Agreement Theorem.

2) Logical argument has no ability to coerce physics. There's a
variety of parables I tell to illustrate this point. Here's one parable:

     Socrates raised the glass of hemlock to his lips. "Do you
suppose," asked one of the onlookers, "that even hemlock will not be
enough to kill so wise and good a man?"
     "No," replied another bystander, a student of philosophy; "all men
are mortal, and Socrates is a man; and if a mortal drink hemlock, surely
he dies."
     "Well," said the onlooker, "what if it happens that Socrates
*isn't* mortal?"
     "Nonsense," replied the student, a little sharply; "all men are
mortal *by definition*; it is part of what we mean by the word 'man'.
All men are mortal, Socrates is a man, therefore Socrates is mortal. It
is not merely a guess, but a *logical certainty*."
     "I suppose that's right..." said the onlooker. "Oh, look, Socrates
already drank the hemlock while we were talking."
     "Yes, he should keel over any minute now," said the student.
     And they waited, and they waited, and they waited...
     "Socrates appears not to be mortal," said the onlooker.
     "Then Socrates must not be a man," replied the student. "All men
are mortal, Socrates is not mortal, therefore Socrates is not a man.
And that is not merely a guess, but a *logical certainty*."

The moral of this parable is that if all "humans" are mortal by
definition, then I cannot know that Socrates is a "human" until after I
have observed that Socrates is mortal. If "humans" are defined as
mortal language-users with ten fingers, then it does no good at all -
under Aristotle's logic - to observe merely that Socrates speaks
excellent Greek and count five of his fingers on each hand. I cannot
state that Socrates is a member of the class "human" until I observe all
three properties of Socrates - language use, ten fingers, and mortality.
Whatever information I put into an Aristotelian definition, I get
exactly the same information back out - nothing more. If you want
actual cognitive categories instead of mere Aristotelian classes,
categories that permit your mind to classify objects into empirical
clusters and thereby guess observations you have not yet made, you have
to resort to induction, not deduction. Whatever is said to be true "by
definition" usually isn't; writing in dictionaries has no ability to
coerce physics. You cannot change the writing in a dictionary and get a
different outcome.

Another parable:

     Once upon a time there was a court jester who dabbled in logic.
The jester gave the king two boxes: The first box inscribed "Either
this box contains an angry frog, or the box with a false inscription
contains gold, but not both." And the second box inscribed "Either this
box contains gold and the box with a false inscription contains an angry
frog, or this box contains an angry frog and the box with a true
inscription contains gold." And the jester said: "One box contains an
angry frog, the other box gold, and one and only one of the inscriptions
is true."
     The king opened the wrong box, and was savaged by an angry frog.
     "You see," the jester said, "let us hypothesize that the first
inscription is the true one. Then suppose the first box contains an
angry frog. Then the other box would contain gold and this would
contradict the first inscription which we hypothesized to be true. Now
suppose the first box contains gold. The other box would contain an
angry frog, which again contradicts the first inscription -"
     The king ordered the jester thrown in the dungeons.
     A day later, the jester was brought before the king in chains, and
shown two boxes. "One box contains a key," said the king, "to unlock
your chains, and if you find the key you are free. But the other box
contains a dagger for your heart if you fail." And the first box was
inscribed: "Either both inscriptions are true or both inscriptions are
false." And the second box was inscribed: "This box contains the key."
     The jester reasoned thusly: "Suppose the first inscription is
true. Then the second inscription must also be true. Now suppose the
first inscription is false. Then again the second inscription must be
true. Therefore the second box contains the key, whether the first
inscription is true or false."
     The jester opened the second box and found a dagger.
     "How?!" cried the jester in horror, as he was dragged away. "It
isn't possible!"
     "It is quite possible," replied the king. "I merely wrote those
inscriptions on two boxes, and then I put the dagger in the second one."

In "Are Disagreements Honest?" you say that people should not have one
standard in public and another standard in private; you say: "If people
mostly disagree because they systematically violate the rationality
standards that they profess, and hold up for others, then we will say
that their disagreements are dishonest." (I would disagree with your
terminology; they might be dishonest *or* they might be self-deceived.
Whether you think self-deception is a better excuse than dishonesty is
between yourself and your morality.) In any case, there is a moral and
social dimension to the words you use in "Are Disagreements Honest?"
You did in fact invoke moral forces to help justify some steps in your
chain of reasoning, even if you come back later and say that the steps
can stand on their own.

Now suppose that I am looking at two boxes, one with gold, and one with
an angry frog. I have pondered these two boxes as best I may, and those
signs and portents that are attached to boxes; and I believe that the
first box contains the gold, with 67% probability. And another person
comes before me and says: "I believe that the first box contains an
angry frog, with 99.9% probability." Now you may say to me that I
should not presume a priori that I am more rational than others; you may
say that most people are self-deceived about their relative immunity to
self-deception; you may say it would be logically inconsistent with my
publicly professed tenets if we agree to disagree; you may say that it
wouldn't be fair for me to insist that the other person change his
opinion if I'm not willing to change mine. So suppose that the two of
us agree to compromise on a 99% probability that the first box contains
an angry frog. But this is not just a social compromise; it is an
attempted statement about physical reality, determined by the modesty
argument. What if the first box, in defiance of our logic and
reasonableness, turns out to contain gold instead? Which premises of
the modesty argument would turn out to be the flawed ones? Which
premises would have failed to reflect underlying, physical, empirical

The heart of your argument in "Are Disagreements Honest?" is Aumann's
Agreement Theorem and the dozens of extensions that have been found for
it. But if Aumann's Agreement Theorem is wrong (goes wrong reliably in
the long run, not just failing 1 time out of 100 when the consensus
belief is 99% probability) then we can readily compare the premises of
AAT against the dynamics of the agents, their updating, their prior
knowledge, etc., and track down the mistaken assumption that caused AAT
(or the extension of AAT) to fail to match physical reality. In
contrast, it seems harder to identify what would have gone wrong,
probability-theoretically speaking, if I dutifully follow the modesty
argument, humbly update my beliefs until there is no longer any
disagreement between myself and the person standing next to me, and the
other person is also fair and tries to do the same, and lo and behold
our consensus beliefs turn out to be more poorly calibrated than my
original guesses.

Is this scenario a physical impossibility? Not obviously, though I'm
willing to hear you out if you think it is. Let's suppose that the
scenario is physically possible and that it occurs; then which of the
premises of the modesty argument do you think would have been
empirically wrong? Is my sense of fairness factually incorrect? Is the
other person's humility factually incorrect? Does the factually
mistaken premise lie in our dutiful attempt to avoid agreeing to
disagree because we know this implies a logical inconsistency? To me
this suggests that the modesty argument is not just *presently*
informal, but that it would be harder to formalize than one might wish.

There's another important difference between the modesty argument and
Aumann's Agreement Theorem. AAT has been excessively generalized; it's
easy to generalize and a new generalization is always worth a published
paper. You attribute the great number of extensions of AAT to the
following underlying reason: "His [Aumann's] results are robust because
they are based on the simple idea that when seeking to estimate the
truth, you should realize you might be wrong; others may well know
things that you do not."

I disagree; this is *not* what Aumann's results are based on.

Aumann's results are based on the underlying idea that if other entities
behave in a way understandable to you, then their observable behaviors
are relevant Bayesian evidence to you. This includes the behavior of
assigning probabilities according to understandable Bayesian cognition.

Suppose that A and B have a common prior probability for proposition X
of 10%. A sees a piece of evidence E1 and updates X's probability to
90%; B sees a piece of evidence E2 and updates X's probability to 1%.
Then A and B compare notes, exchanging no information except their
probability assignments. Aumann's Agreement Theorem easily permits us
to construct scenarios in which A and B's consensus probability goes to
0, 1, or any real number between. (Or rather, simple extensions of AAT
permit this; the version of AAT I saw is static, allowing only a single
question and answer.) Why? Because it may be that A's posterior
announcement, "90%", is sufficient to uniquely identify E1 as A's
observation, in that no other observed evidence would produce A's
statement "90%"; likewise with B and E2. The joint probability for
E1&E2 given X (or ~X) does not need to be the product of the
probabilities E1|X and E2|X (E1|~X, E2|~X). It might be that E1 and E2
are only ever seen together when X, or only ever seen together when ~X.
So A and B are *not* compromising between their previous positions;
their consensus probability assignment is *not* a linear weighting of
their previous assignments.
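
To make this concrete, here is a small Python sketch with invented numbers. The two joint tables are illustrative assumptions of mine, not anything taken from Aumann's paper: each gives X a prior of 10%, gives A a posterior of 90% after seeing E1, and gives B a posterior of 1% after seeing E2. They differ only in whether E1 and E2 ever co-occur under X or under ~X, and that alone sends the pooled posterior to 1 or to 0.

```python
from fractions import Fraction

def posterior(joint, e1=None, e2=None):
    """P(X | evidence), where joint maps (x, e1, e2) -> integer weight.

    Passing None for e1 or e2 means that observation is not conditioned on.
    """
    def matches(key):
        _, k1, k2 = key
        return (e1 is None or k1 == e1) and (e2 is None or k2 == e2)

    total = sum(w for key, w in joint.items() if matches(key))
    with_x = sum(w for key, w in joint.items() if matches(key) and key[0])
    return Fraction(with_x, total)

# Hypothetical joint distributions, weights in thousandths (each sums to 1000).
# Keys are (X holds, A saw E1, B saw E2).
TOGETHER_ONLY_IF_X = {
    (True, True, True): 1,    (True, True, False): 89,
    (True, False, True): 0,   (True, False, False): 10,
    (False, True, True): 0,   (False, True, False): 10,
    (False, False, True): 99, (False, False, False): 791,
}
TOGETHER_ONLY_IF_NOT_X = {
    (True, True, True): 0,    (True, True, False): 90,
    (True, False, True): 1,   (True, False, False): 9,
    (False, True, True): 1,   (False, True, False): 9,
    (False, False, True): 98, (False, False, False): 792,
}

for name, joint in [("E1&E2 only if X", TOGETHER_ONLY_IF_X),
                    ("E1&E2 only if ~X", TOGETHER_ONLY_IF_NOT_X)]:
    print(name,
          "prior:", posterior(joint),
          "A:", posterior(joint, e1=True),
          "B:", posterior(joint, e2=True),
          "pooled:", posterior(joint, e1=True, e2=True))
```

In both tables the posterior A would hold without E1 is 1/90, so the announcement "90%" does identify E1, and likewise "1%" identifies E2. The pooled answer is then read off the joint table, not averaged from the two announcements.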

If you tried to devise an extension of Aumann's Agreement Theorem in
which A and B, e.g., deduce each other's likelihoods given their stated
posteriors and then combine likelihoods, you would be assuming that A
and B always see unrelated evidence - an assumption rather difficult to
extend to human domains of argument, since it would mean that no two
minds could ever take the same arguments into account. Our individual
attempts to cut through to
the correct answer do not have the Markov property relative to one
another; different rationalists make correlated errors.

Under AAT, as A and B exchange information and become mutually aware of
knowledge, they concentrate their models into an ever-smaller set of
possible worlds. (I dislike possible-worlds semantics for various
reasons, but let that aside; the formalizations I've found of AAT are
based on possible-worlds semantics. Besides, I rather liked the way
that possible-worlds semantics avoids the infinite recursion problem in
"common knowledge".) If A and B's models are concentrating their
probability densities into ever-smaller volumes, why, they must be
learning something - they're reducing entropy, one might say, though
only metaphorically.
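
The shrinking set of possible worlds can be watched directly in a toy model. The sketch below is my own minimal rendering of a back-and-forth announcement process of the kind studied in dynamic extensions of AAT (in the spirit of Geanakoplos and Polemarchakis's "We Can't Disagree Forever"); the nine-state world, the partitions, and the function names are all illustrative assumptions.

```python
from fractions import Fraction

def cell(partition, state):
    """The block of the partition containing state: what that agent privately knows."""
    return next(block for block in partition if state in block)

def posterior(event, info, prior):
    """P(event | info) under the common prior."""
    return sum(prior[w] for w in info & event) / sum(prior[w] for w in info)

def dialogue(prior, event, partitions, true_state, rounds):
    """Agents take turns announcing posteriors.

    Returns a list of (announcement, number of worlds still publicly possible).
    """
    public = frozenset(prior)  # worlds not yet ruled out by any announcement
    history = []
    for turn in range(rounds):
        partition = partitions[turn % len(partitions)]
        said = posterior(event, cell(partition, true_state) & public, prior)
        # Listeners keep only the worlds in which the speaker would have said the same.
        public = frozenset(
            w for w in public
            if posterior(event, cell(partition, w) & public, prior) == said
        )
        history.append((said, len(public)))
    return history

# Hypothetical setup: nine equally likely worlds, the event is {3, 4}, the true world is 1.
PRIOR = {w: Fraction(1, 9) for w in range(1, 10)}
PART_A = [frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset({7, 8, 9})]
PART_B = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8}), frozenset({9})]
EVENT = frozenset({3, 4})

history = dialogue(PRIOR, EVENT, [PART_A, PART_B], true_state=1, rounds=4)
for said, remaining in history:
    print(said, remaining)
```

Under these assumptions A says 1/3, B says 1/2, A repeats 1/3, and B then agrees on 1/3, while the publicly possible worlds drop from nine to 6, 4, 3, 3. Agreement arrives as a side effect of ruling worlds out, not because either agent aims at it.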

Now *contrast* this with the modesty argument, as its terms of human
intercourse are usually presented. I believe that the moon is made of
green cheese with 80% probability. Fred believes that the moon is made
of blueberries with 90% probability. This is all the information that
we have of each other; we can exchange naked probability assignments but
no other arguments. By the math of AAT, *or* the intuitive terms of the
modesty argument, this ought to force agreement. In human terms,
presumably I should take into account that I might be wrong and that
Fred has also done some thinking about the subject, and compromise my
beliefs with Fred's, so that we'll say, oh, hm, that the moon is made of
green cheese with 40% probability and blueberries with 45% probability,
that sounds about right. Fred chews this over, decides I'm being fair,
and nods agreement; Fred updates his verbally stated probability
assignments accordingly. Yay! We agreed! It is now theoretically
possible that we are being verbally consistent with our professed
beliefs about what is rational!

But wait! What do Fred and I know about the moon that we didn't know
before? If this were AAT, rather than a human conversation, then as
Fred and I exchanged probability assignments our actual knowledge of the
moon would steadily increase; our models would concentrate into an
ever-smaller set of possible worlds. So in this sense the dynamics of
the modesty argument are most unlike the dynamics of Aumann's Agreement
Theorem, from which the modesty argument seeks to derive its force. AAT
drives down entropy (sorta); the modesty argument doesn't. This is a
BIG difference.

Furthermore, Fred and I can achieve the same mutual triumph of possible
consistency - hence, public defensibility if someone tries to criticize
us - by agreeing that the moon is equally likely to be made of green
cheese or blueberries. (Fred is willing to agree that I shouldn't be
penalized for having been more modest about my discrimination
capability. Modesty is a virtue and shouldn't be penalized.)

As far as any outside observer can tell according to the rules you have
laid down for 'modesty', two disputants can publicly satisfy the moral
demand of the modesty argument by any number of possible compromises.
From _Are Disagreements Honest?_: "It is perhaps unsurprising that most
people do not always spend the effort required to completely overcome
known biases. What may be more surprising is that people do not simply
stop disagreeing, as this would seem to take relatively little
effort..." I haven't heard of an extension to AAT which (a) proves that
'rational' agents will agree and (b) explicitly permits multiple possible
compromises to be equally 'rational' as the agent dynamics were defined.

From _Are Disagreements Honest?_:

> One approach would be to try to never assume that you are more meta-rational than anyone else. But this cannot mean that you should agree with everyone, because you simply cannot do so when other people disagree among themselves. Alternatively, you could adopt a "middle" opinion. There are, however, many ways to define middle, and people can disagree about which middle is best (Barns 1998). Not only are there disagreements on many topics, but there are also disagreements on how to best correct for one’s limited meta-rationality.

The AATs I know are constructive; they don't just prove that agents will
agree as they acquire common knowledge, they describe *exactly how*
agents arrive at agreement. (Including multiple agents.) So that's
another sense in which the modesty argument seems unlike a formalizable
extension of AAT - the modesty argument doesn't tell us *how* to go
about being modest. Again, this is a BIG difference.

From _Are Disagreements Honest?_:

> For example, people who feel free to criticize consistently complain when they notice someone making a sequence of statements that is inconsistent or incoherent. [...] These patterns of criticism suggest that people uphold rationality standards that prefer logical consistency...

As I wrote in an unpublished work of mine:

"Is the Way to have beliefs that are consistent among themselves? This
is not the Way, though it is often mistaken for the Way by logicians and
philosophers. The object of the Way is to achieve a map that reflects
the territory. If I survey a city block five times and draw five
accurate maps, the maps, being consistent with the same territory, will
be consistent with each other. Yet I must still walk through the city
block and draw lines on paper that correspond to what I see. If I sit
in my living room and draw five maps that are mutually consistent, the
maps will bear no relation whatsoever to the territory. Accuracy of
belief implies consistency of belief, but consistency does not imply
accuracy. Consistency of belief is only a sign of truth, and does not
constitute truth in itself."

From _ADH?_:

> In this paper we consider only truth-seeking at the individual level, and do not attempt a formal definition, in the hope of avoiding the murky philosophical waters of “justified belief.”

I define the "truth" of a probabilistic belief system as its score
according to the strictly proper Bayesian scoring criterion I laid down
in "Technical Explanation" - a definition of truth which I should
probably be attributing to someone else, but I have no idea who.
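
For readers who have not seen the scoring criterion, here is a minimal Python sketch. I am assuming that "Bayesian score" means the logarithmic scoring rule (the log of the probability assigned to what actually happened); the base-2 logarithm, the function names, and the 70% example are my own illustrative choices.

```python
import math

def bayesian_score(prob_assigned_to_outcome):
    """Log score, in bits, for the probability given to the outcome that occurred."""
    return math.log2(prob_assigned_to_outcome)

def expected_score(true_freq, reported):
    """Long-run average score when an event of frequency true_freq is reported as `reported`."""
    return (true_freq * bayesian_score(reported)
            + (1 - true_freq) * bayesian_score(1 - reported))

# Hypothetical event that really happens 70% of the time.
TRUE_FREQ = 0.7
candidates = [i / 100 for i in range(1, 100)]
best_report = max(candidates, key=lambda q: expected_score(TRUE_FREQ, q))
print(best_report)
```

Among the candidate reports, the honest 0.7 gets the highest expected score; shading the report toward 0.5 or toward 1.0 both lose points on average. That is what "strictly proper" means here.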

(Incidentally, it seems to me that the notion of the Bayesian score cuts
through a lot of gibberish about freedom of priors; the external
goodness of a prior is its Bayesian score. A lot of philosophers seem
to think that, because there's disagreement about where priors come
from, they can pick any damn prior they please and none of those darned
rationalists will be able to criticize them. But there's actually a
very clearly defined criterion for the external goodness of priors; the
question is just how to maximize it using internally accessible
decisions. That aside...)

According to one who follows the way of Bayesianity - a Bayesianitarian,
one might say - it is better to have inconsistent beliefs with a high
Bayesian score than to have consistent beliefs with a low Bayesian
score. Accuracy is prized above consistency. I guess that this
situation can never arise given logical omniscience or infinite
computing power; but I guess it can legitimately arise under bounded
rationality. Maybe you could even detect an *explicit* inconsistency in
your beliefs, while simultaneously having no way to reconcile it in a
way that you expect to raise your Bayesian score. I'm not sure about
that, though. It seems like the scenario would be hard to construct, no
matter what bounds you put on the rationalist. I would not be taken
aback to see a proof of impossibility - though I would hope the
impossibility proof to take the form of a simple constructive algorithm
that can be followed by most plausible bounded rationalists in case they
discover inconsistency.

Even the simplest inconsistency resolution algorithm may take more
time/computation than the simpler algorithm "discard one belief at
random". And the simplest good resolution algorithm for resolving a
human disagreement may take more time than one of the parties discarding
their beliefs at random. Would it be more rational to ignore this
matter of the Bayesian score, which is to say, ignore the truth, and
just agree as swiftly as possible with the other person? No. Would
that behavior be more 'consistent' with Aumann's result and extensions?
No, because the AATs I know, when applied to any specific
conversation, constructively specify a precise, score-maximizing change
of beliefs - which a random compromise is not. All you'd be maximizing
through rapid compromise is your immunity to social criticism for
'irrationality' in the event of a public disagreement.

Aumann's Agreement Theorem and its extensions do not say that
rationalists *should* agree. AATs prove that various rational agents
*will* agree, not because they *want* to agree, but because that's how
the dynamics work out. But that mathematical result doesn't mean that
you can become more rational by pursuing agreement. It doesn't mean you
can find your Way by trying to imitate this surface quality of AAT
agents, that they agree with one another; because that cognitive
behavior is itself quite unlike what AAT agents do. You cannot tack an
imperative toward agreement onto the Way. The Way is only the Way of
cutting through to the correct answer, not the Way of cutting through to
the correct answer + not disagreeing with others. If agreement arises
from that, fine; if not, it doesn't mean that you can patch the Way by
tacking a requirement for agreement onto the Way.

The essence of the modesty argument is that we can become more rational
by *trying* to agree with one another; but that is not how AAT agents
work in their internals. Though my reply doesn't rule out the
possibility that the modesty rule might prove pragmatically useful when
real human beings try to use it.

The modesty argument is important in one respect. I agree that when two
humans disagree and have common knowledge of each other's opinion (or a
human approximation of common knowledge which does not require logical
omniscience), *at least one* human must be doing something wrong. The
modesty argument doesn't tell us immediately what is wrong or how to fix
it. I have argued that the *behavior* of modesty is not a solution
theorem, though it might *pragmatically* help. But the modesty
*argument* does tell us that something is wrong. We shouldn't ignore
things when they are visibly wrong - even if modesty is not a solution.

One possible underlying fact of the matter might be that one person is
right and the other person is wrong and that is all there ever was to
it. This is not an uncommon state of human affairs. It happens every
time a scientific illiterate argues with a scientific literate about
natural selection. From my perspective, the scientific literate is
doing just fine and doesn't need to change anything. The scientific
illiterate, if he ever becomes capable of facing the truth, will end up
needing to sacrifice some of his most deeply held beliefs while not
receiving any compromise or sacrifice-of-belief in return, not even the
smallest consolation prize. That's just the Way things are sometimes.
And in AAT also, sometimes when you learn the other's answer you will
simply discard your own, while the other changes his probability
assignment not a jot. Aumann agents aren't always humble and compromising.

But then we come to the part of the problem that pits meta-rationality
against self-deception. How does the scientific literate guess that he
is in the right, when he, being scientifically literate, is also aware
of studies of human overconfidence and of consistent biases toward
self-overestimation of relative competence?

As far as I know, neither meta-rationality nor self-deception has been
*formalized* in a way plausibly applicable to humans even as an
approximation. (Or maybe it would be better to say that I have not yet
encountered a satisfactory formalism. For who among us has read the
entire Literature?)

Trying to estimate your own rationality or meta-rationality involves
severe theoretical problems because of the invocation of reflectivity, a
puzzle that I'm still trying to solve in my own FAI work. My puzzle
appears, not as a puzzle of estimating *self*-rationality as such, but
the puzzle of why a Bayesian attaches confidence to a purely abstract
system that performs Bayesian reasoning, without knowing the specifics
of the domain. "Beliefs" and "likelihoods" and "Bayesian justification"
and even "subjective probability" are not ontological parts of our
universe, which contains only a mist of probability amplitudes. The
probability theory I know can only apply to "beliefs" by translating
them into ordinary causal signals about the domain, not treating them
sympathetically *as beliefs*.

Suppose I assign a subjective probability of 40% to some one-time event,
and someone else says he assigns a subjective probability of 80% to the
same one-time event. This is all I know of him; I don't know the other
person's priors, nor what evidence he has seen, nor the likelihood
ratio. There is no fundamental mathematical contradiction between two
well-calibrated individuals with different evidence assigning different
subjective probabilities to the same one-time event. We can still
suppose both individuals are calibrated in the long run - when one says
"40%" it happens 40% of the time, and when one says "80%" it happens 80%
of the time. In this specific case, either the one-time event will
happen or it won't. How are two well-calibrated systems to update when
they know the other's estimate, assuming they each believe the other to
be well-calibrated, but know nothing else about one another?
Specifically, they don't know the other's priors, just that those priors
are well-calibrated - they can't deduce likelihood of evidence seen by
examining the posterior probability. (If they could deduce likelihoods,
they could translate beliefs to causal signals by translating: "His
prior odds in P were 1:4, and his posterior odds in P are 4:1, so he
must have seen evidence about P of likelihood 16:1" to "The fact of his
saying aloud '80%' has a likelihood ratio of 16:1 with respect to P/~P,
even though I don't know the conditional probabilities.")
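
The parenthetical's arithmetic, spelled out as a short Python sketch. The helper names are hypothetical, and the final step makes two assumptions that the scenario above explicitly denies me: that I know his prior, and that his evidence is independent of mine given P.

```python
from fractions import Fraction

def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)

def probability(o):
    """Convert odds in favor back to a probability."""
    return o / (1 + o)

def implied_likelihood_ratio(prior_p, posterior_p):
    """Likelihood ratio that would move prior_p to posterior_p."""
    return odds(posterior_p) / odds(prior_p)

# His prior odds 1:4 (20%) and posterior odds 4:1 (80%).
his_ratio = implied_likelihood_ratio(Fraction(1, 5), Fraction(4, 5))

# ASSUMPTION: his evidence is independent of mine given P, so I may simply
# multiply my own odds (40%, i.e. 2:3) by the ratio implied by his "80%".
my_updated = probability(odds(Fraction(2, 5)) * his_ratio)

print(his_ratio, my_updated)
```

This gives a ratio of 16 and moves my 40% to 32/35, about 91%. Without his prior there is no ratio to extract, and without the independence assumption there is no license to multiply.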

How are these two minds to integrate the other's subjective probability
into their calculations, if they can't convert the other's spoken words
into some kind of witnessable causal signal that bears a known
evidential relationship to the actual phenomenon? How can Bayesian
reasoning take into account other agents' beliefs *as beliefs*, not just
as causal phenomena?

Maybe if you know the purely abstract fact that the other entity is a
Bayesian reasoner (implements a causal process with a certain Bayesian
structure), this causes some type of Bayesian evidence to be inferrable
from the pure abstract report "70%"? Well, first of all, how do you
integrate it? If there's a mathematical solution it ought to be
constructive. Second, attaching this kind of *abstract* confidence to
the output of a cognitive system runs into formal problems. Consider
Löb's Theorem in mathematical logic. Löb's Theorem says that if you can
prove that a proof of T implies T, you can prove T; |- ([]T => T)
implies |- T. Now the idea of attaching confidence to a Bayesian system
seems to me to translate into the idea that if a Bayesian system says
'X', that implies X. I'm still trying to sort out this confused issue
to the point where I will run over it in my mind one day and find out
that Löb is not actually a problem.

Is there an AAT extension that doesn't involve converting the other's
beliefs into causal signals with known evidentiary relationships to the
specific data? Is there a formal AAT extension that works on the
*abstract* knowledge of the other person's probable rationality, without
being able to relate specific beliefs to specific states of the world?
Suppose that I say 30%, and my friend says 70%, and we know of each
other only the pure abstract fact that we are calibrated in the long
run; in fact, we don't even know what our argument is about
specifically. Should we be able to reach an agreement on our
probability assignments even though we have no idea what we're arguing
about? How? What's the exact number?

That's the problem I run into when I try to formalize a pure abstract
belief about another person's 'rationality'. (If this has already been
formalized, do please let me know.) Now obviously human beings do make
intuitive estimates of each other's rationality. I'm just saying that I
don't know how to formalize this in a way free from paradox - humans do
a lot of thinking that is useful and powerful but also sloppy and
subject to paradox. I think that if this human thinking is reliably
useful, then there must be some structure to it that explains the
usefulness, a structure that can be extracted and used in an FAI
architecture while leaving all the sloppiness and paradox behind. But I
have not yet figured out how to build a reflective cognitive system that
attaches equal evidential force to (a) its own estimates as they are
produced in the system and (b) a mental model of an abstract process that
is an accurate copy of itself, plus the abstract knowledge (without
knowing the specific evidence) that this Bayesian process arrived at the
same specific probability output. I want this condition so the
cognitive system is consistent under reflection; it attaches the same
force to its own thoughts whether they are processed as thoughts or as
causal signals. But how do I prevent a system like that from falling
prey to Löb's Theorem when it tries the same thing in mathematical
logic? That's something I'm presently pondering. I think there's
probably a straightforward solution, I just don't have it yet.

Then we come to self-deception. If it were not for self-deception,
meta-rationality would be much more straightforward. Grant some kind of
cognitive framework for estimating self-rationality and
other-rationality. There would be some set of signals standing in a
Bayesian relation to the quantities of "rationality", some signals
publicly accessible and some privately accessible. Each party would
honestly report their self-estimate of rationality (the public signals
being privately accessible as well), and this estimate would have no
privileged bias. Instead, though, we have self-deceptive phenomena such
as biased retrieval of signals favorable to self-rationality, and biased
non-retrieval of signals prejudicial to self-rationality.
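
To make the effect of biased retrieval concrete, here is a minimal
sketch in Python. Everything in it is an illustrative assumption rather
than part of the argument above: the two-hypothesis model ("high" versus
"low" rationality), the likelihoods 0.7 and 0.4, the signal list, and
the helper names posterior and biased_retrieval. It shows only that an
agent who silently drops some unfavorable signals, and then updates as
if retrieval were unbiased, ends up with an inflated self-estimate.

```python
from math import prod

def posterior(signals, prior=0.5, p_fav_if_high=0.7, p_fav_if_low=0.4):
    """P(high rationality | signals), signals conditionally independent.

    signals: iterable of booleans, True = favorable to self-rationality.
    The likelihoods are made-up numbers chosen for illustration.
    """
    signals = list(signals)
    like_high = prod(p_fav_if_high if s else 1 - p_fav_if_high for s in signals)
    like_low = prod(p_fav_if_low if s else 1 - p_fav_if_low for s in signals)
    joint_high = prior * like_high
    return joint_high / (joint_high + (1 - prior) * like_low)

def biased_retrieval(signals, keep_every=2):
    """Keep every favorable signal, but only every keep_every-th unfavorable one."""
    kept, seen_unfavorable = [], 0
    for s in signals:
        if s:
            kept.append(s)
        else:
            seen_unfavorable += 1
            if seen_unfavorable % keep_every == 0:
                kept.append(s)
    return kept

# Hypothetical evidence: four favorable and four unfavorable signals.
all_signals = [True, False, True, False, False, True, False, True]

honest = posterior(all_signals)
self_deceived = posterior(biased_retrieval(all_signals))
print(f"honest: {honest:.3f}  self-deceived: {self_deceived:.3f}")
```

The same update rule is applied in both cases; only the retrieval step
differs, which is why the bias is invisible from inside the update.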

It seems to me that you have sometimes argued that I should foreshorten
my chain of reasoning, saying, "But why argue and defend yourself, and
give yourself a chance to deceive yourself? Why not just accept the
modesty argument? Just stop fighting, dammit!" I am a human, and a
human is a system with known biases like selective retrieval of
favorable evidence. Each additional step in an inferential chain
introduces a new opportunity for the biases to enter. Therefore I
should grant greater credence to shorter chains of inference.

This again has a certain human plausibility, and it even seems as if it
might be formalizable.
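
One naive way such a formalization might start, offered only as a
sketch: assume each inferential step is independently sound with some
fixed probability, so the whole chain is sound with the product of
those probabilities. The independence assumption, the figure 0.95, and
the function name chain_reliability are all hypothetical, chosen for
illustration.

```python
def chain_reliability(p_step, n_steps):
    """Probability that an n-step chain is sound, assuming every step is
    independently sound with probability p_step (a strong assumption)."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps

short_chain = chain_reliability(0.95, 3)
long_chain = chain_reliability(0.95, 20)
print(f"3 steps: {short_chain:.3f}  20 steps: {long_chain:.3f}")
```

On this toy model a longer chain is always less trustworthy than a
shorter one. The model says nothing about what the extra steps buy in
evidence.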

*But*, trying to foreshorten our chains of inference contradicts the
character of ordinary probability theory.

E. T. Jaynes (who is dead but not forgotten), in _Probability Theory:
The Logic of Science_, Chapter 1, page 1.14, verse 1-23, speaking of a
'robot' programmed to carry out Bayesian reasoning:

1-23b: "The robot always takes into account all of the evidence it has
relevant to a question. It does not arbitrarily ignore some of the
information, basing its conclusions only on what remains. In other
words, the robot is completely non-ideological."

Jaynes quoted this dictum when he railed against ad-hoc devices of
orthodox statistics that would throw away relevant information. The
modesty argument argues that I should foreshorten my chain of reasoning,
*not* take into account everything I can retrieve as evidence, and stick
to modesty - without using my biased retrieval mechanisms to try to
recall evidence regarding my relative competence. Now this has a
pragmatic human plausibility, but it's very un-Jaynesian. According to
the religion of Bayesianity, what might perhaps be called
Bayesianitarianism, I should be trying to kiss the truth, pressing my
map as close to the territory as possible, maximizing my Bayesian score
by every inch and fraction I can muster, using every bit of evidence I
can find.
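
As a worked illustration of "using every bit of evidence", the sketch
below assumes that the Bayesian score is the logarithmic scoring rule
(the log of the probability assigned to what actually happened) and
that the evidence is processed without bias. The prior, the signal
likelihoods, and the function names are hypothetical. By exact
enumeration over a two-signal toy model, it compares the expected score
of a reasoner who conditions on both signals with one who ignores some
of them.

```python
from itertools import product
from math import log

PRIOR = 0.5                       # assumed P(H)
P_FIRE = {True: 0.8, False: 0.3}  # assumed P(signal fires | H), P(fires | not H)

def joint(h, signals):
    """P(H = h and the given signal outcomes), signals independent given H."""
    p = PRIOR if h else 1 - PRIOR
    for fired in signals:
        p *= P_FIRE[h] if fired else 1 - P_FIRE[h]
    return p

def posterior_h(signals):
    """P(H | signals) by Bayes' rule."""
    p_true = joint(True, signals)
    return p_true / (p_true + joint(False, signals))

def expected_log_score(n_used, n_total=2):
    """Expected log score when only the first n_used of n_total signals
    are taken into account and the rest are thrown away."""
    total = 0.0
    for h in (True, False):
        for signals in product((True, False), repeat=n_total):
            p_h = posterior_h(signals[:n_used])
            assigned = p_h if h else 1 - p_h
            total += joint(h, signals) * log(assigned)
    return total

for n in range(3):
    print(n, "signals used:", round(expected_log_score(n), 4))
```

In this model, discarding a signal never raises the expected score. The
comparison holds only under the stated no-bias assumption, which is
exactly the assumption the foreshortening argument disputes.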

I think that's the point which, from my perspective, cuts closest to the
heart of the matter. Biases can be overcome. You can fight bias, and
win. You can't do that if you cut short the chain of reasoning at its
beginning. I don't spend as much time as I once did thinking about my
relative rationality, mostly because I estimate myself as being so way
the hell ahead that *relative* rationality is no longer interesting.
The problems that worry me are whether I'm rational enough to deal with
a given challenge from Nature. But, yes, I try to estimate my
rationality in detail, instead of using unchanged my mean estimate for
the rationality of an average human. And maybe an average person who
tries to do that will fail pathetically. Doesn't mean *I'll* fail, cuz,
let's face it, I'm a better-than-average rationalist. There will be
costs, if I dare to estimate my own rationality. There will be errors.
But I think I can do better by thinking.

While you might think that I'm not as good as I think, you probably do
think that I'm a more skilled rationalist than an average early
21st-century human, right? According to the foreshortening version of
the modesty argument, would I be forbidden to notice even that? Where
do I draw the line? If you, Robin Hanson, go about saying that you have
no way of knowing that you know more about rationality than a typical
undergraduate philosophy student because you *might* be deceiving
yourself, then you have argued yourself into believing the patently
ridiculous, making your estimate correct.

The indexical argument about how you could counterfactually have been
born as someone else gets into deep anthropic issues, but I don't think
that's really relevant given the arguments I already stated.

And now I'd better terminate this letter before it goes over 40K and
mailing lists start rejecting it. I think that was most of what I had
to say about the math, leaving out the anthropic stuff for lack of space.

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
