**From:** Eliezer S. Yudkowsky (*sentience@pobox.com*)

**Date:** Wed Mar 09 2005 - 16:40:30 MST


Robin Hanson wrote:

> You don't seem very interested in the formal analysis here. You know, math, theorems and all that.

You did not ask.

Helpful references:

Robin's paper "Are Disagreements Honest?"

http://www.gmu.edu/jbc/Tyler/deceive.pdf

which builds on Aumann's Agreement Theorem:

http://www.princeton.edu/~bayesway/Dick.tex.pdf

(Not a good introduction for the bewildered; maybe someone can find a better one.)

> The whole point of such analysis is to identify which assumptions matter for what conclusions. And as far as I can tell your only argument which gets at the heart of the relevant assumptions is your claim that those who make relatively more errors can't see this fact while those who make relatively fewer errors can see this fact.

I don't think this argument (which you do concede as a factual premise? Or was our agreement only that people who make relatively fewer errors do so in part because they are relatively better at estimating their probability of error on specific problems?) is what touches on the assumptions.

Anyway, let's talk math.

First, a couple of general principles that apply to discussions in which someone invokes math:

1) An argument from pure math, if it turns out to be wrong, must have an error in one or more of its premises or purportedly deductive steps. If the deductive steps are all correct, the argument has a special kind of rigor, which Ben Goertzel gave as his definition of the word "technical"; personally I would label this class of argument "logical", reserving "technical" for hypotheses that sharply concentrate their probability mass. (À la "A Technical Explanation of Technical Explanation".)

Pure math is a fragile thing. An argument that is pure math except for one nonmathematical step is not pure math. The chain of reasoning in "Are Disagreements Honest?" is not pure math. The modesty argument uses Aumann's Agreement Theorem and AAT's extensions as plugins, but the modesty argument itself is not formal from start to finish. I know of no *formal* extension of Aumann's Agreement Theorem whose premises are plausibly applicable to humans. I also expect that I know less than a hundredth as much about AAT's extensions as you do. But if I am correct that there is no formal human extension of AAT, you cannot tell me: "If you claim the theorem is wrong, then it is your responsibility to identify which of the deductive steps or empirical premises is wrong." The modesty argument has not yet been formalized to that level. It's still a modesty *argument*, not a modesty *theorem*.

Might the modesty argument readily formalize to a modesty theorem with a bit more work? Later I will argue that this seems unlikely because the modesty argument has a different character from Aumann's Agreement Theorem.

2) Logical argument has no ability to coerce physics. There's a variety of parables I tell to illustrate this point. Here's one parable:

Socrates raised the glass of hemlock to his lips. "Do you suppose," asked one of the onlookers, "that even hemlock will not be enough to kill so wise and good a man?"

"No," replied another bystander, a student of philosophy; "all men are mortal, and Socrates is a man; and if a mortal drink hemlock, surely he dies."

"Well," said the onlooker, "what if it happens that Socrates *isn't* mortal?"

"Nonsense," replied the student, a little sharply; "all men are mortal *by definition*; it is part of what we mean by the word 'man'. All men are mortal, Socrates is a man, therefore Socrates is mortal. It is not merely a guess, but a *logical certainty*."

"I suppose that's right..." said the onlooker. "Oh, look, Socrates already drank the hemlock while we were talking."

"Yes, he should keel over any minute now," said the student.

And they waited, and they waited, and they waited...

"Socrates appears not to be mortal," said the onlooker.

"Then Socrates must not be a man," replied the student. "All men are mortal, Socrates is not mortal, therefore Socrates is not a man. And that is not merely a guess, but a *logical certainty*."

The moral of this parable is that if all "humans" are mortal by definition, then I cannot know that Socrates is a "human" until after I have observed that Socrates is mortal. If "humans" are defined as mortal language-users with ten fingers, then it does no good at all - under Aristotle's logic - to observe merely that Socrates speaks excellent Greek and count five of his fingers on each hand. I cannot state that Socrates is a member of the class "human" until I observe all three properties of Socrates - language use, ten fingers, and mortality.

Whatever information I put into an Aristotelian definition, I get exactly the same information back out - nothing more. If you want actual cognitive categories instead of mere Aristotelian classes, categories that permit your mind to classify objects into empirical clusters and thereby guess observations you have not yet made, you have to resort to induction, not deduction. Whatever is said to be true "by definition" usually isn't; writing in dictionaries has no ability to coerce physics. You cannot change the writing in a dictionary and get a different outcome.

Another parable:

Once upon a time there was a court jester who dabbled in logic.

The jester gave the king two boxes: The first box inscribed "Either this box contains an angry frog, or the box with a false inscription contains gold, but not both." And the second box inscribed "Either this box contains gold and the box with a false inscription contains an angry frog, or this box contains an angry frog and the box with a true inscription contains gold." And the jester said: "One box contains an angry frog, the other box gold, and one and only one of the inscriptions is true."

The king opened the wrong box, and was savaged by an angry frog.

"You see," the jester said, "let us hypothesize that the first inscription is the true one. Then suppose the first box contains an angry frog. Then the other box would contain gold and this would contradict the first inscription which we hypothesized to be true. Now suppose the first box contains gold. The other box would contain an angry frog, which again contradicts the first inscription -"

The king ordered the jester thrown in the dungeons.

A day later, the jester was brought before the king in chains, and shown two boxes. "One box contains a key," said the king, "to unlock your chains, and if you find the key you are free. But the other box contains a dagger for your heart if you fail." And the first box was inscribed: "Either both inscriptions are true or both inscriptions are false." And the second box was inscribed: "This box contains the key."

The jester reasoned thusly: "Suppose the first inscription is true. Then the second inscription must also be true. Now suppose the first inscription is false. Then again the second inscription must be true. Therefore the second box contains the key, whether the first inscription is true or false."

The jester opened the second box and found a dagger.

"How?!" cried the jester in horror, as he was dragged away. "It isn't possible!"

"It is quite possible," replied the king. "I merely wrote those inscriptions on two boxes, and then I put the dagger in the second one."
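Both parables can be checked by brute force. This is a sketch of my own, not part of the original argument: it assumes each inscription is true exactly when its content holds, and the predicate names are hypothetical. Enumerating the possible worlds shows that the jester's three claims about the first pair of boxes cannot all hold at once, and that his deduction about the second pair is logically valid while saying nothing about where the king actually put the dagger.

```python
from itertools import product

# First pair of boxes: a world fixes which box holds the gold and which
# inscription is the single true one (as the jester claimed).
def inscription_1(gold_box: int, false_box: int) -> bool:
    # "Either this box contains an angry frog, or the box with a false
    # inscription contains gold, but not both."  (exclusive or)
    return (gold_box != 1) != (false_box == gold_box)

def inscription_2(gold_box: int, false_box: int, true_box: int) -> bool:
    # "Either this box contains gold and the box with a false inscription
    # contains an angry frog, or this box contains an angry frog and the box
    # with a true inscription contains gold."
    return (gold_box == 2 and false_box != gold_box) or (
        gold_box != 2 and true_box == gold_box
    )

first_puzzle_worlds = []
for gold_box, true_box in product((1, 2), repeat=2):
    false_box = 3 - true_box
    truth = {1: inscription_1(gold_box, false_box),
             2: inscription_2(gold_box, false_box, true_box)}
    # Keep the world only if exactly the claimed inscription is true.
    if truth[true_box] and not truth[false_box]:
        first_puzzle_worlds.append((gold_box, true_box))

# Second pair of boxes: t1, t2 are the truth values of the two inscriptions.
# Inscription 1 says "both true or both false", i.e. its content is t1 == t2.
second_puzzle_worlds = [(t1, t2) for t1, t2 in product((True, False), repeat=2)
                        if t1 == (t1 == t2)]

# What the king actually did: the dagger is in box 2, so inscription 2 is false.
t2_actual = False
dagger_worlds = [t1 for t1 in (True, False) if t1 == (t1 == t2_actual)]

print(first_puzzle_worlds)
print(second_puzzle_worlds)
print(dagger_worlds)
```

The first list is empty: no placement of the gold is consistent with the jester's claims. Every world that survives in the second puzzle has the second inscription true, which is the jester's valid deduction; but the boxes' contents were fixed by the king's hand, and once the dagger is in box 2 the first inscription has no consistent truth value at all.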

In "Are Disagreements Honest?" you say that people should not have one standard in public and another standard in private; you say: "If people mostly disagree because they systematically violate the rationality standards that they profess, and hold up for others, then we will say that their disagreements are dishonest." (I would disagree with your terminology; they might be dishonest *or* they might be self-deceived. Whether you think self-deception is a better excuse than dishonesty is between yourself and your morality.) In any case, there is a moral and social dimension to the words you use in "Are Disagreements Honest?" You did in fact invoke moral forces to help justify some steps in your chain of reasoning, even if you come back later and say that the steps can stand on their own.

Now suppose that I am looking at two boxes, one with gold, and one with an angry frog. I have pondered these two boxes as best I may, and those signs and portents that are attached to boxes; and I believe that the first box contains the gold, with 67% probability. And another person comes before me and says: "I believe that the first box contains an angry frog, with 99.9% probability." Now you may say to me that I should not presume a priori that I am more rational than others; you may say that most people are self-deceived about their relative immunity to self-deception; you may say it would be logically inconsistent with my publicly professed tenets if we agree to disagree; you may say that it wouldn't be fair for me to insist that the other person change his opinion if I'm not willing to change mine. So suppose that the two of us agree to compromise on a 99% probability that the first box contains an angry frog. But this is not just a social compromise; it is an attempted statement about physical reality, determined by the modesty argument. What if the first box, in defiance of our logic and reasonableness, turns out to contain gold instead? Which premises of the modesty argument would turn out to be the flawed ones? Which premises would have failed to reflect underlying, physical, empirical reality?

The heart of your argument in "Are Disagreements Honest?" is Aumann's Agreement Theorem and the dozens of extensions that have been found for it. But if Aumann's Agreement Theorem is wrong (goes wrong reliably in the long run, not just failing 1 time out of 100 when the consensus belief is 99% probability) then we can readily compare the premises of AAT against the dynamics of the agents, their updating, their prior knowledge, etc., and track down the mistaken assumption that caused AAT (or the extension of AAT) to fail to match physical reality. In contrast, it seems harder to identify what would have gone wrong, probability-theoretically speaking, if I dutifully follow the modesty argument, humbly update my beliefs until there is no longer any disagreement between myself and the person standing next to me, and the other person is also fair and tries to do the same, and lo and behold our consensus beliefs turn out to be more poorly calibrated than my original guesses.

Is this scenario a physical impossibility? Not obviously, though I'm willing to hear you out if you think it is. Let's suppose that the scenario is physically possible and that it occurs; then which of the premises of the modesty argument do you think would have been empirically wrong? Is my sense of fairness factually incorrect? Is the other person's humility factually incorrect? Does the factually mistaken premise lie in our dutiful attempt to avoid agreeing to disagree because we know this implies a logical inconsistency? To me this suggests that the modesty argument is not just *presently* informal, but that it would be harder to formalize than one might wish.

There's another important difference between the modesty argument and Aumann's Agreement Theorem. AAT has been excessively generalized; it's easy to generalize and a new generalization is always worth a published paper. You attribute the great number of extensions of AAT to the following underlying reason: "His [Aumann's] results are robust because they are based on the simple idea that when seeking to estimate the truth, you should realize you might be wrong; others may well know things that you do not."

I disagree; this is *not* what Aumann's results are based on.

Aumann's results are based on the underlying idea that if other entities behave in a way understandable to you, then their observable behaviors are relevant Bayesian evidence to you. This includes the behavior of assigning probabilities according to understandable Bayesian cognition.

Suppose that A and B have a common prior probability for proposition X of 10%. A sees a piece of evidence E1 and updates X's probability to 90%; B sees a piece of evidence E2 and updates X's probability to 1%. Then A and B compare notes, exchanging no information except their probability assignments. Aumann's Agreement Theorem easily permits us to construct scenarios in which A and B's consensus probability goes to 0, 1, or any real number in between. (Or rather, simple extensions of AAT permit this; the version of AAT I saw is static, allowing only a single question and answer.) Why? Because it may be that A's posterior announcement, "90%", is sufficient to uniquely identify E1 as A's observation, in that no other observed evidence would produce A's statement "90%"; likewise with B and E2. The joint probability of E1&E2 given X (or ~X) does not need to be the product of the probabilities P(E1|X) and P(E2|X) (respectively P(E1|~X) and P(E2|~X)). It might be that E1 and E2 are only ever seen together when X, or only ever seen together when ~X. So A and B are *not* compromising between their previous positions; their consensus probability assignment is *not* a linear weighting of their previous assignments.
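The claim that the consensus can land anywhere can be checked with a toy model. This is my own construction, not one taken from the AAT literature: the two joint tables below are hypothetical numbers chosen so that both give the common prior P(X) = 1/10, A's posterior P(X|E1) = 9/10 and B's posterior P(X|E2) = 1/100, yet in one table E1 and E2 co-occur only when X and in the other only when ~X. Exact fractions avoid rounding noise.

```python
from fractions import Fraction as F

def table(weights):
    """Hypothetical joint distribution P(X, E1, E2), keyed by (x, e1, e2), in thousandths."""
    return {k: F(v, 1000) for k, v in weights.items()}

# E1 and E2 are only ever seen together when X.
TOGETHER_WHEN_X = table({
    (1, 1, 1): 1, (1, 1, 0): 8, (1, 0, 1): 0, (1, 0, 0): 91,
    (0, 1, 1): 0, (0, 1, 0): 1, (0, 0, 1): 99, (0, 0, 0): 800,
})
# E1 and E2 are only ever seen together when ~X.
TOGETHER_WHEN_NOT_X = table({
    (1, 1, 1): 0, (1, 1, 0): 9, (1, 0, 1): 1, (1, 0, 0): 90,
    (0, 1, 1): 1, (0, 1, 0): 0, (0, 0, 1): 98, (0, 0, 0): 801,
})

def p_x_given(joint, e1=None, e2=None):
    """P(X | the observations that are not None)."""
    def matches(key):
        _, k1, k2 = key
        return (e1 is None or k1 == e1) and (e2 is None or k2 == e2)
    evidence = sum(p for k, p in joint.items() if matches(k))
    x_and_evidence = sum(p for k, p in joint.items() if matches(k) and k[0] == 1)
    return x_and_evidence / evidence

for name, joint in [("together when X", TOGETHER_WHEN_X),
                    ("together when ~X", TOGETHER_WHEN_NOT_X)]:
    print(name,
          p_x_given(joint),              # common prior
          p_x_given(joint, e1=1),        # A's announced posterior
          p_x_given(joint, e2=1),        # B's announced posterior
          p_x_given(joint, e1=1, e2=1))  # posterior given both observations
```

Both tables agree on every number A and B announce, but the posterior given both observations is 1 in the first table and 0 in the second. In both tables A would have announced a different number had E1 not occurred, and likewise for B, so each announcement reveals the underlying observation.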

If you tried to devise an extension of Aumann's Agreement Theorem in which A and B, e.g., deduce each other's likelihoods from their stated posteriors and then combine likelihoods, you would be assuming that A and B always see unrelated evidence - an assumption rather difficult to extend to human domains of argument, since under it no two minds could ever take the same arguments into account. Our individual attempts to cut through to the correct answer do not have the Markov property relative to one another; different rationalists make correlated errors.
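For contrast, here is a hypothetical "likelihood-combining" rule of the kind just described, sketched under the assumption that the parties' evidence is conditionally independent given X and given ~X. It recovers each party's likelihood ratio from the shared prior and the announced posterior, then multiplies them. With the A-and-B numbers it returns 9/20 no matter how E1 and E2 are actually correlated, so it can never reproduce a consensus of 0 or 1.

```python
from fractions import Fraction as F

def odds(p):
    """Probability -> odds in favor."""
    return p / (1 - p)

def prob(o):
    """Odds in favor -> probability."""
    return o / (1 + o)

def naive_pool(prior, *posteriors):
    """Combine announced posteriors by multiplying their implied likelihood ratios.

    Only valid if each party's evidence is conditionally independent of every
    other party's, given X and given ~X.
    """
    pooled_odds = odds(prior)
    for posterior in posteriors:
        pooled_odds *= odds(posterior) / odds(prior)  # that party's likelihood ratio
    return prob(pooled_odds)

print(naive_pool(F(1, 10), F(9, 10), F(1, 100)))
```

Here A's implied likelihood ratio is 81 and B's is 1/11, giving pooled odds of 9:11. A plain average of the two posteriors would give yet another number, 91/200.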

Under AAT, as A and B exchange information and become mutually aware of each other's knowledge, they concentrate their models into an ever-smaller set of possible worlds. (I dislike possible-worlds semantics for various reasons, but let's set that aside; the formalizations of AAT I've found are based on possible-worlds semantics. Besides, I rather liked the way that possible-worlds semantics avoids the infinite recursion problem in "common knowledge".) If A and B's models are concentrating their probability densities into ever-smaller volumes, why, they must be learning something - they're reducing entropy, one might say, though only metaphorically.
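The shrinking set of possible worlds shows up in a toy version of the multi-round dialogue that Geanakoplos and Polemarchakis analyzed in "We Can't Disagree Forever". This sketch rests on assumptions of my own choosing: nine equally likely states, a hypothetical event X = {3, 4}, hypothetical information partitions for A and B, and true state 1. The agents alternately announce their posterior for X, and each announcement publicly eliminates every state in which that agent would have said something different.

```python
from fractions import Fraction

def posterior(event, info):
    """P(event | info), assuming a uniform prior over states."""
    return Fraction(len(event & info), len(info))

def cell(partition, state):
    """The block of an agent's information partition that contains `state`."""
    return next(block for block in partition if state in block)

def dialogue(event, partitions, true_state, states, max_rounds=50):
    public = set(states)  # states consistent with every announcement so far
    history = []          # (speaker, announced posterior, size of public set)
    quiet = 0             # consecutive announcements that eliminated nothing
    for r in range(max_rounds):
        speaker = r % len(partitions)
        part = partitions[speaker]
        said = posterior(event, cell(part, true_state) & public)
        # Publicly drop every state in which this speaker would have said something else.
        remaining = {w for w in public
                     if posterior(event, cell(part, w) & public) == said}
        quiet = quiet + 1 if remaining == public else 0
        public = remaining
        history.append((speaker, said, len(public)))
        if quiet >= len(partitions):  # every agent's posterior is now common knowledge
            break
    return history

X = {3, 4}
A = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
B = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9}]
for speaker, said, size in dialogue(X, [A, B], true_state=1, states=range(1, 10)):
    print("AB"[speaker], said, size)
```

The public set shrinks from 9 states to 6, 4 and then 3, at which point both agents announce 1/3. Note that B gives up its announced 1/2 while A never moves from 1/3: the agents converge by learning, not by splitting the difference.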

Now *contrast* this with the modesty argument, as its terms of human intercourse are usually presented. I believe that the moon is made of green cheese with 80% probability. Fred believes that the moon is made of blueberries with 90% probability. This is all the information that we have of each other; we can exchange naked probability assignments but no other arguments. By the math of AAT, *or* the intuitive terms of the modesty argument, this ought to force agreement. In human terms, presumably I should take into account that I might be wrong and that Fred has also done some thinking about the subject, and compromise my beliefs with Fred's, so that we'll say, oh, hm, that the moon is made of green cheese with 40% probability and blueberries with 45% probability, that sounds about right. Fred chews this over, decides I'm being fair, and nods agreement; Fred updates his verbally stated probability assignments accordingly. Yay! We agreed! It is now theoretically possible that we are being verbally consistent with our professed beliefs about what is rational!

But wait! What do Fred and I know about the moon that we didn't know before? If this were AAT, rather than a human conversation, then as Fred and I exchanged probability assignments our actual knowledge of the moon would steadily increase; our models would concentrate into an ever-smaller set of possible worlds. So in this sense the dynamics of the modesty argument are most unlike the dynamics of Aumann's Agreement Theorem, from which the modesty argument seeks to derive its force. AAT drives down entropy (sorta); the modesty argument doesn't. This is a BIG difference.
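The entropy contrast can be made concrete with the moon example, under an assumption the text leaves implicit: each party puts the rest of his probability on a catch-all "something else" outcome and nothing on the other party's favored answer. With that hypothetical completion, the 40%/45% compromise is exactly an equal-weight average of the two original distributions, and its Shannon entropy is higher than either original's.

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def linear_pool(d1, d2, w):
    """Weighted average w*d1 + (1-w)*d2 over the union of outcomes."""
    keys = d1.keys() | d2.keys()
    return {k: w * d1.get(k, 0.0) + (1 - w) * d2.get(k, 0.0) for k in keys}

# Hypothetical completions of the two stated beliefs (an assumption, see above).
me = {"green cheese": 0.8, "blueberries": 0.0, "something else": 0.2}
fred = {"green cheese": 0.0, "blueberries": 0.9, "something else": 0.1}

consensus = linear_pool(me, fred, 0.5)
for name, d in [("me", me), ("Fred", fred), ("consensus", consensus)]:
    print(f"{name:>9}: {entropy_bits(d):.3f} bits")
```

The two starting beliefs carry roughly 0.72 and 0.47 bits of entropy; the agreed-upon belief carries roughly 1.46. Averaging moves probability around without concentrating it.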

Furthermore, Fred and I can achieve the same mutual triumph of possible consistency - hence, public defensibility if someone tries to criticize us - by agreeing that the moon is equally likely to be made of green cheese or blueberries. (Fred is willing to agree that I shouldn't be penalized for having been more modest about my discrimination capability. Modesty is a virtue and shouldn't be penalized.)

As far as any outside observer can tell according to the rules you have laid down for 'modesty', two disputants can publicly satisfy the moral demand of the modesty argument by any number of possible compromises.
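Assuming, hypothetically, that each party puts the rest of his probability on a catch-all "other" outcome and nothing on the rival answer, every mixing weight w yields a consensus that both parties could publicly endorse, and nothing in the modesty rule selects one. This sketch (the weights are illustrative) recovers both compromises mentioned in the text: w = 1/2 gives the 40%/45% split, and w = 9/17 makes green cheese and blueberries equally likely.

```python
from fractions import Fraction as F

# Hypothetical completions: whatever each party does not assign to his favored
# answer goes to "other", and nothing goes to the rival answer.
ME = {"cheese": F(8, 10), "blueberries": F(0), "other": F(2, 10)}
FRED = {"cheese": F(0), "blueberries": F(9, 10), "other": F(1, 10)}

def compromise(w):
    """Linear pool with weight w on my belief and 1 - w on Fred's."""
    return {k: w * ME[k] + (1 - w) * FRED[k] for k in ME}

for w in (F(1, 4), F(1, 2), F(9, 17), F(3, 4)):
    print(w, {k: str(v) for k, v in compromise(w).items()})
```

Each printed line is a coherent, mutually "agreed" belief, and each leaves the two parties knowing exactly as much about the moon as before.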

From _Are Disagreements Honest?_: "It is perhaps unsurprising that most people do not always spend the effort required to completely overcome known biases. What may be more surprising is that people do not simply stop disagreeing, as this would seem to take relatively little effort..." I haven't heard of an extension to AAT which (a) proves that 'rational' agents will agree and (b) explicitly permits multiple possible compromises to be equally 'rational' as the agent dynamics were defined.

From _Are Disagreements Honest?_:

> One approach would be to try to never assume that you are more meta-rational than anyone else. But this cannot mean that you should agree with everyone, because you simply cannot do so when other people disagree among themselves. Alternatively, you could adopt a "middle" opinion. There are, however, many ways to define middle, and people can disagree about which middle is best (Barns 1998). Not only are there disagreements on many topics, but there are also disagreements on how to best correct for one’s limited meta-rationality.

The AATs I know are constructive; they don't just prove that agents will agree as they acquire common knowledge, they describe *exactly how* agents arrive at agreement. (Including multiple agents.) So that's another sense in which the modesty argument seems unlike a formalizable extension of AAT - the modesty argument doesn't tell us *how* to go about being modest. Again, this is a BIG difference.

From _Are Disagreements Honest?_:

> For example, people who feel free to criticize consistently complain when they notice someone making a sequence of statements that is inconsistent or incoherent. [...] These patterns of criticism suggest that people uphold rationality standards that prefer logical consistency...

As I wrote in an unpublished work of mine:

"Is the Way to have beliefs that are consistent among themselves? This is not the Way, though it is often mistaken for the Way by logicians and philosophers. The object of the Way is to achieve a map that reflects the territory. If I survey a city block five times and draw five accurate maps, the maps, being consistent with the same territory, will be consistent with each other. Yet I must still walk through the city block and draw lines on paper that correspond to what I see. If I sit in my living room and draw five maps that are mutually consistent, the maps will bear no relation whatsoever to the territory. Accuracy of belief implies consistency of belief, but consistency does not imply accuracy. Consistency of belief is only a sign of truth, and does not constitute truth in itself."

From _ADH?_:

> In this paper we consider only truth-seeking at the individual level, and do not attempt a formal definition, in the hope of avoiding the murky philosophical waters of “justified belief.”

I define the "truth" of a probabilistic belief system as its score according to the strictly proper Bayesian scoring criterion I laid down in "Technical Explanation" - a definition of truth which I should probably be attributing to someone else, but I have no idea who.

(Incidentally, it seems to me that the notion of the Bayesian score cuts through a lot of gibberish about freedom of priors; the external goodness of a prior is its Bayesian score. A lot of philosophers seem to think that, because there's disagreement about where priors come from, they can pick any damn prior they please and none of those darned rationalists will be able to criticize them. But there's actually a very clearly defined criterion for the external goodness of priors; the question is just how to maximize it using internally accessible decisions. That aside...)
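A minimal sketch of what "strictly proper" buys, assuming the logarithmic rule (score = log of the probability assigned to what actually happened), which is strictly proper: a forecaster who believes an event has probability p maximizes expected score by reporting exactly p. The same score grades priors against outcomes. The grid search and the outcome sequence below are illustrative, not data.

```python
import math

def log_score(p_event, happened):
    """Logarithmic score of a binary forecast: log of the probability given to the outcome."""
    return math.log(p_event if happened else 1 - p_event)

def expected_score(report, belief):
    """Expected log score of announcing `report` when you actually believe `belief`."""
    return belief * log_score(report, True) + (1 - belief) * log_score(report, False)

def best_report(belief, grid=100):
    """Grid-search the announced probability that maximizes expected score."""
    reports = [i / grid for i in range(1, grid)]
    return max(reports, key=lambda q: expected_score(q, belief))

print(best_report(0.7))

# Scoring two hypothetical priors against the same made-up outcome sequence.
outcomes = [True, True, False, True, True, False, True, True]
for prior in (0.5, 0.75):
    print(prior, sum(log_score(prior, o) for o in outcomes))
```

Because the made-up outcomes occur 6 times in 8, the prior 0.75 earns a higher (less negative) total score than 0.5, whatever anyone's philosophical stance on where priors come from.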

According to one who follows the way of Bayesianity - a Bayesianitarian, one might say - it is better to have inconsistent beliefs with a high Bayesian score than to have consistent beliefs with a low Bayesian score. Accuracy is prized above consistency. I guess that this situation can never arise given logical omniscience or infinite computing power; but I guess it can legitimately arise under bounded rationality. Maybe you could even detect an *explicit* inconsistency in your beliefs, while simultaneously having no way to reconcile it in a way that you expect to raise your Bayesian score. I'm not sure about that, though. It seems like the scenario would be hard to construct, no matter what bounds you put on the rationalist. I would not be taken aback to see a proof of impossibility - though I would hope for the impossibility proof to take the form of a simple constructive algorithm that can be followed by most plausible bounded rationalists in case they discover inconsistency.
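A toy illustration of ranking accuracy above consistency, under assumptions of my own: each belief is scored separately as a binary forecast with the logarithmic rule, proposition A turns out to be true, and both belief sets are hypothetical. The incoherent set (its probabilities for A and not-A sum to 1.1) still scores far better than the coherent but badly wrong one.

```python
import math

def log_score(p_event, happened):
    """Log of the probability the forecast gave to what actually happened."""
    return math.log(p_event if happened else 1 - p_event)

def score(beliefs, a_is_true):
    """Sum of log scores, treating P(A) and P(not A) as two separate forecasts."""
    return (log_score(beliefs["A"], a_is_true)
            + log_score(beliefs["not A"], not a_is_true))

def coherent(beliefs, tol=1e-9):
    """Do P(A) and P(not A) sum to 1?"""
    return abs(beliefs["A"] + beliefs["not A"] - 1) < tol

incoherent_but_accurate = {"A": 0.9, "not A": 0.2}
coherent_but_wrong = {"A": 0.1, "not A": 0.9}

for name, b in [("incoherent", incoherent_but_accurate), ("coherent", coherent_but_wrong)]:
    print(name, coherent(b), score(b, a_is_true=True))
```

Scoring each proposition separately is a simplification chosen for the illustration; it is exactly the setting in which an inconsistency can exist and still be scored.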

Even the simplest inconsistency resolution algorithm may take more time/computation than the simpler algorithm "discard one belief at random". And the simplest good algorithm for resolving a human disagreement may take more time than one of the parties discarding their beliefs at random. Would it be more rational to ignore this matter of the Bayesian score, which is to say, ignore the truth, and just agree as swiftly as possible with the other person? No. Would that behavior be more 'consistent' with Aumann's result and extensions? No, because the AATs I know, when applied to any specific conversation, constructively specify a precise, score-maximizing change of beliefs - which a random compromise is not. All you'd be maximizing through rapid compromise is your immunity to social criticism for 'irrationality' in the event of a public disagreement.

Aumann's Agreement Theorem and its extensions do not say that rationalists *should* agree. AATs prove that various rational agents *will* agree, not because they *want* to agree, but because that's how the dynamics work out. But that mathematical result doesn't mean that you can become more rational by pursuing agreement. It doesn't mean you can find your Way by trying to imitate this surface quality of AAT agents, that they agree with one another, because the behavior of trying to agree is itself quite unlike what AAT agents do. You cannot tack an imperative toward agreement onto the Way. The Way is only the Way of cutting through to the correct answer, not the Way of cutting through to the correct answer + not disagreeing with others. If agreement arises from that, fine; if not, it doesn't mean that you can patch the Way by tacking a requirement for agreement onto the Way.

The essence of the modesty argument is that we can become more rational by *trying* to agree with one another; but that is not how AAT agents work in their internals. My reply doesn't, however, rule out the possibility that the modesty rule might prove pragmatically useful when real human beings try to use it.

The modesty argument is important in one respect. I agree that when two humans disagree and have common knowledge of each other's opinion (or a human approximation of common knowledge which does not require logical omniscience), *at least one* human must be doing something wrong. The modesty argument doesn't tell us immediately what is wrong or how to fix it. I have argued that the *behavior* of modesty is not a solution theorem, though it might *pragmatically* help. But the modesty *argument* does tell us that something is wrong. We shouldn't ignore things when they are visibly wrong - even if modesty is not a solution.

One possible underlying fact of the matter might be that one person is right and the other person is wrong and that is all there ever was to it. This is not an uncommon state of human affairs. It happens every time a scientific illiterate argues with a scientific literate about natural selection. From my perspective, the scientific literate is doing just fine and doesn't need to change anything. The scientific illiterate, if he ever becomes capable of facing the truth, will end up needing to sacrifice some of his most deeply held beliefs while not receiving any compromise or sacrifice-of-belief in return, not even the smallest consolation prize. That's just the Way things are sometimes.

And in AAT also, sometimes when you learn the other's answer you will simply discard your own, while the other changes his probability assignment not a jot. Aumann agents aren't always humble and compromising.

But then we come to the part of the problem that pits meta-rationality against self-deception. How does the scientific literate guess that he is in the right, when he, being scientifically literate, is also aware of studies of human overconfidence and of consistent biases toward self-overestimation of relative competence?

As far as I know, neither meta-rationality nor self-deception has been *formalized* in a way plausibly applicable to humans even as an approximation. (Or maybe it would be better to say that I have not yet encountered a satisfactory formalism. For who among us has read the entire Literature?)

Trying to estimate your own rationality or meta-rationality involves severe theoretical problems because of the invocation of reflectivity, a puzzle that I'm still trying to solve in my own FAI work. My puzzle appears, not as a puzzle of estimating *self*-rationality as such, but as the puzzle of why a Bayesian attaches confidence to a purely abstract system that performs Bayesian reasoning, without knowing the specifics of the domain. "Beliefs" and "likelihoods" and "Bayesian justification" and even "subjective probability" are not ontological parts of our universe, which contains only a mist of probability amplitudes. The probability theory I know can only apply to "beliefs" by translating them into ordinary causal signals about the domain, not treating them sympathetically *as beliefs*.

Suppose I assign a subjective probability of 40% to some one-time event, and someone else says he assigns a subjective probability of 80% to the same one-time event. This is all I know of him; I don't know the other person's priors, nor what evidence he has seen, nor the likelihood ratio. There is no fundamental mathematical contradiction between two well-calibrated individuals with different evidence assigning different subjective probabilities to the same one-time event. We can still suppose both individuals are calibrated in the long run - when one says "40%" it happens 40% of the time, and when one says "80%" it happens 80% of the time. In this specific case, either the one-time event will happen or it won't. How are two well-calibrated systems to update when they know the other's estimate, assuming they each believe the other to be well-calibrated, but know nothing else about one another?

Specifically, they don't know the other's priors, just that those priors are well-calibrated - they can't deduce the likelihood ratio of the evidence seen by examining the posterior probability. (If they could deduce likelihoods, they could translate beliefs to causal signals by translating: "His prior odds in P were 1:4, and his posterior odds in P are 4:1, so he must have seen evidence about P with a likelihood ratio of 16:1" to "The fact of his saying aloud '80%' has a likelihood ratio of 16:1 with respect to P/~P, even though I don't know the conditional probabilities.")
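The odds arithmetic in the parenthetical is easy to check: 80% is 4:1 odds, a 1:4 prior is 20%, and by Bayes' rule in odds form the likelihood ratio is posterior odds divided by prior odds. A minimal sketch with exact fractions (the helper names are mine):

```python
from fractions import Fraction

def odds(p):
    """Convert a probability to odds in favor, p : (1 - p)."""
    return p / (1 - p)

def implied_likelihood_ratio(prior, posterior):
    """Likelihood ratio of the evidence that moved `prior` to `posterior`."""
    return odds(posterior) / odds(prior)

prior, posterior = Fraction(1, 5), Fraction(4, 5)  # odds 1:4 and 4:1
print(odds(prior), odds(posterior), implied_likelihood_ratio(prior, posterior))
```

The function needs the prior as an input; with only the announced posterior "80%" in hand, which is the situation described above, there is nothing to divide by.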

How are these two minds to integrate the other's subjective probability into their calculations, if they can't convert the other's spoken words into some kind of witnessable causal signal that bears a known evidential relationship to the actual phenomenon? How can Bayesian reasoning take into account other agents' beliefs *as beliefs*, not just as causal phenomena?

Maybe if you know the purely abstract fact that the other entity is a Bayesian reasoner (implements a causal process with a certain Bayesian structure), this causes some type of Bayesian evidence to be inferable from the pure abstract report "70%"? Well, first of all, how do you integrate it? If there's a mathematical solution it ought to be constructive. Second, attaching this kind of *abstract* confidence to the output of a cognitive system runs into formal problems. Consider Löb's Theorem in mathematical logic. Löb's Theorem says that if you can prove that a proof of T implies T, you can prove T; |- ([]T => T) implies |- T. Now the idea of attaching confidence to a Bayesian system seems to me to translate into the idea that if a Bayesian system says 'X', that implies X. I'm still trying to sort out this confused issue to the point where I will run over it in my mind one day and find out that Löb is not actually a problem.
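For reference, Löb's Theorem in its standard form, stated here for Peano Arithmetic (PA) with $\Box T$ read as "T is provable in PA", first as a rule and then in its internalized form:

$$
\text{if } \mathrm{PA} \vdash \Box T \rightarrow T \text{, then } \mathrm{PA} \vdash T;
\qquad
\mathrm{PA} \vdash \Box(\Box T \rightarrow T) \rightarrow \Box T .
$$

In particular, PA proves "if T is provable then T" only for those sentences T that it already proves outright, which is why a system cannot simply adopt the blanket principle "whatever I conclude is true" about itself.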

Is there an AAT extension that doesn't involve converting the other's beliefs into causal signals with known evidentiary relationships to the specific data? Is there a formal AAT extension that works on the *abstract* knowledge of the other person's probable rationality, without being able to relate specific beliefs to specific states of the world?

Suppose that I say 30%, and my friend says 70%, and we know of each other only the pure abstract fact that we are calibrated in the long run; in fact, we don't even know what our argument is about specifically. Should we be able to reach an agreement on our probability assignments even though we have no idea what we're arguing about? How? What's the exact number?

That's the problem I run into when I try to formalize a pure abstract belief about another person's 'rationality'. (If this has already been formalized, do please let me know.) Now obviously human beings do make intuitive estimates of each other's rationality. I'm just saying that I don't know how to formalize this in a way free from paradox - humans do a lot of thinking that is useful and powerful but also sloppy and subject to paradox. I think that if this human thinking is reliably useful, then there must be some structure to it that explains the usefulness, a structure that can be extracted and used in an FAI architecture while leaving all the sloppiness and paradox behind. But I have not yet figured out how to build a reflective cognitive system that attaches equal evidential force to (a) its own estimates as they are produced in the system and (b) a mental model of an abstract process that is an accurate copy of itself, plus the abstract knowledge (without knowing the specific evidence) that this Bayesian process arrived at the same specific probability output. I want this condition so the cognitive system is consistent under reflection; it attaches the same force to its own thoughts whether they are processed as thoughts or as causal signals. But how do I prevent a system like that from falling prey to Löb's Theorem when it tries the same thing in mathematical logic? That's something I'm presently pondering. I think there's probably a straightforward solution; I just don't have it yet.

Then we come to self-deception. If it were not for self-deception,

meta-rationality would be much more straightforward. Grant some kind of

cognitive framework for estimating self-rationality and

other-rationality. There would be some set of signals standing in a

Bayesian relation to the quantities of "rationality", some signals

publicly accessible and some privately accessible. Each party would

honestly report their self-estimate of rationality (the public signals

being privately accessible as well), and this estimate would have no

privileged bias. Instead, though, we have self-deceptive phenomena such

as biased retrieval of signals favorable to self-rationality, and biased

non-retrieval of signals prejudicial to self-rationality.
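
A toy model of the biased-retrieval problem, assuming (purely for illustration) that favorable signals are always recalled while unfavorable ones are recalled only with some fixed probability. The model and the function names are hypothetical. What it illustrates is that a naive frequency count over retrieved signals is inflated, and that the inflation can only be undone if the recall rate itself is known.

```python
def retrieved_fraction_favorable(true_fraction: float, recall_unfavorable: float) -> float:
    """Fraction of *retrieved* signals that are favorable, under the assumed model:
    favorable signals are always recalled, unfavorable ones with probability
    recall_unfavorable."""
    favorable = true_fraction                                  # always recalled
    unfavorable = (1.0 - true_fraction) * recall_unfavorable   # partly forgotten
    return favorable / (favorable + unfavorable)

def corrected_fraction(observed_fraction: float, recall_unfavorable: float) -> float:
    """Invert the model above: solve o = f / (f + (1 - f) r) for f.
    Only possible when the recall rate r is known."""
    o, r = observed_fraction, recall_unfavorable
    return o * r / (1.0 - o + o * r)

observed = retrieved_fraction_favorable(0.5, 0.5)
print(round(observed, 3))                            # 0.667, better than the true 0.5
print(round(corrected_fraction(observed, 0.5), 3))   # 0.5, recovered only because r is known
```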

It seems to me that you have sometimes argued that I should foreshorten

my chain of reasoning, saying, "But why argue and defend yourself, and

give yourself a chance to deceive yourself? Why not just accept the

modesty argument? Just stop fighting, dammit!" I am a human, and a

human is a system with known biases like selective retrieval of

favorable evidence. Each additional step in an inferential chain

introduces a new opportunity for the biases to enter. Therefore I

should grant greater credence to shorter chains of inference.

This again has a certain human plausibility, and it even seems as if it

might be formalizable.
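
A minimal sketch of one such formalization, assuming each inferential step is independently corrupted by bias with the same fixed probability. That independence assumption and the function name are illustrative, not taken from the argument above.

```python
def chain_reliability(steps: int, per_step_error: float) -> float:
    """Probability that an inferential chain of `steps` steps is untouched by bias,
    ASSUMING each step is independently corrupted with probability per_step_error."""
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be a probability")
    return (1.0 - per_step_error) ** steps

for n in (1, 5, 20):
    print(n, round(chain_reliability(n, 0.05), 3))
# prints 1 0.95, then 5 0.774, then 20 0.358
```

The toy model only counts ways a step can go wrong. It has no term for a step that detects and corrects an earlier error, which is the kind of step the following paragraphs defend.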

*But*, trying to foreshorten our chains of inference contradicts the

character of ordinary probability theory.

E. T. Jaynes (who is dead but not forgotten), in _Probability Theory:

The Logic of Science_, Chapter 1, page 1.14, verse 1-23, speaking of a

'robot' programmed to carry out Bayesian reasoning:

1-23b: "The robot always takes into account all of the evidence it has

relevant to a question. It does not arbitrarily ignore some of the

information, basing its conclusions only on what remains. In other

words, the robot is completely non-ideological."

Jaynes quoted this dictum when he railed against ad-hoc devices of

orthodox statistics that would throw away relevant information. The

modesty argument argues that I should foreshorten my chain of reasoning,

*not* take into account everything I can retrieve as evidence, and stick

to modesty - without using my biased retrieval mechanisms to try and

recall evidence regarding my relative competence. Now this has a

pragmatic human plausibility, but it's very un-Jaynesian. According to

the religion of Bayesianity, what might perhaps be called

Bayesianitarianism, I should be trying to kiss the truth, pressing my

map as close to the territory as possible, maximizing my Bayesian score

by every inch and fraction I can muster, using every bit of evidence I

can find.
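
Reading "Bayesian score" as the expected log score (the log of the probability assigned to what actually happens), a small enumeration can show the Jaynesian point: for a reasoner whose likelihoods are correct, ignoring a relevant signal lowers the expected score. The prior, the two signal accuracies, and all names below are assumed for illustration. The sketch assumes away biased retrieval, so it illustrates Jaynes's desideratum rather than settling the modesty argument.

```python
import itertools
import math

PRIOR_H = 0.5                 # assumed prior probability of hypothesis H
SIGNAL_ACCURACY = (0.8, 0.7)  # assumed P(signal i agrees with the truth)

def posterior_h(signals, use):
    """P(H | the signals whose indices are in `use`), by Bayes' rule.
    A signal value of True means the signal points toward H."""
    like_h, like_not_h = PRIOR_H, 1.0 - PRIOR_H
    for i in use:
        acc = SIGNAL_ACCURACY[i]
        like_h *= acc if signals[i] else 1.0 - acc
        like_not_h *= (1.0 - acc) if signals[i] else acc
    return like_h / (like_h + like_not_h)

def expected_log_score(use):
    """Average log-probability assigned to the true state of H, weighted over
    every possible world (truth of H plus each combination of signal values)."""
    total = 0.0
    for h_true in (True, False):
        p_world = PRIOR_H if h_true else 1.0 - PRIOR_H
        for signals in itertools.product((True, False), repeat=len(SIGNAL_ACCURACY)):
            p_signals = 1.0
            for acc, s in zip(SIGNAL_ACCURACY, signals):
                p_signals *= acc if s == h_true else 1.0 - acc
            p_h = posterior_h(signals, use)
            total += p_world * p_signals * math.log(p_h if h_true else 1.0 - p_h)
    return total

print("all evidence: ", round(expected_log_score((0, 1)), 4))
print("drop signal 1:", round(expected_log_score((0,)), 4))
```

With these numbers the full-evidence score comes out near -0.447 nats, against near -0.500 when signal 1 is discarded.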

I think that's the point which, from my perspective, cuts closest to the

heart of the matter. Biases can be overcome. You can fight bias, and

win. You can't do that if you cut short the chain of reasoning at its

beginning. I don't spend as much time as I once did thinking about my

relative rationality, mostly because I estimate myself as being so way

the hell ahead that *relative* rationality is no longer interesting.

The problems that worry me are whether I'm rational enough to deal with

a given challenge from Nature. But, yes, I try to estimate my

rationality in detail, instead of using, unchanged, my mean estimate for

the rationality of an average human. And maybe an average person who

tries to do that will fail pathetically. Doesn't mean *I'll* fail, cuz,

let's face it, I'm a better-than-average rationalist. There will be

costs, if I dare to estimate my own rationality. There will be errors.

But I think I can do better by thinking.
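
One hedged way to formalize "estimate my rationality in detail" instead of reusing the population average: treat a reasoner's hit rate on some class of checkable questions as a proxy for rationality (an assumption), put a Beta prior centered on the population mean, and update on the individual's own record. The function name and numbers are hypothetical.

```python
def posterior_mean_accuracy(pop_mean: float, pop_strength: float,
                            correct: int, attempts: int) -> float:
    """Beta-binomial estimate of one reasoner's hit rate.

    The population supplies a Beta prior with mean pop_mean, worth pop_strength
    pseudo-observations; the individual's own record (correct out of attempts)
    then moves the estimate away from the population mean."""
    alpha = pop_mean * pop_strength + correct
    beta = (1.0 - pop_mean) * pop_strength + (attempts - correct)
    return alpha / (alpha + beta)

# Hypothetical numbers: population average 0.6, prior worth 20 observations.
print(round(posterior_mean_accuracy(0.6, 20, 0, 0), 3))    # no personal record: 0.6
print(round(posterior_mean_accuracy(0.6, 20, 72, 80), 3))  # strong record: 0.84
```

A short record barely moves the estimate off the population mean; a long, strong one moves it a lot. The self-deception worry from earlier enters through the `correct` count, which biased retrieval would inflate.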

While you might think that I'm not as good as I think, you probably do

think that I'm a more skilled rationalist than an average early

21st-century human, right? According to the foreshortening version of

the modesty argument, would I be forbidden to notice even that? Where

do I draw the line? If you, Robin Hanson, go about saying that you have

no way of knowing that you know more about rationality than a typical

undergraduate philosophy student because you *might* be deceiving

yourself, then you have argued yourself into believing the patently

ridiculous, making your estimate correct.

The indexical argument about how you could counterfactually have been

born as someone else gets into deep anthropic issues, but I don't think

that's really relevant given the arguments I already stated.

And now I'd better terminate this letter before it goes over 40K and

mailing lists start rejecting it. I think that was most of what I had

to say about the math, leaving out the anthropic stuff for lack of space.

-- Eliezer S. Yudkowsky http://intelligence.org/ Research Fellow, Singularity Institute for Artificial Intelligence

