RE: Hempel's Paradox

From: Ben Goertzel
Date: Sat Sep 10 2005 - 17:42:40 MDT

Hi Eli,

> > The problem is that what is logically correct is that an observation of
> > a non-black non-raven should provide NO evidence toward the hypothesis
> > that all ravens are black.
> E.T. Jaynes would have exploded on that one. Jaynes is dead, so I guess
> I'll have to handle this one myself.
> Your statement is certainly not logically correct. I can easily
> generate situations in which observing a non-black non-raven can
> generate evidence favoring the hypothesis "All ravens are black" over
> its alternatives. For example, the set of objects includes 7 ravens and
> 1 non-raven.

Yes, an observation of a non-black non-raven TOGETHER WITH SPECIAL
ASSUMPTIONS ABOUT THE SITUATION can yield evidence toward the hypothesis
that all ravens are black.

But in the absence of explicitly stated special assumptions, an observation
of a non-black non-raven should provide NO evidence toward the hypothesis
that all ravens are black.
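
To make the role of the sampling assumptions concrete, here is a minimal
sketch in Python. It assumes a toy world loosely based on the quoted example
(7 ravens and 1 non-raven, with the non-raven taken to be non-black), two
hypothetical hypotheses ("all ravens are black" vs. "exactly one raven is
non-black"), and two hypothetical sampling schemes. The function names and
the sample_from parameter are illustrative only and are not part of PTL or
Novamente.

```python
from fractions import Fraction


def likelihood_nonblack_nonraven(n_nonblack_ravens, sample_from,
                                 n_ravens=7, n_nonravens=1):
    """P(observe a non-black non-raven | world).

    Assumption: every non-raven in the toy world is non-black.
    """
    if sample_from == "all":
        # Uniform draw from every object in the world.
        return Fraction(n_nonravens, n_ravens + n_nonravens)
    if sample_from == "nonblack":
        # Uniform draw from the non-black objects only.
        return Fraction(n_nonravens, n_nonravens + n_nonblack_ravens)
    raise ValueError("sample_from must be 'all' or 'nonblack'")


def likelihood_ratio(sample_from):
    """Ratio favoring 'all ravens black' over 'exactly one non-black raven'."""
    h_all = likelihood_nonblack_nonraven(0, sample_from)
    h_one = likelihood_nonblack_nonraven(1, sample_from)
    return h_all / h_one


print(likelihood_ratio("all"))       # ratio when sampling from all objects
print(likelihood_ratio("nonblack"))  # ratio when sampling non-black objects
```

Under these assumptions, drawing uniformly from all objects gives a ratio of
1 (no evidence either way), while drawing uniformly from the non-black
objects gives a ratio of 2 in favor of "all ravens are black". Whether the
observation counts as evidence depends on the stated sampling model.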

> More importantly, your statement is mathematical nonsense.

My statement was not mathematics, but conceptually, I believe it was
correct -- except for the omitted emendation
"in the absence of explicitly stated special assumptions."

> There is simply no such thing as evidence which favors a hypothesis.
> There is only evidence which favors a hypothesis over other hypotheses.
> This follows directly from the definition of "evidence" as a
> likelihood ratio. A ratio needs a numerator and a denominator.

I am not defining "evidence" in the same way that you are. According to
my semantics of evidence, it IS possible to have evidence which simply
favors an hypothesis.

"Evidence" is a philosophical term which can be formally grounded in many
different ways.

> Considering how many different ways standard probability theory has been
> proven to be the unique method of reasoning that obeys consistency
> axioms XYZ, it is a simpler hypothesis that you are wrong about the
> observation providing ZERO evidence.

I don't think probability theory is incorrect, and I think Cox's proof
and its successors are quite impressive demonstrations of the general
utility of probability theory.

However, Cox's axioms are about single-number measures of uncertainty,
and don't say anything about the possible need for two or more numbers
to quantify uncertainty (e.g. the utility of using a second number
besides probability to quantify the amount of evidence).

They also don't tell you anything about how to integrate probability
theory with the logic of variables, how to deal with intension vs.
extension, etc. etc.

> > However, what frustrates me about the quote you cite, and your attitude,
> > is that you seem to be denying that probability theory as standardly
> > deployed is conceptually and logically erroneous in this case -- albeit
> > the magnitude of its error is generally small.
> Yes, Ben, I do deny that standard probability theory is "conceptually
> and logically erroneous" in this quite straightforward case. At this
> point you really should read Jaynes, who uses (correctly formulated and
> applied) probability theory to knock down one alleged "paradox" after
> another.

Heyyy -- I read Jaynes when you were in diapers, sonny boy ;-) ...

and then I reread him (and read his more recent stuff) about 5 years back.

> It is your own theory which is the approximation that makes small
> errors. You omit from consideration the small amounts of evidence
> provided by sampling a random object or a random non-black object and
> finding that it is not a nonblack raven.

Sampling a random object and finding it is not a nonblack raven provides
NO evidence that all ravens are black -- not unless one is operating in
a situation where special assumptions and conditions apply.

> Uh... didn't someone, I think Frank Ramsey but possibly Stalnaker, prove
> that it was *impossible* to transform P(B|A) into P(A -> B), using any
> connective -> that was true or false in any given possible world? Or am
> I misunderstanding what you're trying to do here?

It is impossible to make that transformation in a *certain and infallible*
way, but it can be done in an approximative way using second-order

This leads into some math that is discussed in our in-process book on
PTL (the Novamente-design book has now been sent to a publisher for
consideration; the PTL book will follow in 6 months or so....)

> Furthermore, you don't make it clear how Novamente decides p(
> is_black(x) | is_raven(x) ), which is the whole problem at hand.

Well, the simplest way is to count the percentage of the ravens that you
see that are black.

Of course, there are subtleties if ravens can be part black, but those
are simple to handle.

Then there is the role of inference in dealing with indirect evidence,
of course... which is too long a story for this email.
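
The "simplest way" described above can be sketched as follows. This is only
an illustration of direct frequency counting, assuming each observation is a
hypothetical pair (is_raven, blackness) where blackness is a number in
[0, 1] so that part-black ravens can be represented. It is not PTL's actual
truth-value formula, and the function name is made up for the example.

```python
def estimate_black_given_raven(observations):
    """Estimate P(black | raven) by direct counting.

    observations: iterable of (is_raven, blackness) pairs, with
    blackness in [0, 1] (1.0 = fully black, 0.5 = half black, ...).
    Returns (estimate, number_of_ravens_seen). Non-ravens are ignored.
    """
    raven_blackness = [b for is_raven, b in observations if is_raven]
    if not raven_blackness:
        # No ravens observed: no direct evidence, so no estimate.
        return None, 0
    estimate = sum(raven_blackness) / len(raven_blackness)
    return estimate, len(raven_blackness)


observations = [
    (True, 1.0),   # black raven
    (True, 1.0),   # black raven
    (True, 0.5),   # half-black raven
    (False, 0.0),  # purple microphone, ignored
]
estimate, count = estimate_black_given_raven(observations)
print(estimate, count)
```

Returning the count alongside the estimate keeps track of how much direct
evidence the estimate rests on, in the spirit of using a second number
besides the probability.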

> If I
> randomly sample a small group and find a couple of black ravens, then,
> after this, when I repeatedly sample non-black objects from the small
> group and none of them are ravens, or even if I randomly sample objects
> and they are not nonblack ravens, then even without observing more black
> ravens, my p(black|raven) should go on increasing.

If you randomly sample a bunch of objects and they are purple microphones,
this should NOT increase your estimate of P(black|raven) -- not unless you have
some special assumptions about the group from which all these objects
are drawn. [Or unless you're willing to do some speculative
inference and assume that because ravens and microphones are both Earthly
objects that share some properties, there is a nonzero chance they share
coloration as well.... In this vein, if you sample a bunch of objects and
they're purple geese, then this could sensibly decrease your estimate of
P(black|raven) by a significant amount, since geese are similar to ravens
and so there is some justification to think that they might have some
similarity in coloration pattern.]

> > The definition of evidence in PTL makes clear that the only evidence
> > that counts for
> >
> > P( is_black(x) | is_raven(x) )
> >
> > is the set of x for which is_raven(x) has a nonzero truth value,
> > and therefore Hempel's paradox does not exist.
> It seems that PTL ignores relevant evidence, then.

No, it ignores IRRELEVANT evidence in "direct evaluation" of truth values.

In inference (e.g. guessing that because some geese are purple, and both
geese and ravens are birds, then some ravens may be purple too) PTL can use
evidence besides x for which is_raven(x) has a nonzero truth value. But this
is explicitly recognized as speculative inference that may be erroneous....

> As for Hempel's paradox not existing, as far as I can tell, you haven't
> addressed it at all. Exactly the same definition would show that P(
> is_not_raven(x) | is_not_black(x) ) only uses as evidence the set of x
> for which is_not_black(x) has nonzero truth value.
> PTL's estimate for
> ForAll x { is_raven(x) ==> is_black(x) }
> ought to be identical to its estimate for
> ForAll x { is_not_black(x) ==> is_not_raven(x) }
> and if it's not, that's Hempel's Paradox square in the face.

Well, clearly

P( is_black(x) | is_raven(x) )

and

P( is_not_raven(x) | is_not_black(x) )

have different truth values, in general. So, based on a given finite body
of evidence, the truth value estimate for

ForAll x, raven(x) ==> black(x)

will NOT be the same as the truth value estimate for

ForAll x, ~black(x) ==> ~raven(x)

But this doesn't mean that PTL succumbs to Hempel's Paradox, quite the
contrary.

It means that in PTL, just because two predicates A and B are equivalent
according to crisp logic, this doesn't necessarily mean that evidence for A
counts as evidence for B and vice versa.

(Raven ==> black) is equivalent to (~black ==> ~raven) in crisp logic, but in
PTL's probabilistic logic, these two need not have identical truth values,
because they are associated with different bodies of "direct evidence."

In philosophical terms, PTL rejects the "principle of equivalence" in its
full generality. To get Hempel's paradox via the principle of equivalence
you need to infer

raven ==> black
black OR ~raven
~~black OR ~raven
~black ==> ~raven
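
The crisp-logic side of this chain can be checked mechanically. The sketch
below simply enumerates the four possible truth assignments for "raven" and
"black" and confirms that all four forms agree; the helper names are
illustrative.

```python
from itertools import product


def implies(a, b):
    """Material implication for crisp (two-valued) truth values."""
    return (not a) or b


def crisp_forms(raven, black):
    """Evaluate the four forms in the chain for one truth assignment."""
    return (
        implies(raven, black),            # raven ==> black
        black or (not raven),             # black OR ~raven
        (not (not black)) or (not raven), # ~~black OR ~raven
        implies(not black, not raven),    # ~black ==> ~raven
    )


def all_equivalent():
    """True if the four forms agree on every crisp truth assignment."""
    return all(
        len(set(crisp_forms(raven, black))) == 1
        for raven, black in product([False, True], repeat=2)
    )


print(all_equivalent())
```

This only covers crisp truth values; it says nothing about whether
uncertain truth values are preserved across the same steps.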

The equivalence works in the case of crisp truth values, but it doesn't
work in the case of uncertain truth values, because

* for raven ==> black, only ravens count as part of the total evidence,

* for ~black ==> ~raven, only nonblack things count as part of the total
evidence,

* for (black OR ~raven), the total evidence is the whole relevant universe
(or specifically, all entities whose blackness or ravenness is evaluable)

This has to do with the difference between term logic and predicate logic
... in term logic, uncertain truth values need not be preserved via
crisp-logical equivalence transformations. Rather, each logical equivalence
transformation has a specific uncertain-truth-value "inference rule"
associated with it.
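
As a rough illustration of how the three bodies of direct evidence differ,
the sketch below uses plain frequency counting over a made-up set of
objects. The data, the function name, and the use of simple relative
frequencies are all assumptions for the example; PTL's actual truth-value
formulas are not reproduced here.

```python
from fractions import Fraction


def direct_estimates(objects):
    """objects: list of (is_raven, is_black) pairs.

    Returns a dict mapping each form to (estimate, evidence_count), where
    evidence_count is the number of objects that count as direct evidence.
    """
    ravens = [o for o in objects if o[0]]
    nonblack = [o for o in objects if not o[1]]

    def freq(hits, total):
        # No relevant objects means no direct estimate.
        return Fraction(hits, total) if total else None

    return {
        "raven ==> black": (
            freq(sum(1 for r, b in ravens if b), len(ravens)), len(ravens)),
        "~black ==> ~raven": (
            freq(sum(1 for r, b in nonblack if not r), len(nonblack)),
            len(nonblack)),
        "black OR ~raven": (
            freq(sum(1 for r, b in objects if b or not r), len(objects)),
            len(objects)),
    }


# Hypothetical data: 3 black ravens, 1 white raven,
# 5 purple microphones, 2 black non-ravens.
objects = ([(True, True)] * 3 + [(True, False)]
           + [(False, False)] * 5 + [(False, True)] * 2)
for form, (estimate, count) in direct_estimates(objects).items():
    print(form, estimate, count)
```

With this data the three forms get different estimates (3/4, 5/6 and 10/11)
resting on different amounts of evidence (4, 6 and 11 objects), even though
the forms are equivalent in crisp logic.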

Anyway, this dialogue is fun, but it's going to get way too long and
intractable because to substantiate everything I'm talking about here in
detail would require me to give a lot of PTL math, which is basically the
point of the in-process PTL book...

> >>I shall now demonstrate the folly of adulterating Bayes with
> >>lesser wares.
> >>
> >>Suppose that I know that, in a certain sample, there is at least one
> >>black raven, and at least one blue teapot, and some number of other
> >>ravens of unknown color. I now observe an item from the group that is
> >>produced by the following sampling method: Someone looks over the
> >>group, and if there are no non-black ravens, he tosses out a blue
> >>teapot. If there are non-black ravens, he tosses out a black raven.
> >>Now observing a black raven definitely shows that not all ravens
> >>are black.
> >>
> >>How would Novamente's "augmented" probability theory handle that case, I
> >>wonder?
> >
> > Given the constraints you've introduced, the only way Novamente has to
> > handle this problem is to use "higher-order inference", which means
> > to explicitly represent the definition of the problem in terms of
> > variables and quantifiers, in a manner similar to predicate logic.
> Standard Bayes doesn't need to resort to higher-order logic to solve
> this problem. It just says what our expectations are, given various
> hypotheses, same as in any other case.

True, this case may be dealt with more simply in the standard probabilistic
framework than in PTL. But it's still quite fast and easy for PTL, no
problem at all....
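
For reference, the standard Bayesian treatment of the quoted teapot example
can be sketched in a few lines. The prior value is an arbitrary assumption,
and the function name is illustrative; the likelihoods follow directly from
the sampling rule as quoted (a blue teapot is tossed out only if there are
no non-black ravens, a black raven only if there are some).

```python
def posterior_all_black(prior, observation):
    """Posterior P(all ravens are black | observation) under the quoted rule.

    prior: P(all ravens are black) before the observation, with 0 < prior < 1.
    observation: "blue teapot" or "black raven".
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    # (P(obs | all black), P(obs | some raven is non-black))
    likelihoods = {
        "blue teapot": (1.0, 0.0),
        "black raven": (0.0, 1.0),
    }
    like_h, like_not_h = likelihoods[observation]
    evidence = prior * like_h + (1.0 - prior) * like_not_h
    return prior * like_h / evidence


print(posterior_all_black(0.9, "black raven"))
print(posterior_all_black(0.1, "blue teapot"))
```

Because the sampling rule is deterministic, either observation settles the
question completely regardless of the prior: a black raven drives the
posterior to 0, and a blue teapot drives it to 1.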

> > The difference is that, unlike standard predicate logic, Novamente has
> > formulas for managing uncertain truth values attached to quantified
> > logical formulae.
> >
> > I could write out the details of this example in Novamente formalism,
> > and may do so later as it's a moderately amusing exercise, but I don't
> > have time at the moment.
> Sounds overcomplicated... not good for efficiency.

Well, all the PTL truth value formulas are quite simple. Dealing with
logical formulae can be complicated or simple depending on what you're

The only inefficient part of PTL is inference control -- adaptively pruning
the "search tree" of possible inferences. This is supposed to be solved via
applying PTL to attention allocation and assignment of credit, in the
design, but we haven't tested this aspect yet.

We have tuned PTL and Novamente's knowledge representation to give compact
representations for human commonsense knowledge, and short inference trails
for human commonsense inferences. The idea is that problems that are easy
for ordinary humans should be easy for it.... How well we've succeeded
at this, time will tell ;-)

-- Ben
