From: Ben Goertzel (ben@goertzel.org)
Date: Mon Sep 12 2005 - 07:46:50 MDT

Hi Jeff,

First, a META comment: I'm not sure that this discussion of the philosophy
and algebra of uncertain inference is really within the purview of the SL4
list, is it? It's a bit of a digression, though it is indirectly relevant
as it's pertinent to the foundations of AI engineering and rational
thinking...

Having said that, I will now continue to make my point until one of the
powers that be requests me to stop ;-)

> Ben replied:
> >> Suppose you have a population of 10 birds of different colors,
> >> and no other knowledge about the population.
> >>
> >> If you sample one of the birds and find that it's a purple goose,
> >> why does this count as information that all the RAVENS in the
> >> population are black?"
>
> Because it is evidence that [all non-black objects are non-ravens]. If
> we know at least one raven exists, and sampling a non-black object
> produces a non-raven on each of N sampling events, then with
> increasing N comes increasing certainty that no non-black object is a
> raven.

I agree so far...

> And [no non-black object is a raven] is, of course, logically
> and conceptually equivalent to [all ravens are black], given the tiny
> extra assumption I left out earlier that at least one raven exists.

This is the controversial part.

To get from

NOT(black) ==> NOT(raven)

to

raven ==> black

requires a logical transformation that does not preserve "amount of
evidence", at least not according to PTL's theory of evidence. And when you
look at the algebra of evidence transformation that comes along with this
transformation, you find that in fact the amount of evidence about
raven==>black ensuing from NOT(black) ==> NOT(raven) comes out to zero...

The key here is that in PTL "amount of evidence" is tabulated separately
from probability and comes with its own parallel inference rules.

The probability of NOT(black) ==> NOT(raven) should be calculated as

P1 =

P( NOT(black) AND NOT(raven) )
------------------------------
P(NOT(black))

with total evidence

N1 = total # of non-black things observed

and positive evidence

N1+ = total # of non-black non-ravens observed

[I say "observed" because for the moment I'm only talking about direct
evaluation of truth values, not about inference. These numbers may be
modified via speculative inferences of course.]

whereas the probability of raven ==> black should be calculated as

P2 =

P(raven AND black)
------------------
P(raven)

with total evidence

N2 = total # of ravens observed

and positive evidence

N2+ = total # of black ravens observed

(Note that these "evidence counts" are not probabilities. They may be scaled
into probabilities, of course, by dividing by the size of the assumed
universe, or as we call it in PTL, the assumed context.)
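To make the two evidence tabulations concrete, here is a small sketch of my own (not PTL code; the toy population and names are made up) showing how P1/N1/N1+ and P2/N2/N2+ would be counted by direct evaluation over a list of observations:

```python
# Toy illustration (not PTL itself): direct evaluation of the two
# (probability, evidence) pairs from a small observed population.
# Each observation is a (kind, color) pair; the data is invented.
observations = [
    ("goose", "purple"),
    ("goose", "purple"),
    ("raven", "black"),
    ("raven", "black"),
    ("parrot", "green"),
]

non_black = [(k, c) for (k, c) in observations if c != "black"]
ravens = [(k, c) for (k, c) in observations if k == "raven"]

# NOT(black) ==> NOT(raven): evidence is counted over non-black things.
N1 = len(non_black)                                      # total evidence
N1_pos = sum(1 for (k, c) in non_black if k != "raven")  # positive evidence
P1 = N1_pos / N1

# raven ==> black: evidence is counted over ravens.
N2 = len(ravens)                                      # total evidence
N2_pos = sum(1 for (k, c) in ravens if c == "black")  # positive evidence
P2 = N2_pos / N2

print(N1, N1_pos, P1)  # -> 3 3 1.0
print(N2, N2_pos, P2)  # -> 2 2 1.0
```

Note that the two statements draw their evidence from disjoint parts of the population: observing another purple goose increments N1 but leaves N2 untouched.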

Now, ignoring issues related to universal quantification for the sake of
making a simple point, what you're doing in your Bayesian perspective is
basically reasoning such as

P2=
P(black|raven) =
1 - P( NOT(black) | raven ) =
1 - P(raven| NOT(black)) P(NOT(black)) / P(raven) =
1 - [1- P(NOT(raven)| NOT(black))] P(NOT(black)) / P(raven) =
1 - [1- P1] P(NOT(black)) / P(raven)

hence using P1 to derive P2.
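As a sanity check that the chain of equalities above is algebraically valid (the dispute is about evidence, not about the probability identity), here is a quick numeric verification on a made-up joint distribution over (raven?, black?); the numbers are illustrative only:

```python
# Numeric check (my own sketch) of the Bayesian chain:
#   P2 = 1 - (1 - P1) * P(non-black) / P(raven)
# on an arbitrary invented joint distribution over (is_raven, is_black).
p = {
    (True, True): 0.10,   # black ravens
    (True, False): 0.02,  # non-black ravens
    (False, True): 0.30,  # black non-ravens
    (False, False): 0.58, # non-black non-ravens
}
assert abs(sum(p.values()) - 1.0) < 1e-12

p_raven = p[(True, True)] + p[(True, False)]
p_nonblack = p[(True, False)] + p[(False, False)]

P1 = p[(False, False)] / p_nonblack  # P(non-raven | non-black)
P2 = p[(True, True)] / p_raven       # P(black | raven)

# The derived expression from the last line of the chain:
P2_derived = 1 - (1 - P1) * p_nonblack / p_raven
print(P2, P2_derived)  # both 5/6 ~ 0.8333...
```

The identity holds for any consistent joint distribution; the point in the text is that what it transports is probability, not amount of evidence.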

However, the problem is that Bayes rule (used above to rewrite
P(NOT(black)|raven) as P(raven|NOT(black)) P(NOT(black)) / P(raven)) does not
preserve "amount of evidence" in the way that you seem to be assuming.

Evidence about P(A|B) in general does not count as evidence about P(B|A).
One can derive an estimate of the probability of any one of these from an
estimate of the probability of the other. But the amount of evidence on
which this estimate is based needs to be separately calculated, and may
sometimes be zero.

In PTL, the evidence about P(A|B) is N_B, the number of B's observed by the
intelligent system in question.

The evidence about P(B|A) is N_A, the number of A's observed...

While we can say

P(B|A) = P(A|B) P(B) / P(A) [Bayes rule]

the corresponding rule for evidence amounts is

N_A = N_B P(A|B)

which in this case means e.g.

N_raven = N_(non-black) P(raven|non-black)

So if we've observed 499 non-black entities and none of them are ravens
(they're all purple geese, or orange orgasmotrons, or whatever), then we
have

N_(non-black) = 499

and

P(raven|non-black) = 0

thus an inferred

N_raven = 0

So, the amount of evidence about P(non-black|raven) [or P(black|raven)]
obtained from P(non-raven|non-black) is zero.
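The worked example above can be sketched in a few lines (my own illustration, assuming the evidence-transfer rule N_A = N_B P(A|B) as stated):

```python
# Sketch of the evidence-transfer rule N_A = N_B * P(A|B), applied to the
# 499 non-black observations from the text (my own illustration).
N_nonblack = 499           # total non-black things observed
ravens_among_nonblack = 0  # all purple geese, orange orgasmotrons, etc.

P_raven_given_nonblack = ravens_among_nonblack / N_nonblack  # = 0.0

# Evidence inferred about "raven ==> black" from the non-black observations:
N_raven = N_nonblack * P_raven_given_nonblack
print(N_raven)  # -> 0.0: no evidence transfers, however large N_nonblack is
```

Notice that N_raven stays zero no matter how many non-black non-ravens are observed, which is exactly how the paradox dissolves: the probability estimate for raven ==> black may move, but its evidence count does not.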

Elegant, huh? Hempel's paradox disappears when you move to two-component
truth values and tabulate evidence separately from probability. It doesn't
just quasi-disappear like in standard Bayesian semantics, it *really*
disappears.

[Note: I derived the idea of two-component truth values from Pei Wang's NARS
framework, but NARS is nonprobabilistic. PTL is unique in its integration
of this sort of two-component truth value with probability theory. However,
Walley's interval theory of probabilities also uses two-component truth
values of a different sort, and I'm not sure how the Hempel paradox comes
out in Walley's theory. Potentially, it could give the same sort of result
as PTL, which in Walley's terms would correspond to an interval truth value
spanning the whole interval [0,1], thus being completely uninformative. I
mention Walley's approach mostly because it's more "mainstream" than PTL,
although it's less comprehensive as an approach to inference for AI.]

-- Ben

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:52 MDT