artificial intelligence

Alex Martelli aleax at
Wed Sep 3 09:52:49 CEST 2003

ajsiegel at wrote:

>  >Arthur wrote:
>  ...
>  >> *Our* intelligence seems to give us a read as to where on the bell
>  >> curve a particular event may lie, or at least some sense of when we are
>  >> at an
>>Wrong: human beings are *eager* pattern-matching devices, extremely prone
>>to detect "patterns" that just don't exist in statistically significant
>>ways. There's quite a substantial body of literature, by now, on the
>>general issue of frequent fallacies on reasoning about probabilities.
> I can accept a "poorly expressed".  Not sure I can sign onto a "wrong".

The single line of text following this one is one of the longest I've
ever seen posted to Usenet - my compliments.  Not sure why KDE's KNode
showed it to me as a single line (with a left-right scrollbar to let
me eventually view it all) but managed to fold it for reply purposes!-).

> actual occurrence of an "unlikely" occurrence.  That sense that something
> unlikely has occurred is not wrong.  And it is hard to put a finger on
> everything that goes into coming to such a conclusion. And therefore, I
> would presume, difficult to program.

Actually, I stick with our line from back in the '80s, when I was doing
speech recognition with IBM Research on a strictly probabilistic basis:
what we had on our T-shirts was

    P(A|B) = P(B|A)P(A)/P(B)

and you know, there IS really nothing more to it than this formula from
1764... almost;-).  And, it IS easy to program, if programmers were in
fact humble enough to study and apply statistics and probability rather
than looking for "artificial intelligence" silver bullets!-)
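
A minimal sketch of applying the T-shirt formula, for illustration only.
The function name `posterior` and the numbers in the usage line are
hypothetical, not taken from any recognition system; P(B) is expanded
with the law of total probability over A and not-A.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) via Bayes' theorem: P(A|B) = P(B|A)P(A)/P(B).

    Hypothetical helper for illustration. P(B) is computed as
    P(B|A)P(A) + P(B|not A)P(not A).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if p_b == 0.0:
        raise ValueError("P(B) is zero; the posterior is undefined")
    return p_b_given_a * prior_a / p_b


# Made-up numbers: a rare event A (prior 1%) whose evidence B shows up
# 90% of the time when A holds and 5% of the time when it does not.
print(posterior(0.01, 0.9, 0.05))
```

Even with evidence that strongly favours A, a small prior keeps the
posterior well below one half, which is the kind of result intuition
tends to get wrong.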

One thing you do have to estimate heuristically, in order to be able
to apply Bayes' theorem to many cases of practical use, is the probability
at any time (given an existing body of observations) that the next thing
(combination of features) you're going to observe is going to be one
you never observed yet (as opposed to, one among the set you did
observe).  Turing formulated a good heuristic for that, and, I'm told,
that heuristic is widely used in biometrics (trying to determine
correlations between e.g. umpteen possible features of a butterfly --
long vs short legs, ditto antennae, coloring, wingshape details, ...).

I think my own heuristic (a bit more prudent/pessimistic than Turing's)
works even better (we did validate that in terms of prediction performance
of recognition systems using either heuristic but otherwise identical, and
I also have handwaving considerations to justify it).  Turing's heuristic
boils down to: number of different observations that were made ONCE, divided
by total number of observations.  So, if your observations so far have been:
    1 2 1 3 5 3 5 4 7 7 ...
having made 10 observations in total, of which two (items 2 and 4) were
observed only once, Turing's heuristic would predict a probability of 0.2
for the next observation being "a surprise" (one never seen before, i.e.
one not in the set {1,2,3,4,5,7}).  My heuristic boils down to: number
of _different_ observations, divided by total number of observations; so,
my heuristic would predict a probability of 0.6 for the next observation
being "a surprise" (6 different things observed in 10 observations).  The
difference in prediction is never as high in practical use cases (with
MANY observations having been made -- ten isn't "many":-), and of course
there are all sorts of implicit hypotheses (e.g. practically-infinite
alphabet of possible observations compared to the number of actual
observations -- when each observation is a combination of many separate
features, combinatorial explosion basically guarantees that:-).  One
could no doubt adjust either heuristic to work better for cases where
these hypotheses may fail (e.g. after observing "4 3 1 2" both heuristics
predict a surprise probability of 1.0, as if repetition of any of the
4 observations already made, once each, was impossible -- that is clearly
over-predicting surprises!-).
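
Both heuristics are short enough to sketch in a few lines of Python.
The function names below are made up for illustration, and the code
only follows the verbal descriptions above (Turing's heuristic as
described is what is commonly called the Good-Turing estimate of the
unseen probability); it is not the code used in the recognition systems
mentioned.

```python
from collections import Counter


def turing_surprise(observations):
    """Items seen exactly once, divided by total observations.

    Hypothetical name; follows the description of Turing's heuristic.
    """
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    seen_once = sum(1 for n in counts.values() if n == 1)
    return seen_once / len(observations)


def distinct_surprise(observations):
    """Distinct items seen, divided by total observations.

    Hypothetical name; follows the more pessimistic heuristic.
    """
    if not observations:
        raise ValueError("need at least one observation")
    return len(set(observations)) / len(observations)


sample = [1, 2, 1, 3, 5, 3, 5, 4, 7, 7]
print(turing_surprise(sample))    # 0.2
print(distinct_surprise(sample))  # 0.6

# The degenerate case: every observation seen exactly once.
print(turing_surprise([4, 3, 1, 2]), distinct_surprise([4, 3, 1, 2]))  # 1.0 1.0
```

Since every item seen exactly once is also a distinct item, the second
estimate can never be lower than the first, which is one way to read
"more prudent/pessimistic".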

