[Spambayes] a question about bayes approach

Tim Peters tim.one at comcast.net
Sat Jul 26 19:39:08 EDT 2003


[Tony]
> Note also that spambayes is not, technically, Bayesian (as I
> understand it), but similar.

[Richie]
> I don't know squat about the maths, but I believe it *is* Bayesian,
> with pieces added on top.  When an early draft of my Linux Journal
> article questioned whether Spambayes was Bayesian, Gary Robinson
> replied: "it might be best to take out the thing about "whether it's
> even correct to refer to it as Bayesian" because f(w) is definitely
> Bayesian."

It's an irony only the math-heads can appreciate <wink>:  the Bayesian part
is the two lines of code in Classifier.probability() following the "Now do
Robinson's Bayesian adjustment." comment block:

        n = hamcount * spam2ham  +  spamcount * ham2spam
        prob = (StimesX + n * prob) / (S + n)

That's solidly Bayesian.  Nothing else in our code is even remotely
Bayesian.  The irony is that Paul Graham's article on Bayesian classifiers
didn't have anything Bayesian in *this* part of his scheme.
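For readers who want to poke at the adjustment, here is a minimal standalone sketch of a per-word f(w). It shrinks the raw estimate p(w) toward a prior x with strength s. That is the posterior mean you get from a beta prior with mean x and weight s after n observations, which is why the two quoted lines count as Bayesian. The names `S`, `StimesX`, `spam2ham`, `ham2spam` and `n` come from the quoted lines. Everything else is an assumption, not copied from Classifier.probability(): the function name, the way the raw `prob` and the two scaling ratios are computed, and the defaults s=0.45 and x=0.5.

```python
def robinson_probability(spamcount: int, hamcount: int,
                         nspam: int, nham: int,
                         s: float = 0.45, x: float = 0.5) -> float:
    """Hypothetical standalone f(w); s is the prior strength S, x the prior mean."""
    # Treat an empty corpus as size 1 so the ratios below never divide by zero.
    nspam_f = float(nspam or 1)
    nham_f = float(nham or 1)
    # Raw (non-Bayesian) estimate p(w) from per-corpus frequency ratios.
    spamratio = spamcount / nspam_f
    hamratio = hamcount / nham_f
    if spamratio + hamratio == 0.0:
        return x  # never-seen word: only the prior is left
    prob = spamratio / (spamratio + hamratio)
    # Assumed scaling so a much larger corpus doesn't inflate the sample size n.
    spam2ham = min(nspam_f / nham_f, 1.0)
    ham2spam = min(nham_f / nspam_f, 1.0)
    n = hamcount * spam2ham + spamcount * ham2spam
    # The Bayesian adjustment itself: (StimesX + n * prob) / (S + n).
    return (s * x + n * prob) / (s + n)
```

Take a word seen in 10 of 100 spams and 0 of 100 hams. It comes out at about 0.978 (10.225 / 10.45) rather than a flat 1.0, and a word never seen at all gets exactly the prior, 0.5.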

Calling spambayes Bayesian is technically accurate because of this part.
It's also highly misleading, because nobody familiar with other Bayesian
classifiers could guess that's what we mean -- they expect (and reasonably
so) something Bayesian in the way we combine probabilities.  But the
spambayes chi2_spamprob() isn't Bayesian at all.  Heh!  For that matter,
despite its name, it doesn't compute "a probability", either.
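To see why the combining step isn't Bayesian, here is a simplified sketch of chi-squared (Fisher-style) combining in the spirit of chi2_spamprob(). Fisher's method turns -2 times the summed log probabilities into a chi-squared statistic with 2n degrees of freedom. Doing this once for the p's and once for the (1 - p)'s gives a "hammy" indicator H and a "spammy" indicator S, and the final score is (S - H + 1) / 2. That score lies in [0, 1] but isn't a probability of anything. The function names are hypothetical. Summing logs directly is my simplification; the real classifier's underflow handling may differ. The sketch also assumes every p lies strictly between 0 and 1.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """P(chi-squared >= x2) with v degrees of freedom; v must be even."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    # Closed-form series for even degrees of freedom.
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs: list[float]) -> float:
    """Hypothetical Fisher-style combiner; each p must be in (0, 1)."""
    n = len(probs)
    if n == 0:
        return 0.5  # no clues: sit on the fence
    ln_prod_p = sum(math.log(p) for p in probs)            # ln of prod(p)
    ln_prod_not_p = sum(math.log(1.0 - p) for p in probs)  # ln of prod(1 - p)
    h = 1.0 - chi2q(-2.0 * ln_prod_p, 2 * n)      # near 1 when the clues look hammy
    s = 1.0 - chi2q(-2.0 * ln_prod_not_p, 2 * n)  # near 1 when the clues look spammy
    return (s - h + 1.0) / 2.0
```

Five clues at 0.99 score above 0.99, five at 0.01 score below 0.01, and evenly balanced clues land exactly on 0.5. Nowhere does Bayes' rule enter.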

All in all, since "Bayesian" has become something of a recognized buzzword
among the spam-hating public, it's probably good that we've got "bayes" in
the name of our mostly non-Bayesian classifier <wink -- and it's also a
catchier name than spamfischerchisquared ...>.
