[Spambayes] Promoting Spambayes (was Re: FYI: Java implementation)

Justin Mason jm at jmason.org
Tue Jan 21 17:11:52 EST 2003


Neil Schemenauer said:
> Matt Sergeant wrote:
> > Mozilla and SpamAssassin both copy their bayesian code from spambayes 
> > (including tokenisation ideas and combiners).
> 
> I, for one, am extremely pleased to hear that.  It would be a shame if
> people kept using Paul Graham's original algorithm after all the work
> that was put in improving Spambayes.  Despite what was said at the spam
> conference, I think the algorithm is important.

BTW it's worth noting we didn't just "nab" the ideas ;) Instead I
reimplemented them from the descriptions, running a cross-validation test
after each change, and threw in a few tokenization ideas of our own.  In
most cases the results indicated that SpamBayes' techniques are the most
effective.  There are a few differences, like SpamAssassin tokenizing some
headers that SB doesn't (From etc.), and different S and X values, but for
the most part the two are effectively the same.
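
(For anyone who hasn't met the S and X values: as I understand it they are
the two constants in Gary Robinson's adjustment, which pulls a token's raw
spam probability towards an assumed prior X, with S setting how strongly the
prior counts against the n messages the token was actually seen in.  A
minimal sketch follows; the function name and the default values are made up
for illustration and are not the settings of either project.)

```python
def robinson_adjust(p, n, s=1.0, x=0.5):
    """Smooth a raw token probability with Robinson's adjustment.

    p -- raw spam probability estimated for the token (0.0 to 1.0)
    n -- number of training messages the token was seen in
    s -- strength given to the prior (the "S" value; illustrative default)
    x -- assumed probability for an unseen token (the "X" value; illustrative)
    """
    return (s * x + n * p) / (s + n)


# A token never seen in training falls back to the prior X.
unseen = robinson_adjust(p=0.9, n=0)

# A token seen once, always in spam, is only pulled part of the way to 1.0.
seen_once = robinson_adjust(p=1.0, n=1)

# The same raw probability backed by more evidence moves closer to p.
seen_often = robinson_adjust(p=1.0, n=100)
```

Different S and X values change how quickly rare tokens are trusted, which is
why two otherwise similar implementations can score the same message a little
differently.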

The nice thing is that it means those techniques have been independently
verified by two parties -- in other words, a scientific process ;)

--j.


