[Spambayes] Promoting Spambayes (was Re: FYI: Java
jm at jmason.org
Tue Jan 21 17:11:52 EST 2003
Neil Schemenauer said:
> Matt Sergeant wrote:
> > Mozilla and SpamAssassin both copy their bayesian code from spambayes
> > (including tokenisation ideas and combiners).
> I, for one, am extremely pleased to hear that. It would be a shame if
> people kept using Paul Graham's original algorithm after all the work
> that was put in improving Spambayes. Despite what was said at the spam
> conference, I think the algorithm is important.
BTW it's worth noting we didn't just "nab" the ideas ;) Instead I
reimplemented them from descriptions, running a cross-validation test each
time, and threw in a few tokenization ideas of our own. In most cases the
results indicated that SpamBayes' techniques are the most effective --
there were a few extras, like SpamAssassin tokenizing some headers that SB
doesn't (From etc.), and different S and X values, but for the most part
they're effectively the same.
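
For readers unfamiliar with the "S and X values": assuming these refer to the two parameters of Gary Robinson's smoothing of per-token spam probabilities (S being the strength given to the prior, X the probability assumed for a token never seen before), a minimal sketch looks like the following. The function name and the default values are illustrative assumptions, not taken from either codebase.

```python
def adjusted_prob(p, n, s=0.45, x=0.5):
    """Robinson-style smoothed spam probability for one token.

    p: raw spam probability estimated from training counts
    n: number of training messages the token appeared in
    s: strength of the prior (assumed meaning of "S"; value is illustrative)
    x: probability assumed for an unseen token (assumed meaning of "X")
    """
    # With n == 0 this returns x; as n grows, the result approaches p.
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # A token never seen in training falls back to the prior x.
    print(adjusted_prob(0.9, 0))
    # A token seen often is trusted almost entirely.
    print(adjusted_prob(0.99, 50))
```

Under this reading, a larger S pulls rarely seen tokens harder toward X, so two implementations with different S and X values would differ mostly on rare tokens.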
The nice thing is that it means those techniques have been independently
verified by 2 parties -- in other words, a scientific process ;)