[Spambayes] FYI: Java implementation
tim.one at comcast.net
Thu Jan 16 22:46:27 EST 2003
> I've been building a Java implementation of Paul Graham's
> "Bayesian" classification logic over the past couple months,
> intended as a plug-in filter for the Apache JAMES mail server.
Upgrade to Python and you would have finished a couple months ago <wink>.
> However, after considerable testing, tweaking and tuning via a
> proxy setup (similar to POPFile), plus some recent lurking on
> the Spambayes list, I'm now modifying this project to
> incorporate the excellent notions contributed by Gary Robinson,
> et al, as implemented in your Python code.
> Early results are *very* promising!!! This death2spam stuff is
> definitely heading in the right direction! I haven't quite
> finished the chi2 comparison logic, but even using just "gary-
> combining", the kinds of messages ending up in my "uncertain"
> category make much more sense.
chi-combining will give you more of the same. The combining methods are
related, in such a way that they're monotonic with each other. chi is more
extreme, and you'll find that it pushes most spam very close to 1.0, most
ham very close to 0.0, and highly ambiguous msgs very close to 0.5. This
gives it some nice properties for automated decision making (the cutoff
points for gary-combining were too touchy, across test sets, and across
time). But if you like a mode where you simply sort msgs by score, you can
stop with gary-combining and be happy.
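To make the difference concrete, here is a minimal Python sketch of both combining schemes. It follows Robinson's geometric-mean formula and the chi-squared (Fisher) combining idea as we understand them; it is not the project's actual classifier code. The function names and the 0.20/0.90 cutoffs are hypothetical, illustrative choices, and every per-token probability is assumed to lie strictly between 0 and 1.

```python
import math
from typing import Sequence


def chi2Q(x2: float, v: int) -> float:
    """Return P(chi-squared with v degrees of freedom >= x2); v must be even.

    For even v the tail probability is a finite series, so no numerical
    integration is needed.
    """
    assert v % 2 == 0, "v must be even"
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def gary_combine(probs: Sequence[float]) -> float:
    """Robinson's geometric-mean ("gary") combining of per-token spam
    probabilities, each assumed to lie strictly between 0 and 1."""
    n = len(probs)
    # P is near 1 when the probabilities cluster near 1 (spammy clues).
    P = 1.0 - math.exp(sum(math.log(1.0 - p) for p in probs) / n)
    # Q is near 1 when the probabilities cluster near 0 (hammy clues).
    Q = 1.0 - math.exp(sum(math.log(p) for p in probs) / n)
    # Map (P - Q) / (P + Q), which lies in -1 .. 1, onto 0.0 .. 1.0.
    return (1.0 + (P - Q) / (P + Q)) / 2.0


def chi_combine(probs: Sequence[float]) -> float:
    """Chi-squared (Fisher) combining, testing the clues in both directions."""
    n = len(probs)
    # Summing logs instead of multiplying avoids floating-point underflow.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # S near 1 means strong spam evidence, H near 1 strong ham evidence;
    # both high (or both low) lands near 0.5, i.e. "don't know".
    return (S - H + 1.0) / 2.0


def classify(score: float, ham_cutoff: float = 0.20,
             spam_cutoff: float = 0.90) -> str:
    """Three-way decision; the default cutoffs are illustrative assumptions."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


if __name__ == "__main__":
    spammy = [0.99, 0.98, 0.95, 0.90, 0.20]
    for name, combine in (("gary", gary_combine), ("chi", chi_combine)):
        score = combine(spammy)
        print(name, round(score, 3), classify(score))
```

With the same mostly-spammy clues, gary-combining lands in the middle of the range while chi-combining pushes the score much closer to 1.0, which is why fixed cutoffs are easier to choose for chi.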
> Plus I'm now seeing far less weirdness caused by Graham's
> "2 * nGood + nSpam >= 5" trick, etc. Will keep the list posted as to
> further progress.
The biases indeed had strange effects! It was quite a struggle to eliminate
all of them, in part because near the end of that struggle, some biases
acted to counteract others, so removing any one of them in isolation made
things worse. Gary Robinson pushed us out of the pit by proposing to
eliminate all the remaining biases in one shot. I'm glad we were wise
enough to listen to him <wink>.
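The biases in question are visible in Graham's per-word probability from "A Plan for Spam": ham counts are doubled, words seen fewer than 5 (weighted) times are dropped, and results are clamped to [0.01, 0.99]. The Python sketch below contrasts that with an unbiased spam/ham ratio smoothed toward a prior, in the style Robinson described publicly. It is illustrative only, not the code this project ships; the function names and the constants UNKNOWN_PROB = 0.5 and STRENGTH = 0.45 are assumptions.

```python
from typing import Optional

# Hypothetical smoothing parameters (assumed values): UNKNOWN_PROB is the
# probability given to a never-seen word, STRENGTH is how many "virtual"
# messages that prior is worth.
UNKNOWN_PROB = 0.5
STRENGTH = 0.45


def graham_prob(good_count: int, bad_count: int,
                ngood: int, nbad: int) -> Optional[float]:
    """Graham-style per-word spam probability, biases included.

    good_count/bad_count: ham/spam messages containing the word;
    ngood/nbad: total ham/spam messages trained on (both > 0).
    Returns None when the word is too rare to be scored.
    """
    g = 2 * good_count  # bias: ham occurrences count double
    b = bad_count
    if g + b < 5:  # bias: rarely seen words are ignored outright
        return None
    ham_ratio = min(1.0, g / ngood)
    spam_ratio = min(1.0, b / nbad)
    p = spam_ratio / (ham_ratio + spam_ratio)
    return max(0.01, min(0.99, p))  # bias: clamp to [0.01, 0.99]


def robinson_prob(good_count: int, bad_count: int,
                  ngood: int, nbad: int) -> float:
    """Unbiased ratio, then smoothing toward UNKNOWN_PROB."""
    n = good_count + bad_count
    if n == 0:
        return UNKNOWN_PROB
    ham_ratio = good_count / ngood
    spam_ratio = bad_count / nbad
    p = spam_ratio / (ham_ratio + spam_ratio)
    # Few observations: stay near UNKNOWN_PROB; many: trust the data.
    return (STRENGTH * UNKNOWN_PROB + n * p) / (STRENGTH + n)
```

For a word that appears in 10 of 1000 hams and 10 of 1000 spams, graham_prob returns about 1/3 because of the ham doubling, while robinson_prob returns 0.5; a word seen once in spam gets a smoothed score well short of 1.0 instead of being discarded.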
> I'd sure love to attend the upcoming spam-fest at MIT, but we
> moved downunder (Seattle -> Sydney) last year, and it's one
> helluva long way to go just for a day...
Meet up with Mark Hammond instead. He wrote the wondrous Outlook 2000
client for this project, and also sleeps upside down. Just don't try to
talk to him about Java. Our Anthony Baxter, who deserves more thanks, if only
for his thankless work maintaining the web site, is also on the
wrong side of the globe.
> Many thanks for all your fine coding, testing efforts, and
> thoughtful conversations! It's been very helpful, not to mention
> highly entertaining at times. ;-)
Less spam means more time for fun. Too bad I was kicked off the project