[Mailman-Developers] SpamBayes for gate_news - who can test?

skip at pobox.com
Wed Nov 26 23:57:05 CET 2008


I pushed a Mailman branch to Launchpad which hooks the gate_news cron job
into SpamBayes.  On python.org we do a fair amount of early reject/filter
stuff at the SMTP step.  One of those things is to run incoming mail through
SpamBayes (if it doesn't get rejected by an earlier step).

Unfortunately, messages coming from the comp.lang.python newsgroup destined
for python-list at python.org just waltz right in as pretty as you please
without so much as a sniff from our spam-trapping tools.  It seemed to me
that the correct place to sniff is in gate_news, so I implemented the
necessary bits there to run incoming mail through SpamBayes and hold any
messages which look like they are or might be spam.
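
To make the hold decision concrete, here is a minimal sketch of the kind of
three-way split involved.  It is not the code in the branch: the function
names are hypothetical, and the cutoffs are assumed to be SpamBayes' usual
defaults (0.20 for ham, 0.90 for spam), which a real spambayes.ini may
override.

```python
# Hypothetical illustration only; these names do not come from the branch.
HAM_CUTOFF = 0.20   # assumed: SpamBayes' usual default ham cutoff
SPAM_CUTOFF = 0.90  # assumed: SpamBayes' usual default spam cutoff


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam probability in [0.0, 1.0] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score < spam_cutoff:
        return "unsure"
    return "spam"


def should_hold(score):
    """Hold anything that is, or might be, spam; only clear ham passes."""
    return classify(score) != "ham"
```

With these assumed cutoffs a message scoring 0.05 would be gated to the list
as usual, while messages scoring 0.50 or 0.95 would both be held.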

I am, however, neither a Mailman developer nor a Usenet user.  Setting up
the necessary scaffolding (Mailman instance, Usenet news connection, etc.)
would be much harder for me than the work I've done so far stitching
SpamBayes into gate_news.  I thought about adding a --dry-run flag to
gate_news and then testing my version in parallel with the active version
already running on mail.python.org, but that looks difficult as well,
since it's pretty hard to completely disconnect gate_news from the list
data.

So I come, hat in hand, looking for some brave Mailman developer who is
willing to test out my modified version of gate_news.  You can grab the
latest version from Launchpad:

    bzr pull lp:~smontanaro/mailman/SpamBayes

There is an associated doc repo with a few instructions for setting up the
SpamBayes stuff:

    bzr pull lp:~smontanaro/mailman-administrivia/SpamBayes

A sample spambayes.ini file lives in the cron directory alongside gate_news.
It's basically what I would use on mail.python.org if I had the necessary
savvy to do this myself.

If you have any questions I'd be happy to answer them.  I can help you get
SpamBayes installed if you've never done that before.  (It's quite
straightforward if you're familiar with the normal Python setup.py thing or
use setuptools.)  I can also provide ham and spam training sets from
mail.python.org so you can construct a useful database for SpamBayes to
score messages against.  (You could run with an empty training database but
that would just cause all messages to score as "unsure" and be held as
possible spam.)
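
A toy sketch of why an untrained database lands everything in "unsure" may
help.  This is a deliberately simplified stand-in, not SpamBayes itself: the
function names are hypothetical, and a plain average replaces SpamBayes'
chi-squared combining.  The constants are assumed to mirror SpamBayes'
neutral prior for unknown words (0.5) and its minimum clue strength (0.1).

```python
# Toy model only; a simple average stands in for chi-squared combining.
UNKNOWN_WORD_PROB = 0.5      # assumed neutral prior for unseen tokens
MINIMUM_PROB_STRENGTH = 0.1  # assumed: weaker clues are ignored


def token_spam_prob(token, spam_counts, ham_counts, nspam, nham):
    """Estimate how spammy a token is; unseen tokens get the neutral prior."""
    s = spam_counts.get(token, 0)
    h = ham_counts.get(token, 0)
    if s + h == 0 or nspam == 0 or nham == 0:
        return UNKNOWN_WORD_PROB
    spam_ratio = s / nspam
    ham_ratio = h / nham
    return spam_ratio / (spam_ratio + ham_ratio)


def toy_score(tokens, spam_counts, ham_counts, nspam, nham):
    """Average the strong clues; with no strong clues, stay neutral at 0.5."""
    probs = [token_spam_prob(t, spam_counts, ham_counts, nspam, nham)
             for t in tokens]
    clues = [p for p in probs if abs(p - 0.5) >= MINIMUM_PROB_STRENGTH]
    if not clues:
        return 0.5
    return sum(clues) / len(clues)
```

With empty counts every token falls back to the neutral prior, no clue is
strong enough to count, and the score stays at 0.5, which sits between the
usual ham and spam cutoffs and so reads as "unsure".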

-- 
Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/
