SpamBayes for gate_news - who can test?
I pushed a Mailman branch to Launchpad which hooks the gate_news cron job into SpamBayes. On python.org we do a fair amount of early reject/filter stuff at the SMTP step. One of those things is to run incoming mail through SpamBayes (if it doesn't get rejected by an earlier step).
Unfortunately, messages coming from the comp.lang.python newsgroup destined for python-list@python.org just waltz right in as pretty as you please without so much as a sniff from our spam-trapping tools. It seemed to me that the correct place to sniff is in gate_news, so I implemented the necessary bits there to run incoming mail through SpamBayes and hold any messages which look like they are or might be spam.
I am, however, neither a Mailman developer nor a Usenet user. Setting up the necessary scaffolding (Mailman instance, Usenet news connection, etc.) would be much harder for me than the work I've done so far stitching SpamBayes into gate_news. I thought about adding a --dry-run flag to gate_news and then testing my version in parallel with the active version already running on mail.python.org, but that seems difficult as well, since it's pretty hard to disconnect gate_news completely from the list data.
So I come, hat in hand, looking for some brave Mailman developer who is willing to test out my modified version of gate_news. You can grab the latest version from Launchpad:
bzr pull lp:~smontanaro/mailman/SpamBayes
There is an associated doc repo with a few instructions for setting up the SpamBayes stuff:
bzr pull lp:~smontanaro/mailman-administrivia/SpamBayes
A sample spambayes.ini file lives in the cron directory alongside gate_news. It's basically what I would use on mail.python.org if I had the necessary savvy to do this myself.
If you have any questions I'd be happy to answer them. I can help you get SpamBayes installed if you've never done that before. (It's quite straightforward if you're familiar with the normal Python setup.py thing or use setuptools.) I can also provide ham and spam training sets from mail.python.org so you can construct a useful database for SpamBayes to score messages against. (You could run with an empty training database but that would just cause all messages to score as "unsure" and be held as possible spam.)
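To make the "held as possible spam" behaviour concrete, here is a rough sketch of the decision, not the code in the branch. The names classify() and should_hold() are made up for illustration, and the cutoffs of 0.20 and 0.90 are assumed values, so check them against the sample spambayes.ini. It also assumes an untrained database scores every message near 0.5.

```python
# Illustrative only: classify() and should_hold() are hypothetical names,
# not functions from the gate_news branch or from SpamBayes itself.

HAM_CUTOFF = 0.20   # assumed cutoff: scores below this count as ham
SPAM_CUTOFF = 0.90  # assumed cutoff: scores at or above this count as spam


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam probability in [0.0, 1.0] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def should_hold(score):
    """Hold anything that is, or might be, spam; pass only clear ham."""
    return classify(score) != "ham"


if __name__ == "__main__":
    # 0.50 stands in for the score an untrained database would give.
    for score in (0.02, 0.50, 0.97):
        print(score, classify(score), should_hold(score))
```

With these assumed cutoffs a score of 0.5 lands in the "unsure" band, which is why an empty training database would cause every gated message to be held.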
-- Skip Montanaro - skip@pobox.com - http://smontanaro.dyndns.org/
skip@pobox.com wrote:
Skip,
I have installed SpamBayes and am running your modified gate_news. The test list is <http://www.msapiro.net/mailman/listinfo/python> and it is gating comp.lang.python from news.bu.edu.
Currently I have
#BAYESCUSTOMIZE=/usr/local/mailman/cron/spambayes.ini
in mailman's crontab. I.e. it is commented out so SpamBayes is not actually being invoked.
I could use the training sets and some advice on how to proceed. Presumably the files
lookup_ip_cache:/usr/local/spambayes-corpus/dnscache.pck
crack_image_cache:/usr/local/spambayes-corpus/imagecache.pck
persistent_storage_file:/etc/spambayes/wordprobs.cdb
referenced in your spambayes.ini get created when the training sets are processed, but I'm unclear on that part of the process.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan