[Spambayes] Spambayes/procmail

David Abrahams dave at boost-consulting.com
Mon Mar 31 13:20:34 EST 2003


Skip Montanaro <skip at pobox.com> writes:

>     >> I use spambayes with procmail.  The major issue is generally not one
>     >> of getting messages classified, but of getting them trained.
>
>     Dave> I figured it would be; I think that's what I meant by
>     Dave> "classified".  I do have a folder full of accumulated spam.  What
>     Dave> has been your strategy for training?
>
> Here's what I do.  It's sensitive to my particular mail setup, so you can
> probably only use this as a rough guide.
>
> My mail reader is VM inside XEmacs.  

I'm using Gnus, FWIW.

> VM has a "l"abel command prefix.  I added two new keys to its
> keymap, "h" and "s" (which were fortuitously unused) to copy
> messages to spam and ham folders:

Here's what I've never understood about this system: shouldn't it be
enough to label spam?  Gnus gives me a key to label a message as spam.
If I collect all of those, shouldn't I be able to tell spambayes that
everything in my INBOX that's been read and isn't in my SpamBox is
ham?
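
To make the question concrete, here is a rough sketch of that "everything
read and not in the SpamBox is ham" idea.  It is only an illustration under
several assumptions that are not part of any real setup described here: the
folders are plain mbox files, "read" is recorded as an R flag in the Status
header, and messages can be matched by Message-ID.  The function name
infer_ham and all paths are hypothetical.

```python
import mailbox


def infer_ham(inbox_path, spam_path, ham_path):
    """Append read INBOX messages that are absent from the spam box to ham_path.

    Assumes mbox folders and that "read" means the R flag is set (Status header).
    Returns the number of messages copied.
    """
    # Collect the Message-IDs of everything already labelled as spam.
    spam_ids = {
        msg["Message-ID"]
        for msg in mailbox.mbox(spam_path)
        if msg["Message-ID"]
    }

    ham = mailbox.mbox(ham_path)
    copied = 0
    try:
        for msg in mailbox.mbox(inbox_path):
            if "R" not in msg.get_flags():
                continue  # unread: no judgement has been made yet
            if msg["Message-ID"] in spam_ids:
                continue  # already labelled as spam
            ham.add(msg)
            copied += 1
    finally:
        ham.close()  # flushes pending writes to disk
    return copied
```

The resulting ham file could then be handed to the trainer in place of a
hand-collected one.  The obvious weakness is that a spam message that was
read but never labelled would be trained as ham.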

>   (defun copy-to-spam ()
>     (interactive)
>     (vm-save-message (expand-file-name "~/tmp/newspam"))
>     (vm-undelete-message 1))
>
>   (defun copy-to-nonspam ()
>     (interactive)
>     (vm-save-message (expand-file-name "~/tmp/newham"))
>     (vm-undelete-message 1))
>
>   (define-key vm-mode-map "ls" 'copy-to-spam)
>   (define-key vm-summary-mode-map "ls" 'copy-to-spam)
>   (define-key vm-mode-map "lh" 'copy-to-nonspam)
>   (define-key vm-summary-mode-map "lh" 'copy-to-nonspam)
>
> ~/tmp/new{ham,spam} are then processed using a fairly simple shell script:
>
>     #!/bin/bash
>
>     export BAYESCUSTOMIZE=$HOME/hammie.opt
>     cd ~/tmp || exit 1
>
>     base=new
>     db=hammie.db
>
>     # touch the messages up a bit to avoid spurious "clues"
>     if [ -f ${base}ham -a -f ${base}spam ] ; then
>         unheader.py -p 'X-VM|X-Hammie|X-Spam' ${base}ham > ${base}ham.clean
>         unheader.py -p 'X-VM|X-Hammie|X-Spam' ${base}spam > ${base}spam.clean
>
>         # do the deed
>         hammie.py -d -p $db -g ${base}ham.clean -s ${base}spam.clean
>
>         # save the files for later retraining
>         cat ${base}ham.clean >> ${base}ham.clean.save
>         echo "" >> ${base}ham.clean.save
>         rm ${base}ham ${base}ham.clean
>
>         cat ${base}spam.clean >> ${base}spam.clean.save
>         echo "" >> ${base}spam.clean.save
>         rm ${base}spam ${base}spam.clean
>     else
>         echo Missing ${base}ham and/or ${base}spam files
>     fi
>
> I run the train script periodically to train on new ham and spam, then copy
> the resulting hammie.db file to where it's really used:
>
>     % train
>     Training ham (newham.clean):
>         12
>     Training spam (newspam.clean):
>         29
>     % cp -p hammie.db ~
>
> This setup works fine for me, though probably won't be as attractive for
> people who aren't as addicted to the shell prompt as I am.
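
For anyone without unheader.py to hand, the "touch the messages up" step in
the script above might be approximated as follows.  This is a stand-in
sketch, not the real unheader.py: it assumes mbox input and assumes the
pattern is matched case-insensitively against the start of each header
name.  The function name strip_headers is hypothetical.

```python
import mailbox
import re


def strip_headers(src_path, dst_path, pattern=r"X-VM|X-Hammie|X-Spam"):
    """Copy an mbox to dst_path, dropping headers whose names match pattern.

    Matching is anchored at the start of the header name (an assumption).
    Returns the number of messages written.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    dst = mailbox.mbox(dst_path)
    written = 0
    try:
        for msg in mailbox.mbox(src_path):
            # set() so a header that occurs several times is handled once;
            # "del msg[name]" removes every occurrence of that header.
            for name in set(msg.keys()):
                if regex.match(name):
                    del msg[name]
            dst.add(msg)
            written += 1
    finally:
        dst.close()  # flushes pending writes to disk
    return written
```

Removing these headers before training keeps the classifier from learning
earlier filter verdicts or mail-reader bookkeeping as clues.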

Well, I'm not sure I understand it yet, but I think I'll get there.
Thanks!

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com



