[Spambayes] Spambayes/procmail

Skip Montanaro skip at pobox.com
Mon Mar 31 10:53:05 EST 2003


    >> I use spambayes with procmail.  The major issue is generally not one
    >> of getting messages classified, but of getting them trained.

    Dave> I figured it would be; I think that's what I meant by
    Dave> "classified".  I do have a folder full of accumulated spam.  What
    Dave> has been your strategy for training?

Here's what I do.  It's sensitive to my particular mail setup, so you can
probably only use this as a rough guide.

My mail reader is VM inside XEmacs.  VM has a "l"abel command prefix.  I
added two new keys to its keymap, "h" and "s" (which were fortuitously
unused) to copy messages to spam and ham folders:

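  (defun copy-to-spam ()
    "Save the current message to ~/tmp/newspam, keeping it in the folder."
    (interactive)
    (vm-save-message (expand-file-name "~/tmp/newspam"))
    ;; Saving may flag the message as deleted (see vm-delete-after-saving);
    ;; clear that flag so this acts as a copy.
    (vm-undelete-message 1))

  (defun copy-to-nonspam ()
    "Save the current message to ~/tmp/newham, keeping it in the folder."
    (interactive)
    (vm-save-message (expand-file-name "~/tmp/newham"))
    (vm-undelete-message 1))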

  (define-key vm-mode-map "ls" 'copy-to-spam)
  (define-key vm-summary-mode-map "ls" 'copy-to-spam)
  (define-key vm-mode-map "lh" 'copy-to-nonspam)
  (define-key vm-summary-mode-map "lh" 'copy-to-nonspam)

~/tmp/new{ham,spam} are then processed using a fairly simple shell script:

    #!/bin/bash

    export BAYESCUSTOMIZE=$HOME/hammie.opt
    cd ~/tmp || exit 1

    base=new
    db=hammie.db

    # touch the messages up a bit to avoid spurious "clues"
    if [ -f "${base}ham" ] && [ -f "${base}spam" ] ; then
        unheader.py -p 'X-VM|X-Hammie|X-Spam' "${base}ham" > "${base}ham.clean"
        unheader.py -p 'X-VM|X-Hammie|X-Spam' "${base}spam" > "${base}spam.clean"

        # do the deed
        hammie.py -d -p "$db" -g "${base}ham.clean" -s "${base}spam.clean"

        # save the files for later retraining
        # (the blank line keeps successive batches separated)
        cat "${base}ham.clean" >> "${base}ham.clean.save"
        echo "" >> "${base}ham.clean.save"
        rm "${base}ham" "${base}ham.clean"

        cat "${base}spam.clean" >> "${base}spam.clean.save"
        echo "" >> "${base}spam.clean.save"
        rm "${base}spam" "${base}spam.clean"
    else
        echo "Missing ${base}ham and/or ${base}spam files"
    fi
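
The *.clean.save files make it possible to rebuild the database from
scratch later.  A minimal sketch of that, assuming hammie.py takes the same
flags as above when pointed at a new database file and creates that file if
it is absent (not confirmed here).  The names retrain, HAMMIE and
hammie.db.rebuilt are hypothetical:

```shell
#!/bin/bash

# retrain DIR [DB]: rebuild a database in DIR from the accumulated
# newham.clean.save and newspam.clean.save files.
# HAMMIE is a hypothetical override for the command to run; it defaults
# to hammie.py.  BAYESCUSTOMIZE is assumed to be exported by the caller,
# as in the train script.
retrain() {
    local dir=$1 db=${2:-hammie.db.rebuilt}
    local ham="$dir/newham.clean.save" spam="$dir/newspam.clean.save"

    if [ ! -f "$ham" ] || [ ! -f "$spam" ] ; then
        echo "Missing $ham and/or $spam" >&2
        return 1
    fi

    # Same flags as the train script, but writing to a separate file so
    # the database in use is left alone until the result has been checked.
    "${HAMMIE:-hammie.py}" -d -p "$dir/$db" -g "$ham" -s "$spam"
}
```

Calling it as "retrain ~/tmp" would leave the result in
~/tmp/hammie.db.rebuilt, to be compared with hammie.db before replacing it.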

I run the train script periodically to train on new ham and spam, then copy
the resulting hammie.db file to where it's really used:

    % train
    Training ham (newham.clean):
        12
    Training spam (newspam.clean):
        29
    % cp -p hammie.db ~
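
The train and copy steps could be folded into one command.  A rough sketch,
assuming the train command exits non-zero on failure (the script as posted
does not do that in its missing-files branch) and that the database is a
single file.  The names install_db and train_and_install are hypothetical.
Copying to a temporary name in the destination directory and then renaming
means a mail delivery should never see a half-copied file:

```shell
#!/bin/bash

# install_db SRC DEST: copy SRC to a temporary name beside DEST, then
# rename it over DEST.  A rename within one filesystem is atomic.
install_db() {
    local src=$1 dest=$2
    local tmp="$dest.tmp.$$"

    if cp -p "$src" "$tmp" && mv -f "$tmp" "$dest" ; then
        return 0
    fi
    rm -f "$tmp"    # clean up a partial copy
    return 1
}

# train_and_install TRAIN_CMD SRC DEST: run TRAIN_CMD and install the
# database only if it succeeded.
train_and_install() {
    local train_cmd=$1 src=$2 dest=$3

    if "$train_cmd" ; then
        install_db "$src" "$dest"
    else
        echo "training failed; leaving $dest untouched" >&2
        return 1
    fi
}
```

With the setup above this would be run as:

    % train_and_install train ~/tmp/hammie.db ~/hammie.db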

This setup works fine for me, though it probably won't be as attractive to
people who aren't as addicted to the shell prompt as I am.

Skip


