[spambayes-dev] Re: Mind if I commit a new contrib/spambayes.el?

Tony Meyer tameyer at ihug.co.nz
Mon Oct 13 22:53:13 EDT 2003


[Neale Pickett]
> I'd also like to edit the README to include a small
> section about the popular mutt mailer (and pine, if I can figure out a
> way to do it).

I just received this, which, since you're updating the README, you might be
willing to integrate also?

=Tony Meyer

-----Original Message-----
From: Alan W. Irwin [mailto:irwin at beluga.phys.uvic.ca] 
Sent: Tuesday, 14 October 2003 2:04 p.m.
To: Tony Meyer
Subject: RE: OOPS, somebody already had fixed this include_trained "sense"
bug for mbox in cvs


On 2003-10-14 10:49+1300 Tony Meyer wrote:

> > As soon as you get to the beta stage of spambayes development so 
> > that your API settles down, I will try to help you guys out with 
> > some updates to the documentation of the command-line use of sb. I 
> > could infer what I needed to know from the present documentation, 
> > but it needs some changes to keep up with the API changes.
>
> That would be fantastic.  Note that we have feature frozen (so also API
> frozen) a branch for all future 1.0 releases.  This means that there
> shouldn't be any API changes until the first 1.1a1 release, only bug
> fixes.
> (was that a subtle enough 'you can start now' hint? <wink>).

I have a lot on my plate, so I have only extremely limited time to offer, but
I do want to pay back a bit to the SpamBayes project since it has already
been so useful to me.  I therefore wrote up my recent experiences as a
COOKBOOK.txt, which I have attached to this e-mail.  It is supposed to be a
slightly extended replacement for HAMMIE.txt, and if Neale is already
revising that, he is certainly welcome to use my additions.  I have
relied on him where my experience came up short (procmailrc,
maildir format, cron scripts).  I am no computer guru, but what I
inferred from his cookbook did work for my Pine setup.  (I haven't actually
tried the cron script yet, but it should be the same as running the script
by hand, which does work.)  So my limitations should be kept in mind when the
SpamBayes developers are deciding whether to replace HAMMIE.txt with
COOKBOOK.txt.  But if Neale does not want to keep maintaining HAMMIE.txt,
then I think COOKBOOK.txt is a better name (since the hammie.py script has
gone into the bit-bucket).

Also, note I have regretfully signed my name to COOKBOOK.txt. :(

That of course means I am willing to maintain it when some of the 1.0a6
specifics in there are no longer applicable.  I know what a pain in the butt
that can be since I have done documentation maintenance for all the
open-source projects listed in my signature, but I am willing to do it for
this one document.

Alan
__________________________
Alan W. Irwin
email: irwin at beluga.phys.uvic.ca
phone: 250-727-2902

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the PLplot scientific plotting software
package (plplot.org), the Yorick front-end to PLplot (yplot.sf.net), the
Loads of Linux Links project (loll.sf.net), and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________
-------------- next part --------------
These COOKBOOK.txt instructions are for the user with some Unix/Linux
command-line experience.  They are an expanded and updated version of Neale
Pickett's HAMMIE.txt.

15-Minute Procmail-based Setup.  (Neale used to say 3-Minutes, but I think it
is worth the extra time to install SpamBayes properly.)
-------------------------------------

What you will have after doing all this:

* All your existing mail that has been used for training will have a new
  "X-Spambayes-Trained" header. SpamBayes uses this to keep track of which
  messages it has already learned about.

* SpamBayes will look at all your incoming mail.  Messages it thinks are
  spam will be put in a "spam" mail folder.  Messages it is unsure about
  will be put in an "unsure" mail folder. Everything else will be delivered
  normally.
  
* Every morning, SpamBayes will go through your mail folders and train
  itself on any new messages.  It will also adjust to mail that's been
  re-filed: something it thought was ham but was actually spam, and
  vice-versa.  Be sure to keep spam in your spam folder for at least a
  day or two before deleting it--I suggest keeping it for a full year,
  just in case you need to re-train SpamBayes.

-----

What you need:

* the SpamBayes package
* Python 2.2.2 or newer
* a text editor
* Procmail (most systems have this)
* a working crond (most systems have this)
* (optional) a mailbox full of spam and a mailbox full of ham

-----

Instructions:

1.  Download, unpack, patch, build and install the SpamBayes package.
    That's version 1.0a6 at the time of this writing, and it is available
    from http://sourceforge.net/project/showfiles.php?group_id=61702.  cd to
    the directory where you unpacked the tarball and patch it: version 1.0a6
    has a bug (reversed sense of the include_trained flag under [HEADERS]
    for mbox format).  See
    http://cvs.sourceforge.net/viewcvs.py/spambayes/spambayes/scripts/sb_mboxtrain.py?r1=1.5&r2=1.6
    for the two-line fix.  After patching, build the software as an
    ordinary user with
    
    python setup.py build
    
    Install it as root user using
    
    python setup.py install

2.  Configure SpamBayes for your ordinary user account where you get e-mail.

      cat > $HOME/.spambayesrc <<'EOF'
[Storage]
persistent_storage_file: ~/.spambayes/.hammiedb
persistent_use_database: True
EOF
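
    The settings above can be sanity-checked with a short script.  This is
    only a sketch: it assumes a current Python 3 and that the file can be
    read as a plain INI file by the standard configparser module.
    SpamBayes has its own option handling, which this does not reproduce,
    and the read_storage name is made up for illustration.

```python
import configparser
import os

# Same text as the .spambayesrc shown in this step.
SAMPLE = """\
[Storage]
persistent_storage_file: ~/.spambayes/.hammiedb
persistent_use_database: True
"""

def read_storage(text):
    """Return (database path, use-database flag) from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Storage"]
    # Expand "~" the way a shell would; this is our choice for the check,
    # not a statement about how SpamBayes resolves the path.
    path = os.path.expanduser(section["persistent_storage_file"])
    return path, section.getboolean("persistent_use_database")

path, use_database = read_storage(SAMPLE)
print(path, use_database)
```

    To check your real file instead, pass the contents of
    $HOME/.spambayesrc to read_storage.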
	    

3.  Create a new database:

      sb_filter.py -n

4.  (optional) Train it on your existing mail:

      sb_mboxtrain.py -d $HOME/.spambayes/.hammiedb -g $HOME/Mail/inbox -s $HOME/Mail/spam

    You can add additional folder names if you like, using -g for "good"
    mail folders, and -s for "spam" folders.
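
    To see how much of a folder has already been trained on, you can count
    the messages that carry the X-Spambayes-Trained header.  A minimal
    sketch, assuming a current Python 3 and the standard mailbox module;
    the count_trained and demo names, the sample messages, and the header
    value "ham" are placeholders made up for illustration.

```python
import mailbox
import os
import tempfile

def count_trained(mbox_path):
    """Return (trained, total) message counts for an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        trained = total = 0
        for message in box:
            total += 1
            if message.get("X-Spambayes-Trained") is not None:
                trained += 1
        return trained, total
    finally:
        box.close()

# Throwaway two-message mbox: only the first message has the header.
SAMPLE = (
    "From alice@example.com Mon Oct 13 22:53:13 2003\n"
    "Subject: already trained\n"
    "X-Spambayes-Trained: ham\n"
    "\n"
    "first body\n"
    "\n"
    "From bob@example.com Mon Oct 13 22:54:13 2003\n"
    "Subject: not yet trained\n"
    "\n"
    "second body\n"
)

def demo():
    """Write SAMPLE to a temporary mbox and count its trained messages."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "inbox")
        with open(path, "w", newline="\n") as handle:
            handle.write(SAMPLE)
        return count_trained(path)

print(demo())
```

    On a real folder, call count_trained with a path such as the
    $HOME/Mail/inbox used above.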
    
5.  Add the following three recipes to the top of your .procmailrc:


:0 fw:hamlock
| sb_filter.py

:0
* ^X-SpamBayes-Classification: spam
$HOME/Mail/sb_spam

:0
* ^X-SpamBayes-Classification: unsure
$HOME/Mail/sb_unsure

    The last two recipes are for mbox format and they work for me.  For
    maildir format use something like the following (adapted from
    HAMMIE.txt since as a Pine user I don't know about the maildir format):

:0
* ^X-SpamBayes-Classification: spam
$HOME/Maildir/.sb_spam/

:0
* ^X-SpamBayes-Classification: unsure
$HOME/Maildir/.sb_unsure/


    If you're not sure what format you use, ask your system
    administrator.  If you are the system administrator, check the
    documentation of your mail program.  With the notable exception of
    Pine, which can only read mbox format unless patched, most modern
    MUAs can handle both Maildir and mbox formats.
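
    The decision the last two mbox recipes make can be written out in a few
    lines.  This is only an illustration of the routing logic, assuming a
    current Python 3; it is not how procmail or SpamBayes implement it, and
    choose_folder and the "inbox" return value are names made up here.

```python
from email import message_from_string

def choose_folder(raw_message):
    """Pick a folder from the X-SpamBayes-Classification header."""
    message = message_from_string(raw_message)
    # Header lookup is case-insensitive, like procmail's default matching.
    verdict = (message.get("X-SpamBayes-Classification") or "").strip().lower()
    if verdict.startswith("spam"):
        return "Mail/sb_spam"
    if verdict.startswith("unsure"):
        return "Mail/sb_unsure"
    # Anything else falls through to normal delivery.
    return "inbox"

sample = (
    "From: someone@example.com\n"
    "X-SpamBayes-Classification: unsure\n"
    "\n"
    "hello\n"
)
print(choose_folder(sample))
```

    A message with no classification header, or one classified as ham, is
    left for normal delivery, just as it is when neither recipe matches.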

6.  Add the following cron job to train on new or refiled messages every
    morning at 2:21am ("crontab -e" with vixie cron, the default on most
    Linux systems):

      21 2 * * * sb_mboxtrain.py -d $HOME/.spambayes/.hammiedb -g $HOME/Mail/inbox -s $HOME/Mail/spam

    As in step 4, you can add additional folder names here too.  It's
    important to do so if you regularly file mail in different folders,
    since otherwise SpamBayes will never learn anything about those
    messages.
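
    If the crontab line is unfamiliar, the sketch below splits it into its
    five time fields and the command, which shows where "2:21am" comes
    from.  It assumes a current Python 3 and the standard five-field
    crontab layout; split_crontab_entry is a name made up for illustration.

```python
# Field names follow the standard five-field crontab layout.
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

def split_crontab_entry(entry):
    """Split a crontab line into its time fields and the command."""
    parts = entry.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    return dict(zip(FIELDS, parts[:5])), parts[5]

entry = (
    "21 2 * * * sb_mboxtrain.py -d $HOME/.spambayes/.hammiedb "
    "-g $HOME/Mail/inbox -s $HOME/Mail/spam"
)
schedule, command = split_crontab_entry(entry)
print("runs at %02d:%02d" % (int(schedule["hour"]), int(schedule["minute"])))
print(command)
```

    The three "*" fields mean every day of the month, every month, and
    every day of the week.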

7.  SpamBayes should now be filtering all your mail and training itself
    on your mailboxes.  Occasionally a message will be misfiled.  Just
    move that message to the correct folder, and SpamBayes will learn
    from its mistake the next morning.
    
-----

That's it!  You're all done.

If you have questions or comments about these instructions, please mail
them to irwin at beluga.phys.uvic.ca.

Alan W. Irwin

