[Spambayes] How well does sb_imapfilter.py work?

Woo, Christopher Christopher.Woo at pepperdine.edu
Thu Aug 19 18:13:21 CEST 2004


I've had a great deal of success running sb_imapfilter.py for at least a
month now. It runs on a Windows XP machine that sits next to my Exchange
server. I run it every 15 minutes via Pycron, along with a nightly training
job. It filters probably 50-60 spam a day for me. Sometimes it will stop
filtering spam, but if I log into the XP machine and manually run a train and
then clean, it picks back up again. So far that has only happened twice in
the past month, and I'm not sure it isn't a problem with Pycron freaking out.

--
CW  

> -----Original Message-----
> From: Tony Meyer [mailto:tameyer at ihug.co.nz] 
> Sent: Wednesday, August 18, 2004 4:30 PM
> To: 'Jen Wu'; spambayes at python.org
> Subject: RE: [Spambayes] How well does sb_imapfilter.py work?
> 
> > I tried running sb_imapfilter.py -b and setup my 
> > configuration. I then ran sb_imapfilter.py -t to train. It 
> > took a very long time ... and then it just died.
> 
> Stuff about the dying is at the end of this message.  Taking a long time -
> you were processing 1200 messages, which involves retrieving the message
> from the server and writing it back once, so that can take a while.  I don't
> know what "a very long time" is, of course, or how fast 'fast' is in terms
> of the connection.  You're unlikely to often train on that many messages (my
> whole database is less than 600 messages, spam *and* ham), so it wouldn't
> normally be a problem (and typically sb_imapfilter would run in the
> background, either with the -l option or via a cron script, so you wouldn't
> even notice training).
> 
> > I looked at 
> > the stats and it showed that about 600 of each type (spam and 
> > ham) had been trained, though, so I figured I could try 
> > running it against my inbox using sb_imapfilter.py -c. I 
> > noticed after a while that it hadn't moved any messages to 
> > the spam or unsure folders, but that there were a lot of 
> > messages being duplicated in the inbox (so I stopped it).
> 
> With the 1.0 sb_imapfilter messages are duplicated.  IMAP is a terrible
> protocol - you can't edit messages, and you can't move them.  You can't even
> delete them (just mark them for deletion and delete *all* messages so marked
> in a folder).  sb_imapfilter writes a new version of each message it sees
> with an ID header (the 1.1 sb_imapfilter does not do this in almost all
> cases).  When messages are classified, it also writes another copy (1.1
> still needs to do this), either in the Inbox (it has the classification
> headers) or in the unsure/spam folder.  The old versions are marked for
> deletion (your mailer may or may not indicate this to you).
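
To make the "write a new copy, flag the old one" idea concrete, here is a minimal sketch in modern Python. It is not the sb_imapfilter code. It assumes `conn` behaves like an `imaplib.IMAP4` connection that is already logged in, and both the function name and the `X-Example-Id` header in the usage note are made up for illustration.

```python
import email


def rewrite_with_header(conn, folder, uid, header, value):
    """Store a modified copy of a message and flag the original as deleted.

    `conn` is assumed to behave like a logged-in imaplib.IMAP4 object.
    """
    conn.select(folder)
    typ, data = conn.uid("FETCH", uid, "(RFC822)")
    if typ != "OK" or not data or not isinstance(data[0], tuple):
        raise RuntimeError("could not fetch message %s" % uid)

    # Parse the raw bytes and add the extra header to the parsed copy.
    msg = email.message_from_bytes(data[0][1])
    msg[header] = value

    # IMAP has no "edit" command, so the new version is uploaded as a
    # separate message in the same folder.
    typ, _ = conn.append(folder, None, None, msg.as_bytes())
    if typ != "OK":
        raise RuntimeError("APPEND failed for message %s" % uid)

    # The original is only flagged here; it stays in the folder until
    # the folder is expunged.
    conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
```

Called as `rewrite_with_header(conn, "INBOX", "7", "X-Example-Id", "abc")`, this leaves two copies of message 7 in the Inbox until the folder is expunged, which is the duplication described above.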
> 
> You can get sb_imapfilter to purge the mailbox (deleting messages marked
> with the \Deleted flag) as it goes, but this will delete any messages that
> you have yourself marked for deletion, too.  It's also not undoable, so it's
> wise to make sure that sb_imapfilter is running properly before you turn
> that on.
> 
> I don't know why mail wasn't turning up in the unsure/spam folder (unless
> you simply hadn't come across any non-ham mail yet).  Testing sb_imapfilter
> on a folder with just a few messages (including some spam) would be a good
> idea.  You can also turn on the evidence/clues header, and look at that to
> see why messages were classified as they were.  The output of the script
> will also say how many messages were classified as each type.
> 
> > So, before I continue with my experiments ... has anyone had 
> > any luck with the IMAP filter?
> 
> Some people, yes.  It is the youngest of the main scripts, and I suspect the
> least used, so it does have more rough edges.  Patches are always gratefully
> accepted!
> 
> > I also tried running the script in Linux, but it doesn't seem 
> > to like to run unless you're root, and the Web server isn't 
> > loading.
> 
> There shouldn't be any need to run sb_imapfilter.py as root.  What happened
> when you tried?  Perhaps non-root doesn't have access to Python (which would
> be odd)?
> 
> Is port 8880 busy, perhaps?  You can use '-o html_ui:port:8881' on the
> command line to change the port to (e.g.) 8881 or anything else you like.
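
One quick way to check whether a port is already taken is to try binding to it. The following is only a sketch using the standard `socket` module. The function name is made up, and it assumes the web interface listens on the local machine.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Covers "address already in use" as well as permission
            # errors (e.g. a non-root user asking for a port below 1024).
            return False
        return True
```

If `port_is_free(8880)` returns False, something else is probably holding the port, and picking another one with the '-o' option above is the easy way out.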
> 
> > I'm trying to figure out where it's looking for the 
> > config file now so hopefully I can avoid the Web interface 
> > altogether.
> 
> If you use the '-t' or '-c' options on the command line with
> sb_imapfilter.py the web interface doesn't start up.  The configuration file
> is found either in the location specified by the BAYESCUSTOMIZE environment
> variable, if you have set it up, or a file bayescustomize.ini in the current
> directory, or a file .spambayesrc in your home directory, or (with Windows
> only) a file SpamBayes\Proxy\bayescustomize.ini in your Windows 'Application
> Data' directory.
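
The lookup described above can be sketched as follows. This is an illustration, not the SpamBayes code: it assumes the locations are tried in the order listed and that BAYESCUSTOMIZE names a single file, neither of which is spelled out in the message. The function name and its parameters are hypothetical.

```python
import os


def find_config(environ, cwd, home, appdata=None, exists=os.path.isfile):
    """Return the first existing config file candidate, or None.

    Assumption: candidates are tried in the order they are listed in the
    message above.  `appdata` is the Windows 'Application Data' directory
    and is left as None on other platforms.
    """
    candidates = []
    if environ.get("BAYESCUSTOMIZE"):
        candidates.append(environ["BAYESCUSTOMIZE"])
    candidates.append(os.path.join(cwd, "bayescustomize.ini"))
    candidates.append(os.path.join(home, ".spambayesrc"))
    if appdata:
        candidates.append(
            os.path.join(appdata, "SpamBayes", "Proxy", "bayescustomize.ini")
        )

    for path in candidates:
        if exists(path):
            return path
    return None
```

For example, `find_config(os.environ, os.getcwd(), os.path.expanduser("~"))` reports which of the non-Windows locations would be picked up on the current machine under these assumptions.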
> 
> > Also, out of curiosity, has anyone compared the efficacy of 
> > Spam Bayes with DSPAM? That's the other software package I'm 
> > going to be trying out.
> 
> Not to my knowledge (and I've seen very few filter comparisons worth
> anything.  The most typical problem when one of the compared filters is
> SpamBayes is not dealing with the 'unsure' range properly (whatever
> 'properly' might be <wink>)).  I'm sure people would be interested if you
> wanted to post comparisons here.
> 
> > SpamBayes IMAP Filter Version 0.4 (May 2004)
> > and engine SpamBayes Engine Version 0.3 (January 2004).
> [...]
> > TypeError: string payload expected: <type 'list'>
> 
> This is odd.  For some reason sb_imapfilter managed to get the message and
> turn it into a message object (i.e. parse it) but then when turning it back
> into a string (to put back on the IMAP server) it choked on a malformation.
> The error is meant to occur earlier (where it is caught and handled).
> 
> This should only occur with rare messages, typically spam, that arrive
> malformed in some way.  If sb_imapfilter does stop, you should be able to
> just start it up again and it'll continue from where it was up to (or
> possibly it will immediately choke on that message again, in which case
> you'll have to move that one out of the way).
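
A script that wants to step over such a message instead of stopping could wrap the "turn it back into a string" step. This is a generic sketch in modern Python, not what sb_imapfilter does. The function name and the `on_error` callback are made up, and `msg` is assumed to be an `email.message.Message`-like object.

```python
def flatten_or_skip(msg, on_error=None):
    """Return msg.as_string(), or None if serialising the message fails.

    Only TypeError is caught here, matching the traceback quoted above;
    a caller that gets None can log the message and move on to the next.
    """
    try:
        return msg.as_string()
    except TypeError as exc:
        if on_error is not None:
            on_error(exc)
        return None
```

The caller still has to decide what to do with a message it could not rewrite, for instance leaving it untouched in the folder.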
> 
> You can open a bug report <http://sf.net/projects/spambayes> about this if
> you like (please include all the traceback that you posted here).  I'll get
> to it when I can (but I'm away for 3.5 weeks from today, so it won't be for
> a while).  In any case, when a 1.1a1 SpamBayes release comes out, it will
> include many sb_imapfilter improvements, so this might be handled by those.
> Alternatively, using Python 2.4 would remove this problem, because the email
> parsing is more robust.
> 
> =Tony Meyer
> 
> ---
> Please always include the list (spambayes at python.org) in your replies
> (reply-all), and please don't send me personal mail about SpamBayes. This
> way, you get everyone's help, and avoid a lack of replies when I'm busy.
> 
> 
> 

