[Spambayes] Re: How well does sb_imapfilter.py work?

Jen Wu jennyw at colorfulexpressions.com
Thu Aug 19 22:48:55 CEST 2004


>"Tony Meyer" <tameyer at ihug.co.nz> wrote in message
>news:ECBA357DDED63B4995F5C1F5CBE5B1E86C36DA at its-xchg4.massey.ac.nz...
>Stuff about the dying is at the end of this message.  Taking a long time -
>you were processing 1200 messages, which involves retrieving the message
>from the server and writing it back once, so that can take a while.  I don't
>know what "a very long time" is, of course, or how fast 'fast' is in terms

A very long time was about three hours. I later started running the script
on the Linux box that I have IMAP installed on instead, and it worked a lot
faster (don't know why). I also noticed that on the Linux box, most of the
processing time was in IMAP, not spambayes -- probably because of the
re-writing that you were mentioning.  I also noticed that some messages were
copied more than once ... in other words, there could be three or more
deleted versions -- is there a reason it needs to do this instead of just
reading the message once and writing it back?

>With the 1.0 sb_imapfilter messages are duplicated.  IMAP is a terrible
>protocol - you can't edit messages, and you can't move them.  You can't even
>delete them (just mark them for deletion and delete *all* messages so marked
>in a folder).  sb_imapfilter writes a new version of each message it sees
>with an ID header (the 1.1 sb_imapfilter does not do this in almost all
>cases).  When messages are classified, it also writes another copy (1.1
>still needs to do this), either in the Inbox (it has the classification
>headers) or in the unsure/spam folder.  The old versions are marked for
>deletion (your mailer may or may not indicate this to you).
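A minimal sketch of the rewrite pattern described above, using Python's standard imaplib module: since a message cannot be edited or moved, the new version is APPENDed (back to the same folder, or to the spam/unsure folder) and the original is only flagged \Deleted. The names rewrite_message and FakeIMAP are hypothetical, not part of sb_imapfilter, and this is not its actual code. Nothing here calls EXPUNGE, which removes every \Deleted message in the selected folder; until that happens the old copies stay visible in some mailers, which may be what the duplicates were.

```python
import imaplib


def rewrite_message(imap, src_folder, uid, new_message, dest_folder=None):
    """Replace message `uid` in `src_folder` with `new_message` (bytes).

    IMAP cannot edit or move a message, so the new version is APPENDed
    (to dest_folder, or back to src_folder) and the original is only
    flagged \\Deleted. It stays on the server until the folder is expunged.
    """
    dest = dest_folder or src_folder
    # No flags; date_time=None lets the server set the internal date.
    typ, _ = imap.append(dest, None, None, new_message)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"APPEND to {dest} failed")
    # STORE needs the source folder selected read-write.
    typ, _ = imap.select(src_folder)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"SELECT {src_folder} failed")
    typ, _ = imap.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        raise imaplib.IMAP4.error(f"STORE on UID {uid} failed")


class FakeIMAP:
    """Offline stand-in (hypothetical) that records calls instead of using a server."""

    def __init__(self):
        self.calls = []

    def append(self, mailbox, flags, date_time, message):
        self.calls.append(("APPEND", mailbox, message))
        return "OK", [b""]

    def select(self, mailbox="INBOX", readonly=False):
        self.calls.append(("SELECT", mailbox))
        return "OK", [b"1"]

    def uid(self, command, *args):
        self.calls.append(("UID", command) + args)
        return "OK", [b""]


fake = FakeIMAP()
rewrite_message(fake, "INBOX", "42", b"Subject: test\r\n\r\nbody\r\n", dest_folder="spam")
```

Each pass over a message costs one full download and one full upload, so a message touched twice (once to add an ID header, once to file it) ends up with two deleted copies plus the live one.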

Out of curiosity, could this be done using UIDs? I guess that would add the
necessity of keeping track of UIDs, but if reading/writing is the main
performance culprit, it would be a lot faster. Maybe that's what happens in
1.1?
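One way UIDs could help, sketched under assumptions: IMAP gives each message in a folder a UID that keeps referring to the same message as long as the folder's UIDVALIDITY value is unchanged, so a filter could remember which UIDs it has already handled and skip them on the next run instead of downloading them again. FolderState, unseen_uids and folder_snapshot are hypothetical names; this is not a description of how sb_imapfilter 1.0 or 1.1 actually works.

```python
import imaplib
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class FolderState:
    """What a filter might remember about one folder between runs (hypothetical)."""
    uidvalidity: Optional[int] = None
    seen: Set[int] = field(default_factory=set)


def unseen_uids(state: FolderState, uidvalidity: int, current_uids: Iterable[int]) -> List[int]:
    """Return the UIDs not handled on a previous run, in ascending order.

    UIDs are only meaningful together with the folder's UIDVALIDITY; if that
    value changes, old UIDs may now refer to different messages, so the
    remembered set is discarded and every message is treated as new.
    """
    if state.uidvalidity != uidvalidity:
        state.uidvalidity = uidvalidity
        state.seen = set()
    current = set(current_uids)
    new = sorted(current - state.seen)
    # Assumes the caller processes every returned UID. Keeping only the UIDs
    # still present also forgets deleted messages, so the state stays small.
    state.seen = current
    return new


def folder_snapshot(imap: imaplib.IMAP4, folder: str):
    """Fetch (UIDVALIDITY, [UIDs]) for a folder over a live imaplib connection."""
    imap.select(folder, readonly=True)
    _, data = imap.response("UIDVALIDITY")
    uidvalidity = int(data[0])
    _, data = imap.uid("SEARCH", "ALL")
    return uidvalidity, [int(u) for u in data[0].split()]
```

A run would call folder_snapshot, pass the result to unseen_uids, and fetch and classify only those UIDs. The cost is exactly the bookkeeping mentioned above: the FolderState has to be saved somewhere between runs.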

>I don't know why mail wasn't turning up in the unsure/spam folder (unless
>you simply hadn't come across any non-ham mail yet).  Testing sb_imapfilter

That's the weird thing. Almost all the mail I had in my inbox was spam, but
nothing was showing up in the spam or unsure folders. There were a lot of
messages in the inbox that were duplicated, though -- in retrospect, I
should have looked at them to see what headers were added. Again, some
messages were copied three or more times.

>Some people, yes.  It is the youngest of the main scripts, and I suspect the
>least used, so it does have more rough edges.  Patches are always gratefully
>accepted!

I'll keep testing it ... I'll try downloading 1.1, too (I guess we can get
this from CVS?).

>There shouldn't be any need to run sb_imapfilter.py as root.  What happened
>when you tried?  Perhaps non-root doesn't have access to Python (which would
>be odd)?

No, I just goofed ... didn't put in a .spambayesrc. Once I did that, it
worked fine!

>If you use the '-t' or '-c' options on the command line with
>sb_imapfilter.py the web interface doesn't start up.  The configuration file

Good to know!

>Not to my knowledge (and I've seen very few filter comparisons worth
>anything.  The most typical problem when one of the compared filters is
>SpamBayes is not dealing with the 'unsure' range properly (whatever
>'properly' might be <wink>)).  I'm sure people would be interested if you
>wanted to post comparisons here.
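For readers unfamiliar with the 'unsure' range: SpamBayes scores each message between 0 and 1 and compares the score against two cutoffs, so there are three outcomes instead of two, and a fair comparison has to decide how to count the middle band. A minimal sketch follows; the cutoff values 0.20 and 0.90 are assumed defaults, the exact boundary handling is an assumption, and the folder names are hypothetical (sb_imapfilter takes the real ones from its configuration file).

```python
# Hypothetical folder names; real ones come from the user's configuration.
DESTINATION = {"ham": "INBOX", "unsure": "unsure", "spam": "spam"}


def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam score in [0, 1] to 'ham', 'unsure' or 'spam'.

    Scores below ham_cutoff are ham, scores at or above spam_cutoff are
    spam, and everything in between is unsure. Cutoffs are assumed values.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def route(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return (label, destination folder) for a scored message."""
    label = classify(score, ham_cutoff, spam_cutoff)
    return label, DESTINATION[label]
```

Widening the gap between the two cutoffs trades fewer false positives and false negatives for more messages to review by hand, which is why a two-way comparison against another filter can be misleading.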

I'll let you know ... DSPAM is supposed to be very fast (and also uses a
central database for all users), but I decided to try SpamBayes first
because three things are more important than speed to me right now: 1) IMAP
support; 2) It can mark messages as unsure; and 3) It's written in Python.
The one thing I'm unclear on is also the one thing that matters the most -- 
how well they detect spam (including avoiding false-positives).

>You can open a bug report <http://sf.net/projects/spambayes> about this if

If I run into it again, I might do that.

Thanks!

Jen




