[Spambayes] How well does sb_imapfilter.py work?
Woo, Christopher
Christopher.Woo at pepperdine.edu
Thu Aug 19 18:13:21 CEST 2004
I've had a great deal of success running sb_imapfilter.py for at least a
month now. It runs on a Windows XP machine that sits next to my Exchange
server. I run it every 15 minutes via Pycron, along with a nightly training
job. It filters probably 50-60 spam messages a day for me. Sometimes it will
stop filtering
spam, but if I log into the XP machine and manually run a train and then
clean, it picks back up again. So far that has only happened twice in the
past month, and I'm not sure it isn't a problem with Pycron freaking out.
--
CW
> -----Original Message-----
> From: Tony Meyer [mailto:tameyer at ihug.co.nz]
> Sent: Wednesday, August 18, 2004 4:30 PM
> To: 'Jen Wu'; spambayes at python.org
> Subject: RE: [Spambayes] How well does sb_imapfilter.py work?
>
> > I tried running sb_imapfilter.py -b and setup my
> > configuration. I then ran sb_imapfilter.py -t to train. It
> > took a very long time ... and then it just died.
>
> Stuff about the dying is at the end of this message. Taking a long time -
> you were processing 1200 messages, which involves retrieving each message
> from the server and writing it back once, so that can take a while. I don't
> know what "a very long time" is, of course, or how fast 'fast' is in terms
> of the connection. You're unlikely to often train on that many messages (my
> whole database is less than 600 messages, spam *and* ham), so it wouldn't
> normally be a problem (and typically sb_imapfilter would run in the
> background, either with the -l option or via a cron script, so you wouldn't
> even notice training).
>
> > I looked at
> > the stats and it showed that about 600 of each type (spam and
> > ham) had been trained, though, so I figured I could try
> > running it against my inbox using sb_imapfilter.py -c. I
> > noticed after a while that it hadn't moved any messages to
> > the spam or unsure folders, but that there were a lot of
> > messages being duplicated in the inbox (so I stopped it).
>
> With the 1.0 sb_imapfilter, messages are duplicated. IMAP is a terrible
> protocol - you can't edit messages, and you can't move them. You can't even
> delete them (just mark them for deletion and delete *all* messages so marked
> in a folder). sb_imapfilter writes a new version of each message it sees
> with an ID header (the 1.1 sb_imapfilter does not do this in almost all
> cases). When messages are classified, it also writes another copy (1.1
> still needs to do this), either in the Inbox (it has the classification
> headers) or in the unsure/spam folder. The old versions are marked for
> deletion (your mailer may or may not indicate this to you).
>
> You can get sb_imapfilter to purge the mailbox (deleting messages marked
> with the \Deleted flag) as it goes, but this will also delete any messages
> that you have marked for deletion yourself. It's also not undoable, so it's
> wise to make sure that sb_imapfilter is running properly before you turn
> that on.
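
The copy, flag, and purge sequence described above can be sketched with Python's standard imaplib module. This is a rough illustration only, not sb_imapfilter's own code: the function name move_message and its purge parameter are made up here, and conn is assumed to be an already logged-in imaplib.IMAP4 connection with the source folder selected.

```python
import imaplib


def move_message(conn, uid, dest_folder, purge=False):
    """Emulate an IMAP "move": copy, flag the original, optionally expunge.

    conn is assumed to be a logged-in imaplib.IMAP4 (or IMAP4_SSL) object
    with the source folder already selected. Illustration only.
    """
    # Base IMAP4rev1 has no move command, so copy the message first.
    typ, _ = conn.uid('COPY', uid, dest_folder)
    if typ != 'OK':
        raise imaplib.IMAP4.error('COPY failed for UID %s' % uid)
    # Flag the original; it stays in the folder until an expunge.
    typ, _ = conn.uid('STORE', uid, '+FLAGS', r'(\Deleted)')
    if typ != 'OK':
        raise imaplib.IMAP4.error('STORE failed for UID %s' % uid)
    if purge:
        # EXPUNGE removes *every* \Deleted message in the selected folder,
        # including ones the user flagged by hand, and cannot be undone.
        conn.expunge()
```

Leaving purge off by default mirrors the advice above: the flagged originals stay recoverable until you are confident the filter is behaving.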
>
> I don't know why mail wasn't turning up in the unsure/spam folder (unless
> you simply hadn't come across any non-ham mail yet). Testing sb_imapfilter
> on a folder with just a few messages (including some spam) would be a good
> idea. You can also turn on the evidence/clues header, and look at that to
> see why messages were classified as they were. The output of the script
> will also say how many messages were classified as each type.
>
> > So, before I continue with my experiments ... has anyone had
> > any luck with the IMAP filter?
>
> Some people, yes. It is the youngest of the main scripts, and I suspect the
> least used, so it does have more rough edges. Patches are always gratefully
> accepted!
>
> > I also tried running the script in Linux, but it doesn't seem
> > to like to run unless you're root, and the Web server isn't
> > loading.
>
> There shouldn't be any need to run sb_imapfilter.py as root. What happened
> when you tried? Perhaps non-root doesn't have access to Python (which would
> be odd)?
>
> Is port 8880 busy, perhaps? You can use '-o html_ui:port:8881' on the
> command line to change the port to (e.g.) 8881 or anything else you like.
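
One quick way to check whether a port is already taken is to try binding to it. The sketch below uses only Python's standard socket module; the helper name port_is_free is made up for illustration and is not part of SpamBayes.

```python
import socket


def port_is_free(port, host='127.0.0.1'):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds it.
            return False
        return True
```

Calling port_is_free(8880) and then port_is_free(8881) would show whether moving the web interface to the alternative port is likely to help.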
>
> > I'm trying to figure out where it's looking for the
> > config file now so hopefully I can avoid the Web interface
> > altogether.
>
> If you use the '-t' or '-c' options on the command line with
> sb_imapfilter.py, the web interface doesn't start up. The configuration file
> is found either in the location specified by the BAYESCUSTOMIZE environment
> variable, if you have set it up, or a file bayescustomize.ini in the current
> directory, or a file .spambayesrc in your home directory, or (with Windows
> only) a file SpamBayes\Proxy\bayescustomize.ini in your Windows 'Application
> Data' directory.
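
The lookup locations listed above can be written out as a small Python sketch. It only mirrors the order given in the message and is not SpamBayes' actual lookup code. The function names are hypothetical, BAYESCUSTOMIZE is treated as a single path, and taking the first file that exists is an assumption, since the message does not say how several matches are handled.

```python
import os


def candidate_config_paths(environ, cwd, home, appdata=None):
    """List config file locations in the order described above.

    All arguments are passed in explicitly so the function is easy to
    test; appdata stands for the Windows 'Application Data' directory
    and should be None on other platforms.
    """
    paths = []
    custom = environ.get('BAYESCUSTOMIZE')
    if custom:
        paths.append(custom)
    paths.append(os.path.join(cwd, 'bayescustomize.ini'))
    paths.append(os.path.join(home, '.spambayesrc'))
    if appdata:
        paths.append(os.path.join(appdata, 'SpamBayes', 'Proxy',
                                  'bayescustomize.ini'))
    return paths


def first_existing(paths):
    """Return the first path that exists on disk, or None (assumed policy)."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None
```

For example, candidate_config_paths(os.environ, os.getcwd(), os.path.expanduser('~')) gives the places worth checking by hand on a Linux machine.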
>
> > Also, out of curiosity, has anyone compared the efficacy of
> > Spam Bayes with DSPAM? That's the other software package I'm
> > going to be trying out.
>
> Not to my knowledge (and I've seen very few filter comparisons worth
> anything. The most typical problem when one of the compared filters is
> SpamBayes is not dealing with the 'unsure' range properly (whatever
> 'properly' might be <wink>)). I'm sure people would be interested if you
> wanted to post comparisons here.
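
To make the 'unsure' point concrete: a three-way filter has two cutoffs rather than one, so a comparison has to count unsure messages separately instead of lumping them in with errors or with correct results. The sketch below is one possible way to tally that. The cutoff values, the boundary handling, and the function names are illustrative assumptions, not taken from this thread.

```python
from collections import Counter


def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'.

    The cutoffs are assumed values for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError('score must be between 0 and 1')
    if score < ham_cutoff:
        return 'ham'
    if score >= spam_cutoff:
        return 'spam'
    return 'unsure'


def tally(scored):
    """Count outcomes for (true_label, score) pairs, keeping unsure apart."""
    counts = Counter()
    for truth, score in scored:
        verdict = classify(score)
        if verdict == 'unsure':
            counts['unsure'] += 1
        elif verdict == truth:
            counts['correct'] += 1
        elif verdict == 'spam':
            counts['false_positive'] += 1   # ham filed as spam
        else:
            counts['false_negative'] += 1   # spam filed as ham
    return counts
```

A filter with only a single threshold has no equivalent of the unsure bucket, which is why the two kinds of results are awkward to compare directly.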
>
> > SpamBayes IMAP Filter Version 0.4 (May 2004)
> > and engine SpamBayes Engine Version 0.3 (January 2004).
> [...]
> > TypeError: string payload expected: <type 'list'>
>
> This is odd. For some reason sb_imapfilter managed to get the message and
> turn it into a message object (i.e. parse it) but then when turning it back
> into a string (to put back on the IMAP server) it choked on a malformation.
> The error is meant to occur earlier (where it is caught and handled).
>
> This should only occur with rare messages, typically spam, that arrive
> malformed in some way. If sb_imapfilter does stop, you should be able to
> just start it up again and it'll continue from where it was up to (or
> possibly it will immediately choke on that message again, in which case
> you'll have to move that one out of the way).
>
> You can open a bug report <http://sf.net/projects/spambayes> about this if
> you like (please include all the traceback that you posted here). I'll get
> to it when I can (but I'm away for 3.5 weeks from today, so it won't be for
> a while). In any case, when a 1.1a1 SpamBayes release comes out, it will
> include many sb_imapfilter improvements, so this might be handled by those.
> Alternatively, using Python 2.4 would remove this problem, because its email
> parsing is more robust.
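
For illustration, here is how the same kind of failure can be reproduced and guarded against with the email package in current Python 3. This is a sketch under assumptions, not the fix that went into SpamBayes: the helper name flatten_or_fallback is made up, and falling back to the raw text is just one possible policy.

```python
from email.message import Message


def flatten_or_fallback(msg, raw_text):
    """Return msg as a string, or raw_text if it cannot be flattened."""
    try:
        return msg.as_string()
    except TypeError:
        # e.g. "string payload expected": the headers claim a text part
        # but the payload is a list of sub-messages.
        return raw_text


# Build a deliberately inconsistent message: text/plain with a list payload.
bad = Message()
bad['Content-Type'] = 'text/plain'
bad.set_payload([Message()])

print(flatten_or_fallback(bad, 'original text as fetched from the server'))
```

With a guard like this, one malformed message is written back unchanged instead of stopping the whole run.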
>
> =Tony Meyer
>
> ---
> Please always include the list (spambayes at python.org) in your replies
> (reply-all), and please don't send me personal mail about
> SpamBayes. This
> way, you get everyone's help, and avoid a lack of replies
> when I'm busy.
>
>
>