[Spambayes] Training a procmail filter for a Cyrus IMAP server

Wed Feb 4 16:08:01 EST 2004

Here's my situation:  my mail is kept on a Cyrus IMAP server and I read it
from two places; thus, I'm looking to do server-side filtering.  I believe that
I have a procmail hook into the mail delivery, but the mail, once delivered, is
inaccessible to me except via the IMAP protocol (i.e., as I understand it, this
is the Cyrus way).

So, I should be able to set up the procmail filtering according to the
documented suggestions (although I haven't actually tried this yet -- I'm a
little intimidated ;-).  My question is:  how do I train the filter?

My answer is, configure the imapfilter and let it produce a database which
I then feed to the procmail filter.  This seems a really obvious approach, and
I'm wondering why I've not seen mention of it anywhere.  (Like, isn't it going
to work??)  I've got a corpus of 1.5 gigabytes or so, including 5K spam
messages (I've really been looking forward to getting a Bayesian filter in
place! 8-).  Extracting it out of IMAP would be a pain (I'm not even sure how
to do that in bulk).
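
For reference, pulling a folder out of IMAP in bulk can be sketched with
Python's standard imaplib and mailbox modules.  This is only a rough sketch
under assumptions, not anything SpamBayes ships:  the function name
export_folder is made up for illustration, and the caller is assumed to pass
in a connection that is already logged in.  The folder is opened read-only and
the messages are fetched with BODY.PEEK[], so the server should not change any
flags.

```python
import imaplib
import mailbox


def export_folder(conn, folder, mbox_path):
    """Copy every message in an IMAP folder into a local mbox file.

    `conn` is an already logged-in imaplib.IMAP4 (or IMAP4_SSL) object.
    Returns the number of messages written.  (Hypothetical helper.)
    """
    # readonly=True makes imaplib issue EXAMINE instead of SELECT,
    # so the mailbox is opened without write access.
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot open folder %r" % folder)

    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed in folder %r" % folder)

    box = mailbox.mbox(mbox_path)  # created if it does not exist
    count = 0
    try:
        for num in data[0].split():
            # BODY.PEEK[] returns the whole message without
            # setting the \Seen flag on it.
            status, parts = conn.fetch(num, "(BODY.PEEK[])")
            if status != "OK":
                continue
            for part in parts:
                # Message data arrives as (envelope, raw_bytes) tuples;
                # other items are just closing parentheses.
                if isinstance(part, tuple):
                    box.add(part[1])
                    count += 1
    finally:
        box.close()  # close() flushes pending messages to disk
    return count
```

Usage would look something like creating conn with
imaplib.IMAP4_SSL("imap.example.com"), calling conn.login(user, password), and
then export_folder(conn, "INBOX", "inbox.mbox"); the host name and folder here
are placeholders.  With a 1.5 gigabyte corpus this would still take a while.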

I've actually taken the first step -- I've configured the imapfilter and
started training it.  It ran for 5.5 hours last night before it hit a problem
with bogus date headers.  I restarted it after purging the spam with the bad
headers, and it's been running for a couple of hours since.

One thing for people to note.  This is probably obvious to the aficionados,
but it wasn't obvious to me:  the trainer adds a line to the mail message
headers (even though training seems like a read-only operation).  The effect
of this is that my mail clients discovered that their caches of the message
headers were now stale.  This wasn't a big deal here at work with the
multi-megabit network connection, but, at home, with my soda-straw dial-up,
this was a bit painful.

One other question, while I'm here.  What's the deal with using a database vs.
a pickle?  I understand that the former is supposed to be faster for a single
message lookup, and the latter is better for bulk training.  But, I presume
that what I want (once I'm done training) is a database.  How do I convert the
pickle into a database?  (Also, I had problems using the pickle, but those
might have been...well...pilot errors -- the whole having to save-and-restart
in the middle of configuring the imapfilter using the web interface kind of
messed me up for a while.  ;-)
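
To make the tradeoff concrete, here is a generic sketch using Python's standard
pickle and shelve modules.  It is NOT the SpamBayes on-disk format or its
conversion procedure; it assumes a made-up layout where the pickle holds a plain
dict mapping each token to a (spam_count, ham_count) tuple, and the function
names are invented for illustration.  The point is only that a pickle has to be
loaded whole, whereas a keyed database can answer a single lookup without
reading everything.

```python
import pickle
import shelve


def pickle_to_shelf(pickle_path, shelf_path):
    """Copy a pickled dict of token -> counts into a shelve database.

    Assumes (hypothetically) that the pickle contains a plain dict with
    string keys.  Returns the number of entries copied.
    """
    with open(pickle_path, "rb") as fh:
        counts = pickle.load(fh)  # the whole mapping is read into memory

    # flag="n" always creates a new, empty database.
    with shelve.open(shelf_path, flag="n") as shelf:
        for token, value in counts.items():
            shelf[token] = value
    return len(counts)


def lookup(shelf_path, token):
    """Fetch the counts for one token without loading the other entries."""
    with shelve.open(shelf_path, flag="r") as shelf:
        return shelf.get(token)
```

Bulk training touches nearly every entry, so loading everything once (the
pickle style) is cheap by comparison; scoring one incoming message touches only
that message's tokens, which is where the keyed lookup pays off.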

            Thanks,

                Webb

--
------------------------------------------------------------------------
Webb Scales                                Hewlett-Packard Company
scales at zko.dec.com                      110 Spit Brook Rd, ZKO2-3/N30
Voice: 603.884.2196, FAX: 603.884.0120     Nashua, NH 03062-2711
      When everything's coming your way, you're in the wrong lane.
------------------------------------------------------------------------
