mbox despamming script

Michael Hudson mwh at python.net
Thu Nov 27 07:21:05 EST 2003


Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:

> I was surprised there was no obvious way with spamassassin (maybe I
> shoulda looked at spambayes) to split an existing mbox file into its
> spam and non-spam messages.  So I wrote one.  It's pretty slow, taking
> around 1.5 seconds per message on a 2 GHz Athlon, making me wonder how
> serious ISPs getting thousands of incoming messages per hour can run
> anything like spamassassin on all of them.  But for my purposes it's ok.
> Comments and improvements are welcome.
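For reference, the task described in the quoted message (splitting an existing mbox into spam and non-spam files) can be sketched with the standard-library mailbox module. This is a minimal sketch, not Paul's script. It assumes Python 3 and a spamassassin binary on PATH, and it relies on spamassassin's -e (--exit-code) option, which makes the process exit non-zero for a message it judges to be spam. The names split_mbox and spamassassin_says_spam are hypothetical. The classifier is passed in as a parameter so another filter (spambayes, say) could be swapped in.

```python
import mailbox
import subprocess
from typing import Callable, Tuple


def spamassassin_says_spam(raw: bytes) -> bool:
    """Assumed default classifier: pipe one message through `spamassassin -e`,
    which exits with a non-zero status when it judges the message to be spam.
    Note: any other failure also yields non-zero and is counted as spam here."""
    result = subprocess.run(
        ["spamassassin", "-e"],
        input=raw,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode != 0


def split_mbox(src_path: str, spam_path: str, ham_path: str,
               is_spam: Callable[[bytes], bool] = spamassassin_says_spam
               ) -> Tuple[int, int]:
    """Copy every message in src_path into spam_path or ham_path
    (each created if missing, appended to otherwise).
    Returns (spam_count, ham_count)."""
    source = mailbox.mbox(src_path, create=False)
    spam_box = mailbox.mbox(spam_path)
    ham_box = mailbox.mbox(ham_path)
    n_spam = n_ham = 0
    try:
        for key in source.iterkeys():
            raw = source.get_bytes(key)  # message text without the From_ line
            if is_spam(raw):
                spam_box.add(source.get_message(key))  # keeps the From_ line
                n_spam += 1
            else:
                ham_box.add(source.get_message(key))
                n_ham += 1
    finally:
        # close() flushes pending appends to disk
        source.close()
        spam_box.close()
        ham_box.close()
    return n_spam, n_ham
```

In this sketch every message pays for a fresh spamassassin process start-up; the spamd daemon (queried via spamc) exists to avoid that per-message cost.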

In my experience the mailbox module is pretty slow at reading mbox
files.  I remember speeding up some mail-statistics-gathering code
considerably by writing my own mbox "parser" (basically
s.find('\n\nFrom ') or similar, I forget).  I'm not sure I'd want to
use that approach on anything less forgiving than stats, though :-)
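A quick-and-dirty splitter along the lines described above might look like this. It is a sketch, not the code referred to: it assumes the whole file fits in memory and treats every b"\n\nFrom " as a message boundary, so it ignores ">From " quoting and Content-Length headers. The names split_mbox_fast and read_mbox_fast are hypothetical.

```python
from typing import List


def split_mbox_fast(data: bytes) -> List[bytes]:
    """Split raw mbox bytes into messages, each starting with its From_ line.

    A boundary is any b"\n\nFrom ": the first newline ends the previous
    message, the second is the blank separator line, which is dropped.
    """
    if not data:
        return []
    messages = []
    start = 0
    while True:
        idx = data.find(b"\n\nFrom ", start)
        if idx == -1:
            messages.append(data[start:])  # last message runs to end of data
            return messages
        messages.append(data[start:idx + 1])  # keep the message's final newline
        start = idx + 2  # next message begins at its "From " line


def read_mbox_fast(path: str) -> List[bytes]:
    """Read the whole file in one go and split it; fine for stats only."""
    with open(path, "rb") as f:
        return split_mbox_fast(f.read())
```

The failure mode is exactly the unforgiving case: a body paragraph beginning with "From " (for example "From the desk of ...") that the writing mailer did not escape gets split into two messages. For statistics an occasional bogus split hardly matters; for filtering someone's mail it would.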

Cheers,
mwh

-- 
59. In English every word can be verbed. Would that it were so in
    our programming languages.
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html




More information about the Python-list mailing list