mbox despamming script
Michael Hudson
mwh at python.net
Thu Nov 27 07:21:05 EST 2003
Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:
> I was surprised there was no obvious way with spamassassin (maybe I
> shoulda looked at spambayes) to split an existing mbox file into its
> spam and non-spam messages. So I wrote one. It's pretty slow, taking
> around 1.5 seconds per message on a 2 GHz Athlon, making me wonder how
> serious ISPs getting thousands of incoming messages per hour can run
> anything like spamassassin on all of them. But for my purposes it's ok.
> Comments and improvements are welcome.
In my experience, the mailbox module is pretty slow at reading mbox
files. I remember speeding up some mail-statistics gathering code by a
large amount by implementing my own mbox "parser" (basically
s.find('\n\nFrom ') or similar, I forget). I'm not sure I'd want to
use this approach on anything less forgiving than stats, though :-)
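
A minimal sketch of that kind of hand-rolled splitter, assuming the
whole mbox fits in memory as a single string. The name split_mbox is
hypothetical, and this is not the original stats code, which isn't
shown in this message; it only illustrates the s.find('\n\nFrom ')
idea.

```python
def split_mbox(text):
    """Split mbox text into raw messages (hypothetical helper).

    Assumes a new message starts at a line beginning with "From "
    that is preceded by a blank line.
    """
    messages = []
    start = 0
    while True:
        # Look for the next blank line followed by a "From " separator.
        idx = text.find('\n\nFrom ', start)
        if idx == -1:
            break
        # Keep the newline that ends the message, drop the blank line.
        messages.append(text[start:idx + 1])
        start = idx + 2
    # Whatever remains is the last message (if there is anything left).
    if text[start:].strip():
        messages.append(text[start:])
    return messages
```

This is fast because it does no header parsing at all, which is also
why it's fragile: a body paragraph that starts with "From " and hasn't
been escaped as ">From " would be mistaken for a message boundary.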
Cheers,
mwh
--
59. In English every word can be verbed. Would that it were so in
our programming languages.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html