[Spambayes] Problems with unheader.py
Greg Ward
gward@python.net
Tue, 24 Sep 2002 21:44:12 -0400
I've been playing around with unheader.py -- it looks like it's just the
ticket for cleaning up some of the spam I've gathered. Apart from
adding Maildir support to it, I think I've found some bugs:
1) if it tries to read from stdin, it crashes with:

     [...]
     File "/scratch/src/spambayes/unheader.py", line 68, in process_mailbox
       for msg in mailbox.PortableUnixMailbox(f, Parser().parse):
     File "/www/plat/python2.2.1/lib/python2.2/mailbox.py", line 23, in next
       self.fp.seek(self.seekp)
   IOError: [Errno 29] Illegal seek

2) deSA() removes the first and last line of the body

3) deSA() crashes on MIME messages -- assumes the result of
   get_payload() is a string

#1 is easy to fix -- just remove the ability to read from stdin. Anyone
care? It'll make my patch to add Maildir simpler.
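
If stdin support turns out to be worth keeping, one possible alternative
is to spool the input into a temporary file first, since the traceback
shows the mailbox reader calling seek() on its file object. This is a
minimal sketch in modern Python, not a patch against unheader.py;
seekable_copy is a hypothetical helper name.

```python
import io
import shutil
import tempfile


def seekable_copy(stream):
    """Copy a binary stream (e.g. a pipe) into a seekable temporary file.

    Hypothetical helper: the name and approach are illustrative only.
    """
    tmp = tempfile.TemporaryFile()   # binary mode, deleted on close
    shutil.copyfileobj(stream, tmp)  # only needs stream.read()
    tmp.seek(0)                      # rewind so the caller reads from the start
    return tmp


if __name__ == "__main__":
    # An in-memory stream stands in for sys.stdin.buffer here.
    demo = seekable_copy(io.BytesIO(b"From nobody\n\nbody\n"))
    print(demo.readline())
```

In a real script the argument would be sys.stdin.buffer, and the returned
file would be handed to whatever mailbox reader needs seek(). The cost is
one extra copy of the input on disk.
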
#2 is *probably* an easy/silly bug, but I haven't looked into it yet.
#3 makes it look like no one has used this code (the attempt to remove
SA's "SPAM: " lines in particular) on a real spam corpus. Really?
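
For reference, a MIME-aware version of the "SPAM: " stripping might look
like the sketch below. It is written against the modern Python email
package rather than the 2.2 one, the names strip_spam_lines and
strip_sa_report are hypothetical, and the assumption that every report
line starts with "SPAM: " comes only from the description above.

```python
import email

# Assumption: report lines are exactly those starting with this prefix.
SPAM_PREFIX = "SPAM: "


def strip_spam_lines(text):
    """Drop lines starting with SPAM_PREFIX; keep every other line as-is."""
    return "".join(
        line for line in text.splitlines(keepends=True)
        if not line.startswith(SPAM_PREFIX)
    )


def strip_sa_report(msg):
    """Remove "SPAM: " lines from every text part of a parsed message."""
    for part in msg.walk():
        # For multipart containers get_payload() is a list of sub-messages,
        # not a string, so skip them; walk() visits their children anyway.
        if part.is_multipart():
            continue
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload()
        if isinstance(payload, str):
            part.set_payload(strip_spam_lines(payload))
    return msg


if __name__ == "__main__":
    raw = "Subject: test\n\nSPAM: report line\nreal body\n"
    print(strip_sa_report(email.message_from_string(raw)).get_payload())
```

Filtering line by line, instead of slicing a fixed range out of the body,
also leaves the first and last lines alone unless they carry the prefix.
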
Greg
--
Greg Ward <gward@python.net> http://www.gerg.ca/
Just because you're paranoid doesn't mean they *aren't* out to get you.