[Spambayes] Problems with unheader.py

Greg Ward gward@python.net
Tue, 24 Sep 2002 21:44:12 -0400


I've been playing around with unheader.py -- it looks like it's just the
ticket for cleaning up some of the spam I've gathered.  Apart from
adding Maildir support to it, I think I've found some bugs:

  1) if it tries to read from stdin, it crashes with:
     [...]
       File "/scratch/src/spambayes/unheader.py", line 68, in process_mailbox
         for msg in mailbox.PortableUnixMailbox(f, Parser().parse):
       File "/www/plat/python2.2.1/lib/python2.2/mailbox.py", line 23, in next
         self.fp.seek(self.seekp)
     IOError: [Errno 29] Illegal seek

  2) deSA() removes the first and last line of the body

  3) deSA() crashes on MIME messages -- it assumes the result of
     get_payload() is a string, but for a multipart message it is a
     list of sub-messages

#1 is easy to fix -- just remove the ability to read from stdin.  Anyone
care?  It'll make my patch to add Maildir simpler.
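
For anyone who does care about stdin: the traceback in #1 comes from the
mailbox code calling seek() on a pipe, which pipes do not support.  A
minimal sketch of a workaround, assuming it is acceptable to spool all of
the input to a temporary file first.  The helper name seekable_copy is
hypothetical and is not part of unheader.py, and the sketch is written for
current Python rather than 2.2:

```python
import tempfile


def seekable_copy(stream, chunk_size=64 * 1024):
    """Copy a binary stream into a temporary file and return that file.

    Hypothetical helper, not part of unheader.py.  The returned file
    supports seek(), which a pipe on stdin does not.
    """
    spool = tempfile.TemporaryFile()  # opened in binary mode ("w+b")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        spool.write(chunk)
    spool.seek(0)  # rewind so the caller reads from the beginning
    return spool
```

The caller would pass a binary stream such as sys.stdin.buffer and hand the
result to the mailbox reader in place of stdin itself.  The cost is one
extra copy of the whole mailbox on disk, so dropping stdin support is still
the simpler option.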

#2 is *probably* an easy/silly bug, but I haven't looked into it yet.

#3 makes it look like no one has used this code (the attempt to remove
SA's "SPAM: " lines in particular) on a real spam corpus.  Really?
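
To illustrate #3: for a multipart message get_payload() returns a list of
sub-messages, so string operations on it fail.  A minimal sketch of one way
to cope, assuming the goal is simply to drop body lines that start with
"SPAM: ".  The function name strip_sa_lines is hypothetical, this is not
the actual deSA() code, and it is written for current Python rather than
2.2:

```python
from email import message_from_string


def strip_sa_lines(msg):
    """Drop lines starting with "SPAM: " from every leaf part of msg.

    Hypothetical illustration, not the deSA() from unheader.py.
    """
    for part in msg.walk():
        if part.is_multipart():
            # get_payload() is a list of sub-messages here, not a string;
            # walk() visits each of those sub-messages in turn.
            continue
        payload = part.get_payload()
        if not isinstance(payload, str):
            continue
        kept = [line for line in payload.splitlines(True)
                if not line.startswith("SPAM: ")]
        part.set_payload("".join(kept))
    return msg
```

This only looks at the payload as stored, so a part with a base64 or
quoted-printable transfer encoding is left effectively untouched unless it
is decoded first.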

        Greg
-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
Just because you're paranoid doesn't mean they *aren't* out to get you.