[Spambayes] stack.pop() ate my multipart message

Neale Pickett neale@woozle.org
10 Sep 2002 22:14:36 -0700


I've been running hammie on all my incoming messages, and I noticed that
multipart/alternative messages are totally hosed: they have no content,
just the MIME boundaries.  For instance, the following message:

------------------------------8<------------------------------
From: somebody <someone@somewhere.org>
To: neale@woozle.org
Subject: Booga
Content-type: multipart/alternative; boundary="snot"

This is a multi-part message in MIME format.

--snot
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: 7BIT

Hi there.
--snot
Content-type: text/html; charset=iso-8859-1
Content-transfer-encoding: 7BIT

<pre>Hi there.</pre>
--snot--
------------------------------8<------------------------------

Comes out like this:

------------------------------8<------------------------------
From: somebody <someone@somewhere.org>
To: neale@woozle.org
Subject: Booga
Content-type: multipart/alternative; boundary="snot"
X-Hammie-Disposition: No; 0.74; [unrelated gar removed]

This is a multi-part message in MIME format.

--snot

--snot--
------------------------------8<------------------------------

I'm using "Python 2.3a0 (#1, Sep  9 2002, 22:56:24)".

I've fixed it with the following patch to Tim's tokenizer, but I have to
admit that I'm baffled as to why it works.  Maybe there's some subtle
interaction between generators and lists that I can't understand.  Or
something.  Being as I'm baffled, I don't imagine any theory I come up
with will be anywhere close to reality.

In any case, be advised that (at least for me) hammie will eat
multipart/alternative messages until this patch is applied.  The patch
seems rather bogus though, so I'm not checking it in, in the hope that
there's a better fix I just wasn't capable of discovering :)

------------------------------8<------------------------------
Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.15
diff -u -r1.15 tokenizer.py
--- tokenizer.py	10 Sep 2002 18:15:49 -0000	1.15
+++ tokenizer.py	11 Sep 2002 05:01:16 -0000
@@ -1,3 +1,4 @@
+#! /usr/bin/env python
 """Module to tokenize email messages for spam filtering."""
 
 import email
@@ -507,7 +508,8 @@
             htmlpart = textpart = None
             stack = part.get_payload()
             while stack:
-                subpart = stack.pop()
+                subpart = stack[0]
+                stack = stack[1:]
                 ctype = subpart.get_content_type()
                 if ctype == 'text/plain':
                     textpart = subpart
------------------------------8<------------------------------