[Spambayes] Re: [Email-SIG] Maybe a bug, maybe not

Alexandre Ratti alex at gabuzomeu.net
Mon May 3 15:21:46 EDT 2004


Hi Eric,


[Eric S. Johansson wrote]
> found a very common form of spam that triggers an exception.  don't know 
> if you considered a bug or not.  I've enclosed a sample message and a 
> very simple program to trigger the bug.  From my limited understanding, 
> the payload type is correct but somehow it is dispatched to the wrong 
> handler.  When I was writing the test program, I also copied some of the 
> generator code so I could see what method was being requested etc.  then 
> I ran into limits of my knowledge and time
[http://mail.python.org/pipermail/email-sig/2004-May/000101.html]

I also received several junk emails that crash the email package. They 
are a pain because they also crash spambayes since it uses this package. 
I'm copying the spambayes list since people started reporting this 
problem on this list too.

I suspect that the crash occurs because these messages have multipart 
boundaries but a text content type header. This causes the 
"_handle_text" method of the Generator class (in email/Generator.py) to 
be called. This method expects get_payload() to return a string, which 
doesn't happen since the message is multipart: the payload is a list 
of sub-messages instead.
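
To make the mismatch concrete, here is a minimal sketch that builds such 
a message by hand and checks for the condition. It is written for a 
current Python 3 "email" package rather than the 2.5.4 version discussed 
here, and the helper names (has_text_type_but_list_payload, 
build_malformed_message) and the "XYZ" boundary are made up for 
illustration. The message is built by hand because a newer parser may 
not produce this structure from raw text.

```python
from email.message import Message


def has_text_type_but_list_payload(msg):
    """Return True if msg declares a text/* type yet holds a list of sub-parts."""
    return (msg.get_content_maintype() == "text"
            and isinstance(msg.get_payload(), list))


def build_malformed_message():
    """Build a message with a text content type but a multipart-style payload."""
    outer = Message()
    # Hypothetical header, modelled on the spam described above:
    # a text type that nevertheless carries a boundary parameter.
    outer["Content-Type"] = 'text/plain; boundary="XYZ"'
    part = Message()
    part.set_payload("spam body\n")
    # attach() turns the payload into a list, so is_multipart() becomes True.
    outer.attach(part)
    return outer


malformed = build_malformed_message()
print(has_text_type_but_list_payload(malformed))
```

A check like this could be used to spot the offending messages in a 
mailbox before they are handed to the generator.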

This seems similar to a known issue:

http://sourceforge.net/tracker/index.php?func=detail&aid=846938&group_id=5470&atid=105470

I'm not sure at which level of the email package this problem should be 
fixed. For now, I applied this simple fix in the Generator.py module: 
replace the _handle_text method with this code:

     def _handle_text(self, msg):
         payload = msg.get_payload()
         if payload is None:
             return
         cset = msg.get_charset()
         if cset is not None:
             payload = cset.body_encode(payload)
         if not _isstring(payload):
             # Changed to handle malformed messages with a text base
             # type and a multipart content.
             if type(payload) == type([]) and msg.is_multipart():
                 return self._handle_multipart(msg)
             else:
                 raise TypeError, 'string payload expected: %s' % type(payload)
         if self._mangle_from_:
             payload = fcre.sub('>From ', payload)
         self._fp.write(payload)


or use this diff (against the 2.5.4 version of the email package):

--- Generator.orig.py   Mon May  3 20:41:27 2004
+++ Generator.py        Mon May  3 20:43:46 2004
@@ -197,7 +197,12 @@
          if cset is not None:
              payload = cset.body_encode(payload)
          if not _isstring(payload):
-            raise TypeError, 'string payload expected: %s' % type(payload)
+            # Changed to handle malformed messages with a text base
+            # type and a multipart content.
+            if type(payload) == type([]) and msg.is_multipart():
+                return self._handle_multipart(msg)
+            else:
+                raise TypeError, 'string payload expected: %s' % type(payload)
          if self._mangle_from_:
              payload = fcre.sub('>From ', payload)
          self._fp.write(payload)

This change seems to fix the problem. I fed a mailbox with several of 
these messages to spambayes and they were parsed OK and flagged as spam 
as expected.
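
For anyone who would rather not edit the installed module, the same idea 
can be sketched as a subclass. This is only a rough sketch for a current 
Python 3 "email" package, not the 2.5.4 code patched above. The class 
name TolerantGenerator is made up, and it relies on the private 
_handle_text and _handle_multipart methods, which are implementation 
details and may change between versions.

```python
import io
from email.generator import Generator
from email.message import Message


class TolerantGenerator(Generator):
    """Generator that flattens text/* messages carrying sub-parts as multipart."""

    def _handle_text(self, msg):
        # Malformed case: text content type but a list payload.
        # Hand it to the multipart handler instead of raising TypeError.
        if msg.is_multipart():
            return self._handle_multipart(msg)
        return super()._handle_text(msg)


# Hand-built sample of the malformed structure (boundary "XYZ" is made up).
outer = Message()
outer["Content-Type"] = 'text/plain; boundary="XYZ"'
part = Message()
part.set_payload("spam body\n")
outer.attach(part)

buf = io.StringIO()
TolerantGenerator(buf, mangle_from_=False).flatten(outer)
flattened = buf.getvalue()
print(flattened)
```

Code that calls as_string() directly would still go through the stock 
Generator, so this only helps where the generator can be chosen 
explicitly.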


Cheers.

Alexandre




