[Spambayes-checkins] spambayes/Outlook2000 manager.py,1.14,1.15

Tim Peters tim_one@users.sourceforge.net
Mon, 21 Oct 2002 11:55:32 -0700


Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory usw-pr-cvs1:/tmp/cvs-serv4305

Modified Files:
	manager.py 
Log Message:
GetBayesStreamForMessage():  For every msg with MIME structure, Outlook
leaves the boundary info in the headers, but there are no boundaries in
the body.  As a result, all of the body was invisible to the Python email
pkg.  Reconstituting the full original email from Outlook appears to be
a real bitch -- maybe Mozilla has code for this we can use (but I suspect
its import-from-Outlook gimmick actually crawls over the .pst file; I
haven't used it, just read about it).
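
The ill-formed shape described above can be reproduced without Outlook.
This is a minimal sketch, assuming a current Python 3 email package, whose
parser reports the problem differently from the 2002 version: it keeps the
body as a plain string payload and records a defect instead of hiding the
text.  The sample message and the boundary name "XYZ" are made up for
illustration.

```python
from email import message_from_string
from email.errors import StartBoundaryNotFoundDefect

# Headers claim a multipart structure, but the body has no boundary lines,
# which is the shape the log message says Outlook hands back.
raw = (
    'Subject: demo\n'
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    '\n'
    'plain text part\n'
    '<html><body>html part</body></html>\n'
)

msg = message_from_string(raw)

# The parser never finds "--XYZ", so no subparts are created.
missing_boundary = any(
    isinstance(d, StartBoundaryNotFoundDefect) for d in msg.defects
)
parts = list(msg.walk())  # only the top-level message itself
```

Code that walks the subparts looking for text finds none here, since the
declared multipart structure never materializes.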

In the meantime, quick hack:  squash the text part (if any) and the HTML
part (if any) together as one big text blob, and if the headers make any
claims about MIME type and/or transfer encoding, simply delete those
header lines.
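
A standalone sketch of the quick hack, assuming Python 3 (where
Message.has_key() is gone and the `in` operator is used instead).  The
helper name strip_mime_headers and the sample strings are hypothetical;
in manager.py the headers and body come from the Outlook message object.

```python
import email

def strip_mime_headers(headers, body):
    """Parse headers + body as one message and drop the MIME claims."""
    # The extra '\n' supplies the blank line separating headers from body.
    msg = email.message_from_string(headers + '\n' + body)
    for name in ('content-type', 'content-transfer-encoding'):
        if name in msg:
            del msg[name]
    return msg

# Hypothetical stand-ins for what Outlook returns.
headers = (
    'Subject: demo\n'
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    'Content-Transfer-Encoding: 7bit\n'
)
body = 'plain text part\n<html><body>html part</body></html>\n'

msg = strip_mime_headers(headers, body)
```

With the MIME headers gone, the message falls back to the default
text/plain type, so the whole squashed body is treated as one text blob.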


Index: manager.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/manager.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** manager.py	20 Oct 2002 23:51:04 -0000	1.14
--- manager.py	21 Oct 2002 18:55:30 -0000	1.15
***************
*** 84,87 ****
--- 84,89 ----
      def GetBayesStreamForMessage(self, message):
          # Note - caller must catch COM error
+         import email
+ 
          headers = message.Fields[0x7D001E].Value
          headers = headers.encode('ascii', 'replace')
***************
*** 92,97 ****
              body = ""
          body += message.Text.encode("ascii", "replace")
!         return headers + body
!       
  
      def LoadBayes(self):
--- 94,109 ----
              body = ""
          body += message.Text.encode("ascii", "replace")
! 
!         # XXX If this was originally a MIME msg, we're hosed at this point --
!         # the boundary tag in the headers doesn't exist in the body, and
!         # the msg is simply ill-formed.  The miserable hack here simply
!         # squashes the text part (if any) and the HTML part (if any) together,
!         # and strips MIME info from the original headers.
!         msg = email.message_from_string(headers + '\n' + body)
!         if msg.has_key('content-type'):
!             del msg['content-type']
!         if msg.has_key('content-transfer-encoding'):
!             del msg['content-transfer-encoding']
!         return msg
  
      def LoadBayes(self):