[Spambayes-checkins]
spambayes/Outlook2000 addin.py,1.35,1.36 msgstore.py,1.31,1.32
Tim Peters
tim_one@users.sourceforge.net
Thu Nov 21 02:57:07 2002
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1:/tmp/cvs-serv4035/Outlook2000
Modified Files:
addin.py msgstore.py
Log Message:
GetEmailPackageObject(): renamed the optional arg to strip_mime_headers,
and put back the default strip of the Content-Transfer-Encoding header I
took out before. Mark Hammond rediscovered the hard way why it was there
before: Outlook already delivers decoded text, and leaving the CTE
header in makes the (Python) email pkg try to decode it again. This
wasn't fatal (because the tokenizer recovers from decoding errors), but
did lead to some weird results. Explained this all in excruciatingly
long comments, so nobody is tempted to take it out again.
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.35
retrieving revision 1.36
diff -C2 -d -r1.35 -r1.36
*** addin.py 14 Nov 2002 11:07:18 -0000 1.35
--- addin.py 21 Nov 2002 02:57:05 -0000 1.36
***************
*** 249,253 ****
push("<h2>Message Stream:</h2><br>")
push("<PRE>\n")
! msg = msgstore_message.GetEmailPackageObject(strip_content_type=False)
push(escape(msg.as_string(), True))
push("</PRE>\n")
--- 249,253 ----
push("<h2>Message Stream:</h2><br>")
push("<PRE>\n")
! msg = msgstore_message.GetEmailPackageObject(strip_mime_headers=False)
push(escape(msg.as_string(), True))
push("</PRE>\n")
Index: msgstore.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/msgstore.py,v
retrieving revision 1.31
retrieving revision 1.32
diff -C2 -d -r1.31 -r1.32
*** msgstore.py 14 Nov 2002 07:04:45 -0000 1.31
--- msgstore.py 21 Nov 2002 02:57:05 -0000 1.32
***************
*** 514,523 ****
self.mapi_object = self.msgstore._OpenEntry(self.id)
! def GetEmailPackageObject(self, strip_content_type=True):
# Return an email.Message object.
! # strip_content_type is a hack, and should be left True unless you're
# trying to display all the headers for diagnostic purposes. If we
# figure out something better to do, it should go away entirely.
! # The problem: suppose a msg is multipart/alternative, with
# text/plain and text/html sections. The latter MIME decorations
# are plain missing in what _GetMessageText() returns. If we leave
--- 514,525 ----
self.mapi_object = self.msgstore._OpenEntry(self.id)
! def GetEmailPackageObject(self, strip_mime_headers=True):
# Return an email.Message object.
! #
! # strip_mime_headers is a hack, and should be left True unless you're
# trying to display all the headers for diagnostic purposes. If we
# figure out something better to do, it should go away entirely.
! #
! # Problem #1: suppose a msg is multipart/alternative, with
# text/plain and text/html sections. The latter MIME decorations
# are plain missing in what _GetMessageText() returns. If we leave
***************
*** 530,535 ****
--- 532,547 ----
# considers the body to be text/plain (the default), and so it
# does get tokenized.
+ #
+ # Problem #2: Outlook decodes quoted-printable and base64 on its
+ # own, but leaves any Content-Transfer-Encoding line in the headers.
+ # This can cause the email pkg to try to decode the text again,
+ # with unpleasant (but rarely fatal) results. If we strip that
+ # header too, no problem -- although the fact that a msg was
+ # encoded in base64 is usually a good spam clue, and we miss that.
+ #
# Short course: we either have to synthesize non-insane MIME
# structure, or eliminate all evidence of original MIME structure.
+ # Since we don't have a way to the former, by default this function
+ # does the latter.
import email
text = self._GetMessageText()
***************
*** 540,546 ****
raise
! if strip_content_type:
if msg.has_key('content-type'):
del msg['content-type']
return msg
--- 552,560 ----
raise
! if strip_mime_headers:
if msg.has_key('content-type'):
del msg['content-type']
+ if msg.has_key('content-transfer-encoding'):
+ del msg['content-transfer-encoding']
return msg
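
A minimal sketch of Problem #2 from the comments above, written against the
current Python 3 email package rather than the 2002 code. build_message() and
RAW are hypothetical stand-ins for GetEmailPackageObject() and the text that
_GetMessageText() returns; they are not part of the Outlook2000 sources. The
body is assumed to be already decoded (as Outlook delivers it) while the
Content-Transfer-Encoding header is still present.

```python
import email


def build_message(raw_text, strip_mime_headers=True):
    """Hypothetical stand-in for GetEmailPackageObject().

    Parses raw_text and, by default, drops the two MIME headers that
    would mislead the email package about an already-decoded body.
    """
    msg = email.message_from_string(raw_text)
    if strip_mime_headers:
        for name in ("content-type", "content-transfer-encoding"):
            if name in msg:
                del msg[name]
    return msg


# Assumed input: the body was quoted-printable on the wire, has already
# been decoded once, but the header still claims it is encoded.
RAW = (
    "Subject: demo\n"
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "width=20 height=40\n"
)

kept = build_message(RAW, strip_mime_headers=False)
stripped = build_message(RAW)

# With the header left in, the email package decodes a second time and
# treats "=20" and "=40" in the plain text as quoted-printable escapes.
double_decoded = kept.get_payload(decode=True)

# With the header stripped, the body comes back unchanged (as bytes).
left_alone = stripped.get_payload(decode=True)

print(repr(double_decoded))
print(repr(left_alone))
```

Passing strip_mime_headers=False remains useful for the diagnostic display in
addin.py, where the goal is to show every header rather than to tokenize.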