
A couple of posts on the mailman-users list (<http://mail.python.org/pipermail/mailman-users/2005-September/046400.html> and <http://mail.python.org/pipermail/mailman-users/2005-October/047367.html>) have pointed out that there is a problem with Scrubber.py.

The basic issue is that under some circumstances, the scrubbed message is quoted-printable encoded, but Scrubber unconditionally adds a 'Content-Transfer-Encoding: 8bit' header, resulting in garbled content when the message is viewed.

The addition of the header was a small part of the patch at <http://sourceforge.net/tracker/index.php?func=detail&aid=655214&group_id=103&atid=300103> and the associated comment is:

  Fixes a bug in the scrubber, where a content-transfer-encoding might have survived flattening of the message.

I think the fix for the current problem is the following patch:

--- mailman-2.1.6/Mailman/Handlers/Scrubber.py
+++ mailman-mas/Mailman/Handlers/Scrubber.py
@@ -376,9 +376,8 @@
     # Now join the text and set the payload
     sep = _('-------------- next part --------------\n')
     del msg['content-type']
-    msg.set_payload(sep.join(text), charset)
     del msg['content-transfer-encoding']
-    msg.add_header('Content-Transfer-Encoding', '8bit')
+    msg.set_payload(sep.join(text), charset)
     return msg

I have checked this with Tokio, and he agrees, but I don't have any message that would have triggered the original bug to test against, nor do I know what the characteristics of such a message would be or if subsequent changes in the Python email library have taken care of it.

Thus, before committing this patch, I'd like to see it get a bit more exposure/testing and/or get feedback from someone who knows something about the original bug.

--
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
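The ordering in the patch above (drop the old headers first, then let set_payload() choose the encoding) can be sketched in isolation. This is only an illustration, not the Scrubber code: flatten_to_text and SEPARATOR are hypothetical names, the syntax is Python 3 rather than the Python 2.3/2.4 of this thread, and the sample headers are made up.

```python
from email.message import Message

# Hypothetical stand-in for the translated separator used by Scrubber.
SEPARATOR = '-------------- next part --------------\n'


def flatten_to_text(msg, parts, charset):
    """Replace msg's payload with the joined text parts (hypothetical helper)."""
    # Remove the stale headers first, so that set_payload() is free to add
    # a Content-Type and Content-Transfer-Encoding that match the new body.
    del msg['content-type']
    del msg['content-transfer-encoding']
    msg.set_payload(SEPARATOR.join(parts), charset)
    return msg


# Made-up message carrying headers left over from before flattening.
original = Message()
original['Content-Type'] = 'text/plain; charset="us-ascii"'
original['Content-Transfer-Encoding'] = '7bit'

parts = ['X=91**2 (x is 91 squared)\n', 'second part\n']
flat = flatten_to_text(original, parts, 'iso-8859-1')
print(flat['Content-Transfer-Encoding'])
print(flat.get_payload(decode=True))
```

With this ordering there is only ever one Content-Transfer-Encoding header, and it is the one the email package picked for the body it actually stores.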

Mark Sapiro wrote:
I still think this is the correct fix, but it turns out there are some tricky issues here that I believe come down to an error in the set_payload() method.

Under certain circumstances, in particular when charset is 'iso-8859-1', msg.set_payload(text, charset) 'apparently' encodes the text as quoted-printable and adds a Content-Transfer-Encoding: quoted-printable header to msg. I say 'apparently' because if one prints msg or creates a Generator instance and writes msg to a file, the message is printed/written as a correct, quoted-printable encoded message, but

  text = msg._payload
or
  text = msg.get_payload()

gives the original text, not quoted-printable encoded, and

  text = msg.get_payload(decode=1)

gives a quoted-printable decoding of the original text, which is munged wherever the original text contains an '=' that happens to look like a quoted-printable escape.

This is a problem for Mailman because if Scrubber is processing individual messages, the 'apparently' quoted-printable message gets passed ultimately to SMTPDirect, which calls Decorate, and Decorate does msg.get_payload(decode=1) when adding the header and/or footer and can mung the message in the process. There is also an issue with archiving when the archiver gets a multipart message which is subsequently flattened by Scrubber.

The following is a transcript of a Python interactive session that illustrates the above problems with set_payload() and get_payload(). This session is with Python 2.4.1, but exactly the same behavior occurs with 2.3.4 and 2.4.2.

Python 2.4.1 (#1, May 27 2005, 18:02:40)
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.

How about just a line of stuff with some ==== and a few words. X=91**2 (x is 91 squared)

How about just a line of stuff with some =3D=3D=3D=3D and a few words. X=3D91**2 (x is 91 squared)

>>> print msg.get_payload()
How about just a line of stuff with some ==== and a few words. X=91**2 (x is 91 squared)
>>> print msg.get_payload(decode=1)
How about just a line of stuff with some == and a few words. X`**2 (x is 91 squared)

--
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
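A rough way to check for the discrepancy described above on any given Python installation is to compare the different views of the same message. The sketch below is written for Python 3 (an assumption; the session above is Python 2.4.1), payload_views is a hypothetical helper name, and the sample text is shortened. Later versions of the email package may well not show the mismatch at all.

```python
from email.message import Message


def payload_views(text, charset):
    """Return the views of one message that the thread compares (hypothetical helper)."""
    msg = Message()
    msg.set_payload(text, charset)
    return {
        # Header chosen by set_payload() for this charset.
        'cte': msg['Content-Transfer-Encoding'],
        # Payload exactly as stored on the message object.
        'raw': msg.get_payload(),
        # Payload after undoing the Content-Transfer-Encoding.
        'decoded': msg.get_payload(decode=True),
        # Body as the Generator writes it (everything after the headers).
        'serialized_body': msg.as_string().split('\n\n', 1)[1],
    }


text = 'Some ==== and a few words.\nX=91**2 (x is 91 squared)\n'
views = payload_views(text, 'iso-8859-1')

# True means decoding gives back the original text, i.e. no munging.
consistent = views['decoded'] == text.encode('iso-8859-1')
print(views['cte'], consistent)
```

If consistent comes out False, or 'raw' does not match 'serialized_body', the stored payload and the advertised encoding disagree in the way shown in the transcript.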

Hi Mark,

I once used this patch for Japanese Mailman:

This re-generation was rejected by Barry because this may impose heavy load (?). This hack should simplify the charset gotcha just below the patched lines. Or, we may have to introduce a new variable in the email.Message.Message class to keep track of whether the payload is decoded or not.

IMHO, mailing list messages should be in plain text without attachments, and those who attach should pay (the load) for it.

--- Scrubber.py.orig    Thu Dec  1 10:01:45 2005
+++ Scrubber.py Thu Dec  1 10:13:17 2005
@@ -28,6 +28,7 @@
 from cStringIO import StringIO
 from types import IntType, StringType
+from email import message_from_string
 from email.Utils import parsedate
 from email.Parser import HeaderParser
 from email.Generator import Generator
@@ -313,6 +314,9 @@
 Url : %(url)s
 """), lcset)
         outer = False
+    # Re-generation of message instance from stringfied one.
+    # This should normalize the payloads.
+    msg = message_from_string(msg.as_string())
     # We still have to sanitize multipart messages to flat text because
     # Pipermail can't handle messages with list payloads.  This is a kludge;
     # def (n) clever hack ;).

Mark Sapiro wrote:
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
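The re-generation idea in the patch above can be tried outside Scrubber with a few lines. This is a sketch only: it uses Python 3 syntax (an assumption; the patch targets the Python 2 email package), regenerate is a hypothetical helper name, and the sample message is made up.

```python
import email
from email.message import Message


def regenerate(msg):
    """Rebuild a message from its flattened string form (hypothetical helper).

    The parsed copy holds the payload exactly as it was serialized, so its
    stored payload and its Content-Transfer-Encoding header agree.
    """
    return email.message_from_string(msg.as_string())


# Made-up message for illustration.
msg = Message()
msg['Subject'] = 'test'
msg.set_payload('X=91**2 (x is 91 squared)\n', 'iso-8859-1')

fresh = regenerate(msg)
print(fresh['Content-Transfer-Encoding'])
print(fresh.get_payload(decode=True))
```

The cost is one full serialize and parse per message, which is presumably the load concern mentioned above.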

Hi Mark,
I've just committed my patch for this problem to the 2.1 CVS branch. I used this example, and it looks OK both with and without a Content-Transfer-Encoding header.
Mark Sapiro wrote:
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
