
Hi Mark,

I once used the patch below for Japanese Mailman. Barry rejected this re-generation because it may impose a heavy load (?).

This hack should simplify the charset gotcha just below the patched lines. Alternatively, we may have to introduce a new variable in the email.Message.Message class to track whether the payload has been decoded.

IMHO, mailing list messages should be in plain text without attachments, and those who attach should pay (the load) for it.

--- Scrubber.py.orig	Thu Dec  1 10:01:45 2005
+++ Scrubber.py	Thu Dec  1 10:13:17 2005
@@ -28,6 +28,7 @@
 from cStringIO import StringIO
 from types import IntType, StringType

+from email import message_from_string
 from email.Utils import parsedate
 from email.Parser import HeaderParser
 from email.Generator import Generator
@@ -313,6 +314,9 @@
 Url : %(url)s
 """), lcset)
             outer = False
+    # Re-generation of message instance from stringfied one.
+    # This should normalize the payloads.
+    msg = message_from_string(msg.as_string())
     # We still have to sanitize multipart messages to flat text because
     # Pipermail can't handle messages with list payloads.  This is a kludge;
     # def (n) clever hack ;).
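The re-generation step in the patch can be tried in isolation. The following is only a minimal sketch in Python 3 syntax (the patch itself targets the Python 2 email package used by Mailman 2.1); the sample message and variable names are made up for illustration. It flattens a message to a string and parses that string back, so the new instance holds exactly what the flattened text says.

```python
import email
from email.message import Message

# Hypothetical sample message, not taken from Mailman.
original = Message()
original['Subject'] = 'regeneration test'
original.set_payload('some ==== and X=91**2\n', 'iso-8859-1')

# Same idea as the patch: flatten to a string, then parse the string again.
regenerated = email.message_from_string(original.as_string())

# The regenerated instance carries the headers and payload of the flattened text.
print(regenerated['Content-Transfer-Encoding'])
print(regenerated.get_payload(decode=True))
```

The cost Barry objected to is visible here: every message is serialized and parsed one extra time.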
Mark Sapiro wrote:
I think the fix for the current problem is the following patch -
--- mailman-2.1.6/Mailman/Handlers/Scrubber.py
+++ mailman-mas/Mailman/Handlers/Scrubber.py
@@ -376,9 +376,8 @@
     # Now join the text and set the payload
     sep = _('-------------- next part --------------\n')
     del msg['content-type']
-    msg.set_payload(sep.join(text), charset)
     del msg['content-transfer-encoding']
-    msg.add_header('Content-Transfer-Encoding', '8bit')
+    msg.set_payload(sep.join(text), charset)
     return msg
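One way to see why the order of the two calls matters: set_payload() with a charset only adds a Content-Transfer-Encoding header when the message does not already have one. The sketch below uses Python 3 syntax and a made-up helper, `flatten`, to contrast the two orders. It is an illustration of current standard-library behavior, not output from Mailman, and the stale `base64` header is an assumption chosen for the demo.

```python
from email.message import Message

def flatten(text, charset, delete_cte_first):
    """Set a new text payload on a message that still carries an old CTE header."""
    msg = Message()
    msg['Content-Type'] = 'text/html'
    msg['Content-Transfer-Encoding'] = 'base64'  # stale header from the old payload
    del msg['content-type']
    if delete_cte_first:
        # Order used by the patch: drop the stale header, then set the payload.
        del msg['content-transfer-encoding']
    msg.set_payload(text, charset)
    return msg

patched = flatten('X=91**2\n', 'iso-8859-1', delete_cte_first=True)
stale = flatten('X=91**2\n', 'iso-8859-1', delete_cte_first=False)

# With the header deleted first, set_payload() picks the encoding for the charset.
print(patched['Content-Transfer-Encoding'])
# With the old header still present, it is left untouched.
print(stale['Content-Transfer-Encoding'])
```

The unpatched code went on to replace the header with 8bit by hand; the sketch only shows the conditional behavior that the patched order relies on.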
I still think this is the correct fix, but it turns out there are some tricky issues here that I believe come down to an error in the set_payload() method.
Under certain circumstances, in particular when charset is 'iso-8859-1',
msg.set_payload(text, charset)
'apparently' encodes the text as quoted-printable and adds a
Content-Transfer-Encoding: quoted-printable
header to msg. I say 'apparently' because if one prints msg or creates a Generator instance and writes msg to a file, the message is printed/written as a correct, quoted-printable encoded message, but
text = msg._payload
or
text = msg.get_payload()
gives the original text, not quoted-printable encoded, and
text = msg.get_payload(decode=1)
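gives a quoted-printable decoding of the original text, which is munged wherever the original text contains '=' in a position that looks like a quoted-printable escape (for example '==', or '=' followed by two hex digits).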
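The munging itself can be reproduced without the email package. This is a small sketch, in Python 3 syntax, that uses the standard quopri module to decode text that was never encoded; the sample bytes are a shortened version of the test message in this thread.

```python
import quopri

# Text that was never quoted-printable encoded.
plain = b'some ==== and X=91**2'

# Decoding it as quoted-printable anyway: each '==' collapses to '=',
# and '=91' is read as an escape for the single byte 0x91.
munged = quopri.decodestring(plain)
print(munged)

# For comparison, encoding first and then decoding gives the text back.
encoded = quopri.encodestring(plain)
print(encoded)
print(quopri.decodestring(encoded) == plain)
```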
This is a problem for Mailman because if Scrubber is processing individual messages, the 'apparently' quoted-printable message gets passed ultimately to SMTPDirect which calls Decorate, and Decorate does msg.get_payload(decode=1) when adding the header and/or footer and can mung the message in the process.
There is also an issue with archiving when the archiver gets a multipart message which is subsequently flattened by Scrubber.
The following is a transcript of a Python interactive session that illustrates the above problems with set_payload() and get_payload(). This session is with Python 2.4.1, but exactly the same behavior occurs with 2.3.4 and 2.4.2.
Python 2.4.1 (#1, May 27 2005, 18:02:40)
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> msg = email.message_from_file(open('plain2.eml'))
>>> print msg
From nobody Mon Nov 28 09:18:41 2005
From: "Mark Sapiro" <msapiro@value.net>
To: list1@localhost
Subject: HTML - all
Date: Sun, 27 Nov 2005 09:02:33 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"

How about just a line of stuff with some ==== and a few words.
X=91**2 (x is 91 squared)
>>> del msg['content-type']
>>> del msg['content-transfer-encoding']
>>> msg.set_payload(str(msg.get_payload()), 'iso-8859-1')
>>> print msg
From nobody Mon Nov 28 09:18:41 2005
From: "Mark Sapiro" <msapiro@value.net>
To: list1@localhost
Subject: HTML - all
Date: Sun, 27 Nov 2005 09:02:33 -0800
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

How about just a line of stuff with some =3D=3D=3D=3D and a few words.
X=3D91**2 (x is 91 squared)
>>> print msg.get_payload()
How about just a line of stuff with some ==== and a few words.
X=91**2 (x is 91 squared)
>>> print msg.get_payload(decode=1)
How about just a line of stuff with some == and a few words.
X`**2 (x is 91 squared)
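The session can be condensed into a single round-trip check. The sketch below is written in Python 3 syntax, and `payload_roundtrips` is a made-up helper name. It asks whether get_payload(decode=True) returns the same bytes that were passed to set_payload(). Going by the transcript, the Python 2.3/2.4 versions discussed here would not satisfy this for the sample text; current Python 3 releases are expected to.

```python
from email.message import Message

def payload_roundtrips(text, charset):
    """Return True if get_payload(decode=True) gives back the bytes that were set."""
    msg = Message()
    msg.set_payload(text, charset)
    return msg.get_payload(decode=True) == text.encode(charset)

# Body of the test message from the session.
sample = ('How about just a line of stuff with some ==== and a few words.\n'
          'X=91**2 (x is 91 squared)\n')

print(payload_roundtrips(sample, 'iso-8859-1'))
```

A check like this is cheap enough to run against whatever Python version a Mailman installation uses before relying on decode=1 in handlers such as Decorate.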
-- 
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/