Scrubber mungs quoted-printable revisited.

To refresh, see <http://mail.python.org/pipermail/mailman-developers/2005-November/018395.htm...> and <http://mail.python.org/pipermail/mailman-developers/2005-December/018449.htm...>.
The problem with set_payload() creating a message for which a subsequent get_payload did 'too much' decoding was fixed by having Scrubber.py add a 'X-Mailman-Scrubbed: Yes' header upon doing a set_payload(), and then in various places where there are subsequent get_payload() calls, setting the decode flag according to the presence or absence of the header.
I actually like the header as an explanation of content change along the lines of X-Content-Filtered-By:, but I've been uneasy about setting the get_payload() decode flag based on the presence or absence of 'X-Mailman-Scrubbed: Yes', although it seems to work in all cases we've tried.
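For readers who did not follow the earlier threads, the header-based workaround described above can be sketched roughly as follows. This is an illustration only, not the code that went into Mailman: the helper name payload_for_archiving is hypothetical, and the example runs against the current Python 3 email package, where get_payload(decode=True) returns bytes.

```python
from email.message import Message


def payload_for_archiving(msg):
    """Hypothetical helper: pick the decode flag from the scrubbed header.

    If Scrubber has flagged the part, its payload is assumed to be plain
    text already, so no transfer decoding is requested.
    """
    scrubbed = msg.get('X-Mailman-Scrubbed', '').lower() == 'yes'
    return msg.get_payload(decode=not scrubbed)


# An untouched quoted-printable part: the transfer encoding is undone.
original = Message()
original['Content-Transfer-Encoding'] = 'quoted-printable'
original.set_payload('caf=E9')

# A part flagged as scrubbed: the payload is handed back untouched.
scrubbed = Message()
scrubbed['Content-Transfer-Encoding'] = 'quoted-printable'
scrubbed['X-Mailman-Scrubbed'] = 'Yes'
scrubbed.set_payload('plain text stored by the scrubber')
```

The weak point is visible in the helper: every caller has to know about the header, which is the source of the unease mentioned above.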
Recently, I looked in more detail at the actual set_payload() method in the email library and I have at least a vague understanding of the problem. The problem and my understanding are reported at <http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470>. I have suggested a patch there which I call a 'Hint at possible fix'. This patch could be applied in Scrubber.py.
The patch to Scrubber.py would add to the end of the
def replace_payload_by_text(msg, text, charset):
definition making the whole definition
def replace_payload_by_text(msg, text, charset):
    # TK: This is a common function in replacing the attachment and the main
    # message by a text (scrubbing). Also, add a flag indicating it has been
    # scrubbed.
    del msg['content-type']
    del msg['content-transfer-encoding']
    msg.set_payload(text, charset)
    msg['X-Mailman-Scrubbed'] = 'Yes'
    if msg.get('content-transfer-encoding') == 'quoted-printable':
        cset = msg.get_charset()
        if cset:
            msg._payload = cset.body_encode(msg._payload)
            msg._charset = None
The advantage to doing this in Scrubber.py and unconditionally setting the decode flag for subsequent get_payload() calls is it makes the whole process insensitive to whether or not or when Python email bug # 1409455 is fixed. If the bug is fixed, the payload will be encoded and msg.get_charset() above will return None so the payload won't be encoded a second time. The additional code above can be removed at some point after we're sure the email library used by Mailman is fixed.
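The reason the guard against a second encoding matters can be seen with the charset object alone. The sketch below assumes the current Python 3 email.charset module (the library of the time differed in detail) and uses a made-up sample string; it only shows what happens when body_encode() is applied to text that is already quoted-printable.

```python
from email.charset import Charset

# iso-8859-1 uses quoted-printable as its body encoding.
cset = Charset('iso-8859-1')

once = cset.body_encode('caf\xe9')   # 'caf=E9'
twice = cset.body_encode(once)       # 'caf=3DE9', the '=' is escaped again
```

Decoding the doubly encoded text once yields the literal characters caf=E9 rather than the original word, which is the kind of damage a second encoding pass would cause.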
--
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Hi Mark,
Mark Sapiro wrote:
> Recently, I looked in more detail at the actual set_payload() method in the email library and I have at least a vague understanding of the problem. The problem and my understanding are reported at <http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470>. I have suggested a patch there which I call a 'Hint at possible fix'. This patch could be applied in Scrubber.py.
It's nice that the problem was tracked down. Thank you!
> The patch to Scrubber.py would add to the end of the
> def replace_payload_by_text(msg, text, charset):
> definition making the whole definition
>
> def replace_payload_by_text(msg, text, charset):
>     # TK: This is a common function in replacing the attachment and the main
>     # message by a text (scrubbing). Also, add a flag indicating it has been
>     # scrubbed.
>     del msg['content-type']
>     del msg['content-transfer-encoding']
>     msg.set_payload(text, charset)
>     msg['X-Mailman-Scrubbed'] = 'Yes'
>     if msg.get('content-transfer-encoding') == 'quoted-printable':
>         cset = msg.get_charset()
>         if cset:
>             msg._payload = cset.body_encode(msg._payload)
>             msg._charset = None
I'm rather uneasy about _varname attributes being manipulated everywhere in the application code. Maybe we should override the email.Message behaviour in Mailman.Message instead. Also, I will add 'base64' to the encoding check and back out the patches related to 'X-Mailman-Scrubbed'.
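One way to picture the subclass idea is the sketch below. It is not Mailman code: the class name ScrubbableMessage is a hypothetical stand-in for Mailman.Message.Message, and it assumes an email package in which set_payload() already applies the transfer encoding (as the current Python 3 library does). With the library discussed in this thread, the method would also have to carry the encoding step for both 'quoted-printable' and 'base64'.

```python
from email.message import Message


class ScrubbableMessage(Message):
    """Hypothetical stand-in for a Mailman.Message.Message subclass."""

    def replace_payload_by_text(self, text, charset):
        # Drop the old MIME headers so set_payload() regenerates them
        # for the new text and charset.
        del self['content-type']
        del self['content-transfer-encoding']
        self.set_payload(text, charset)


# A binary attachment that is about to be replaced by text.
msg = ScrubbableMessage()
msg['Content-Type'] = 'application/octet-stream'
msg['Content-Transfer-Encoding'] = 'base64'
msg.set_payload('AAEC')

msg.replace_payload_by_text('caf\xe9', 'iso-8859-1')

# Callers can ask for the decoded payload unconditionally.
body = msg.get_payload(decode=True).decode('iso-8859-1')
```

Keeping the logic in one method means the application code never touches _payload or _charset directly, and no marker header is needed to decide the decode flag.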
> The advantage to doing this in Scrubber.py and unconditionally setting the decode flag for subsequent get_payload() calls is it makes the whole process insensitive to whether or not or when Python email bug # 1409455 is fixed. If the bug is fixed, the payload will be encoded and msg.get_charset() above will return None so the payload won't be encoded a second time. The additional code above can be removed at some point after we're sure the email library used by Mailman is fixed.
--
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/