Scrubber mungs quoted-printable revisited.

To refresh, see <http://mail.python.org/pipermail/mailman-developers/2005-November/018395.htm...> and <http://mail.python.org/pipermail/mailman-developers/2005-December/018449.htm...>.
The problem with set_payload() creating a message for which a subsequent get_payload() did 'too much' decoding was fixed by having Scrubber.py add an 'X-Mailman-Scrubbed: Yes' header upon doing a set_payload(), and then, in the various places where there are subsequent get_payload() calls, setting the decode flag according to the presence or absence of the header.
I actually like the header as an explanation of content change along the lines of X-Content-Filtered-By:, but I've been uneasy about setting the get_payload() decode flag based on the presence or absence of 'X-Mailman-Scrubbed: Yes', although it seems to work in all cases we've tried.
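For concreteness, a minimal sketch of choosing the decode flag from the header follows. It is illustrative only: the helper names was_scrubbed and read_payload are hypothetical and are not taken from Mailman, and the direction of the flag (no decoding when the header is present) is assumed from the description above. It is written against the current standard-library email package, not the 2005 one.

```python
from email.message import Message

SCRUBBED_HEADER = 'X-Mailman-Scrubbed'


def was_scrubbed(msg):
    """Return True if the message carries 'X-Mailman-Scrubbed: Yes'."""
    return msg.get(SCRUBBED_HEADER, '').strip().lower() == 'yes'


def read_payload(msg):
    """Fetch the payload, skipping transfer decoding for scrubbed parts.

    Assumption: a scrubbed part holds text that was never actually
    transfer-encoded, so decoding it would decode 'too much'.
    """
    return msg.get_payload(decode=not was_scrubbed(msg))
```

Note that every caller of get_payload() has to remember to go through such a helper, which is the part that feels fragile.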
Recently, I looked in more detail at the actual set_payload() method in the email library and I have at least a vague understanding of the problem. The problem and my understanding are reported at <http://sourceforge.net/tracker/?func=detail&aid=1409455&group_id=5470&atid=105470>. I have suggested a patch there which I call a 'Hint at possible fix'. This patch could be applied in Scrubber.py.
The patch to Scrubber.py would add to the end of the

    def replace_payload_by_text(msg, text, charset):

definition, making the whole definition

    def replace_payload_by_text(msg, text, charset):
        # TK: This is a common function in replacing the attachment and the main
        # message by a text (scrubbing). Also, add a flag indicating it has been
        # scrubbed.
        del msg['content-type']
        del msg['content-transfer-encoding']
        msg.set_payload(text, charset)
        msg['X-Mailman-Scrubbed'] = 'Yes'
        if msg.get('content-transfer-encoding') == 'quoted-printable':
            cset = msg.get_charset()
            if cset:
                msg._payload = cset.body_encode(msg._payload)
                msg._charset = None
The advantage of doing this in Scrubber.py, and then unconditionally setting the decode flag for subsequent get_payload() calls, is that it makes the whole process insensitive to whether, or when, Python email bug #1409455 is fixed. If the bug is fixed, the payload will already be encoded and msg.get_charset() above will return None, so the payload won't be encoded a second time. The additional code above can be removed at some point after we're sure the email library used by Mailman is fixed.
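The property all of this is meant to guarantee can be checked directly: after the replacement, get_payload(decode=True) should return exactly the bytes of the scrubbed text. Below is a small sketch of that check. It assumes the current standard-library email package, where set_payload() with a charset already encodes the body, so the extra encoding step from the patch is deliberately left out. payload_round_trips is a hypothetical helper name.

```python
from email.message import Message


def replace_payload_by_text(msg, text, charset):
    """Replace the payload with text, as in Scrubber.py, minus the workaround."""
    del msg['content-type']
    del msg['content-transfer-encoding']
    msg.set_payload(text, charset)
    msg['X-Mailman-Scrubbed'] = 'Yes'


def payload_round_trips(msg, text, charset):
    """True if decoding the stored payload yields the original text bytes."""
    return msg.get_payload(decode=True) == text.encode(charset)


# Start from a part that had a different type and encoding before scrubbing.
part = Message()
part['Content-Type'] = 'text/html'
part['Content-Transfer-Encoding'] = 'base64'
part.set_payload('aGVsbG8=')

scrubbed_text = 'price=41 caf\u00e9'
replace_payload_by_text(part, scrubbed_text, 'iso-8859-1')
```

A text containing a literal '=' is a useful test case, since an unencoded 'price=41' would be mangled by quoted-printable decoding.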
--
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Hi Mark,
Mark Sapiro wrote:
It's nice that the problem was tracked down. Thank you!
I'm rather uneasy with _varname attributes being manipulated everywhere in the application code. Maybe we should override the email.Message behaviour in Mailman.Message instead. Also, I will add 'base64' to the encoding check and back out the patches related to 'X-Mailman-Scrubbed'.
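A rough sketch of what such an override might look like, so that the private attribute is touched in one place only. Everything here is an assumption for illustration: ScrubSafeMessage is a hypothetical stand-in for Mailman.Message.Message, _ensure_body_encoded is an invented helper, and the code is written against the current standard-library email package. Rather than testing for a particular library version, it checks whether decoding gives back the original bytes and re-encodes from the original text only if it does not, so it cannot encode twice.

```python
from email.message import Message

# Transfer encodings whose bodies must actually be encoded.
ENCODED_CTES = ('quoted-printable', 'base64')


class ScrubSafeMessage(Message):
    """Hypothetical Message subclass that keeps payload and CTE consistent."""

    def set_payload(self, payload, charset=None):
        super().set_payload(payload, charset)
        if charset is not None and isinstance(payload, str):
            self._ensure_body_encoded(payload)

    def _ensure_body_encoded(self, text):
        cset = self.get_charset()
        cte = str(self.get('content-transfer-encoding', '')).lower()
        if cset is None or cte not in ENCODED_CTES:
            return
        raw = text.encode(cset.output_charset)
        if self.get_payload(decode=True) != raw:
            # The library set the header but left the body unencoded:
            # encode from the original bytes, never from the stored payload.
            self._payload = cset.body_encode(raw)
```

With a library that already encodes correctly the extra check is a no-op, so application code such as Scrubber.py would not need to care which behaviour it gets.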
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
