[ python-Bugs-1409455 ] email.Message.set_payload followed by bad result get_payload

Mon Feb 6 04:42:07 CET 2006

Bugs item #1409455, was opened at 2006-01-18 17:09
Message generated for change (Comment added) made by bwarsaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1409455&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Sapiro (msapiro)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Message.set_payload followed by bad result get_payload

Initial Comment:
Under certain circumstances, in particular when charset
is 'iso-8859-1', where msg is an email.Message.Message instance,

    msg.set_payload(text, charset)

'apparently' encodes the text as quoted-printable and
adds a

Content-Transfer-Encoding: quoted-printable

header to msg. I say 'apparently' because if one prints
msg or creates a Generator instance and writes msg to a
file, the message is printed/written as a correct,
quoted-printable encoded message, but

    text = msg._payload
or

    text = msg.get_payload()

gives the original text, not quoted-printable encoded, and

    text = msg.get_payload(decode=True)

gives a quoted-printable decoding of the original text,
which is munged if the original text contains '=' followed
by characters that look like a quoted-printable escape.
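
The munging can be reproduced without the email package at all. The
following is only a sketch using the standard quopri module; the sample
bytes are made up for illustration and are not taken from the attached
script. It decodes text that was never quoted-printable encoded, then
shows that encoding first makes the round trip safe.

```python
import quopri

# Hypothetical sample text that was never quoted-printable encoded.
original = b"key=AB value"

# Decoding it anyway: "=AB" is read as an escape for the byte 0xAB.
munged = quopri.decodestring(original)

# Encoding first ('=' becomes '=3D'), then decoding, restores the text.
round_trip = quopri.decodestring(quopri.encodestring(original))

print(munged)       # b'key\xab value'
print(round_trip)   # b'key=AB value'
```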

This is causing problems in Mailman, which currently works
around them by flagging payloads that were set by
set_payload() and skipping the subsequent 'decoding' in that
case, but it would be better if
set_payload()/get_payload() worked properly.

A script is attached which illustrates the problem.

----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2006-02-05 22:42

Message:
Logged In: YES 
user_id=12800

See the attached patch for what I think is ultimately the
right fix.  The idea is that when set_payload() is called,
the payload is immediately encoded so that get_payload()
will do the right thing.  Also, Generator.py has to be fixed
to not doubly encode the payload.
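
The behaviour described here can be checked with a short script. This
is a sketch, not the attached patch: it assumes a current Python 3
interpreter, whose email package encodes the payload at set_payload()
time, and the sample text is hypothetical. On Python 2.4 as reported
above, the results would differ.

```python
from email.message import Message

# Hypothetical sample text containing a non-ASCII character and an '='.
text = "caf\xe9 costs a=b"

msg = Message()
msg.set_payload(text, "iso-8859-1")

cte = msg["Content-Transfer-Encoding"]
stored = msg.get_payload()              # the encoded form kept in the message
decoded = msg.get_payload(decode=True)  # bytes in the message's charset

print(cte)
print(stored)
print(decoded == text.encode("iso-8859-1"))
```

If the payload is encoded when it is set, get_payload() returns the
quoted-printable form and get_payload(decode=True) returns the
original bytes, '=' included.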

Run against your example, it seems to DTRT.  It also passes
all but one of the email pkg unit tests.  The one failure
is, I believe, due to an incorrect test.  The patch includes
a fix for that as well as adding a test for
get_payload(decode=True).

I'd like to get some feedback from the email-sig before
applying this, but it seems right to me.

----------------------------------------------------------------------

Comment By: Mark Sapiro (msapiro)
Date: 2006-01-20 18:19

Message:
Logged In: YES 
user_id=1123998

I've looked at the email library and I see the problem.
msg.set_payload() never QP encodes msg._payload. When the
message is stringified or flattened by a generator, the
generator's _handle_text() method does the encoding and it
is msg._charset that signals the need to do this. Thus when
the message object is ultimately converted to a suitable
external form, the body is QP encoded, but internally it
never is. Thus, subsequent msg.get_payload() calls return
unexpected results.

It appears (from minimal testing) that when a text message
is parsed into an email.Message.Message instance, _charset
is None even if there is a character set specification in a
Content-Type: header.
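
A quick way to see this is to parse a small message and compare the
charset attached to the message object with the one declared in the
header. This is a sketch assuming a current Python 3 interpreter and a
made-up message; get_charset() returns the message's _charset, while
get_content_charset() reads the Content-Type: parameter.

```python
import email

# Hypothetical minimal message with a charset declared in Content-Type.
raw = "Content-Type: text/plain; charset=iso-8859-1\n\nbody text\n"
parsed = email.message_from_string(raw)

parsed_charset = parsed.get_charset()    # value of _charset on the object
declared = parsed.get_content_charset()  # charset parameter from the header

print(parsed_charset)
print(declared)
```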

I have attached a patch (Message.py.patch.txt) which may fix
the problem. It has only been tested against the already
attached example.py so it is really untested. Also, it only
addresses the quoted-printable case. I haven't even thought
about whether there might be a similar problem involving base64.

----------------------------------------------------------------------
