[Python-3000] Question about email/generator.py
Barry Warsaw
barry at python.org
Tue Oct 23 21:50:05 CEST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Oct 23, 2007, at 3:36 PM, Guido van Rossum wrote:
> There's an issue in the email package that I can't resolve by myself.
> I described it to Barry like this:
>
>>> So in generator.py on line 291, I read:
>>>
>>> print(part.get_payload(decode=True), file=self)
>>>
>>> It turns out that part.get_payload(decode=True) returns a bytes
>>> object, and printing a bytes object to a text file is not the right
>>> thing to do -- in 3.0a1 it silently just prints those bytes, in 3.0a2
>>> it will probably print the repr() of the bytes object. Right now, it
>>> errors out because I'm removing the encode() method on PyString
>>> objects, and print() converts PyBytes to PyString; then the
>>> TextIOWrapper.write() method tries to encode its argument.
>>>
>>> If I change this to (decode=False), all tests in the email package
>>> pass. But is this the right fix???
>
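For illustration, a minimal sketch of the symptom described above, run against released Python 3 rather than the 3.0 alphas under discussion. The byte string is a made-up stand-in for what part.get_payload(decode=True) returns; print() writes the repr() of the bytes object to the text stream instead of the text itself.

```python
import io

# Made-up stand-in for part.get_payload(decode=True), which returns bytes.
payload = b"hello"

out = io.StringIO()        # any text-mode file object behaves the same way
print(payload, file=out)   # print() calls str() on the bytes object

printed = out.getvalue()   # the repr of the bytes, not the text "hello"
```

So the text stream ends up holding the literal characters b'hello' followed by a newline, which is why the generator needs a real string at this point.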
> I should note that this was checked in by the time Barry replied, even
> though it clearly was the wrong thing to do. Barry replied:
>
>> Maybe. ;) The problem is that this API is either being too smart for
>> its own good, or not smart enough. The intent of decode=True is to
>> return the original object encoded in the payload. So for example,
>> if MIMEImage was used to encode some jpeg, then decode=True should
>> return that jpeg.
>>
>> The problem is that what you really want is something that's content-
>> type aware, such that if your main type is some non-text type like
>> image/* or audio/* or even application/octet-stream, you will almost
>> always want a bytes object back. But text can also be encoded via
>> charset and/or transfer-encoding, and (at least in Py2.x), you'd use
>> the same method to get the original, unencoded text back. In that
>> case, you definitely want the string, since that's the most natural
>> API (i.e. you fed it a string object when you created the MIMEText,
>> so you want a string on the way back out).
>>
>> This is yet another corner case where the old API doesn't really fit
>> the new bytes/string model correctly, and of course you can
>> (rightly!) argue we were sloppy in Py2.x but were able to (mostly)
>> get away with it.
>>
>> In this /specific/ situation, generator.py:291 can only be called
>> when the main type is text, so I think it is clearly expecting a
>> string, even though .get_payload() will return a bytes there.
>>
>> Short of redesigning the API, I can think of two options. First, we
>> can change .get_payload() to specifically return a string when the main
>> type is text and decode=True. This is ugly because the return type
>> will depend on the content type of the message. OTOH, get_payload()
>> is already fairly ugly here because its return type differs based on
>> its argument, although I'd like to split this into a
>> separate .get_decoded_payload() method.
>>
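A rough sketch of the first option, written against the email package as it ships in current Python 3. The free function get_decoded_payload() is hypothetical: it borrows the name of the method proposed above but is not part of the email API. It returns a string for text/* parts and bytes for everything else; falling back to ASCII with errors="replace" is an assumption made for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.text import MIMEText


def get_decoded_payload(part):
    """Hypothetical helper: str for text/* parts, bytes for all others."""
    raw = part.get_payload(decode=True)  # undoes the transfer encoding; bytes
    if raw is not None and part.get_content_maintype() == "text":
        # Assumption: fall back to ASCII when no charset is declared.
        charset = part.get_content_charset() or "ascii"
        return raw.decode(charset, errors="replace")
    return raw


text_part = MIMEText("caf\u00e9", _charset="utf-8")
blob_part = MIMEApplication(b"\x00\x01\x02")

text_result = get_decoded_payload(text_part)  # a str
blob_result = get_decoded_payload(blob_part)  # still bytes
```

The return type depending on the content type is exactly the ugliness mentioned above; the sketch only shows what that trade-off looks like in practice.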
>> The other option is to let .get_payload() return bytes in all cases,
>> but in generator.py:291, explicitly convert it to a string, probably
>> using raw-unicode-escape. Because we know the main type is text
>> here, we know that the payload must contain a string. get_payload()
>> will return the bytes of the decoded unicode string, so raw-unicode-
>> escape should do the right thing. That's ugly too for obvious
>> reasons.
>>
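A minimal sketch of the second option: leave the payload as bytes and convert it explicitly before printing. The byte string is made up for the example. The raw-unicode-escape codec maps each byte to the code point with the same value (apart from \uXXXX escape sequences), so it does not consult the part's declared charset.

```python
import io

# Made-up stand-in for the bytes that get_payload(decode=True) returns.
decoded = b"caf\xe9 au lait"

# Explicit conversion at the call site, as the second option suggests.
text = decoded.decode("raw-unicode-escape")

out = io.StringIO()
print(text, file=out)  # now a str reaches the text stream
```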
>> The one thing that doesn't seem right is for decode=False to be used
>> because should the payload be an encoded string, it won't get
>> correctly decoded. This is part of the DecodedGenerator, which
>> honestly is probably not much used outside the test cases, but the
>> intent of that generator is clearly to print the decoded text parts
>> with the non-text parts stripped and replaced by a placeholder. So I
>> think it definitely wants decoded text payloads, otherwise there's
>> not much point in the class.
>>
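To show the intent of DecodedGenerator described above, here is a small usage sketch against the email package in current Python 3. The message contents are invented for the example, and the text part is plain ASCII so that no transfer decoding is involved.

```python
import io
from email.generator import DecodedGenerator
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Invented two-part message: one text part, one binary part.
msg = MIMEMultipart()
msg["Subject"] = "demo"
msg.attach(MIMEText("hello from the text part"))
msg.attach(MIMEApplication(b"\x00\x01\x02"))

out = io.StringIO()
DecodedGenerator(out).flatten(msg)
flattened = out.getvalue()  # text parts kept, binary part replaced by a note
```

The text part appears in the output as is, while the binary part is replaced by a one-line placeholder naming its content type.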
>> I hope that explains the situation. I'm open to any other idea -- it
>> doesn't even have to be better. ;) I see that you made the
>> decode=False change in svn, but that's the one solution that doesn't
>> seem right.
>
> At this point I (Guido) am really hoping someone will want to "own"
> this issue and redesign the API properly...
I'm really bummed that I've had no time to work on this. Life and
work have imposed. I'd be willing to chat with someone about what I
think should happen. At this point irc or im might be best. :(
- -Barry
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
iQCVAwUBRx5Qb3EjvBPtnXfVAQIcbwP9FPa/IJpIg+D2y/FJJp0LRqXctGhXUssi
aDX8M07pHu9aMPXKvDYZw50NFcyx87mMjWNVf2gX1KjM+U5XUns3WwtU+C60ZBSn
gEUmzAaYJVhDWguRiOpCX/bR1F2U8dudDR0UC8wrV9Mylk/C4b/q7bUdrGeT8riK
+oSTcaKTatY=
=98W1
-----END PGP SIGNATURE-----