[Python-3000] Question about email/generator.py

Guido van Rossum guido at python.org
Tue Oct 23 21:36:16 CEST 2007


There's an issue in the email package that I can't resolve by myself.
I described it to Barry like this:

> > So in generator.py on line 291, I read:
> >
> >   print(part.get_payload(decode=True), file=self)
> >
> > It turns out that part.get_payload(decode=True) returns a bytes
> > object, and printing a bytes object to a text file is not the right
> > thing to do -- in 3.0a1 it silently just prints those bytes, in 3.0a2
> > it will probably print the repr() of the bytes object. Right now, it
> > errors out because I'm removing the encode() method on PyString
> > objects, and print() converts PyBytes to PyString; then the
> > TextIOWrapper.write() method tries to encode its argument.
> >
> > If I change this to (decode=False), all tests in the email package
> > pass. But is this the right fix???
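
For illustration, the mismatch can be reproduced with a small sketch. It
assumes the email package as shipped in current Python 3 releases, not the
3.0 alphas discussed above; the sample text and variable names are made up
for the example, and decoding with the part's declared charset is just one
plausible way to get text back.

```python
import io
from email.mime.text import MIMEText

# A text part whose body is transfer-encoded (utf-8 text is sent as base64).
part = MIMEText("caf\u00e9 au lait\n", "plain", "utf-8")

raw = part.get_payload()                 # decode=False: still transfer-encoded
decoded = part.get_payload(decode=True)  # transfer encoding undone

assert isinstance(raw, str)        # the encoded form, not the original text
assert isinstance(decoded, bytes)  # bytes, even though the part is text/plain

# Printing bytes to a text stream writes their repr(), not the text itself.
out = io.StringIO()
print(decoded, file=out)
assert out.getvalue() == repr(decoded) + "\n"

# Assumption of this sketch: decode with the charset the part declares.
charset = part.get_content_charset()
text = decoded.decode(charset)
```

So decode=False avoids the bytes-to-text-file error only by handing back the
payload in its still-encoded form.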

I should note that the decode=False change was already checked in by the
time Barry replied, even though it clearly was the wrong thing to do.
Barry replied:

> Maybe. ;)  The problem is that this API is either being too smart for
> its own good, or not smart enough.  The intent of decode=True is to
> return the original object encoded in the payload.  So for example,
> if MIMEImage was used to encode some jpeg, then decode=True should
> return that jpeg.
>
> The problem is that what you really want is something that's content-
> type aware, such that if your main type is some non-text type like
> image/* or audio/* or even application/octet-stream, you will almost
> always want a bytes object back.  But text can also be encoded via
> charset and/or transfer-encoding, and (at least in Py2.x), you'd use
> the same method to get the original, unencoded text back.  In that
> case, you definitely want the string, since that's the most natural
> API (i.e. you fed it a string object when you created the MIMEText,
> so you want a string on the way back out).
>
> This is yet another corner case where the old API doesn't really fit
> the new bytes/string model correctly, and of course you can
> (rightly!) argue we were sloppy in Py2.x but were able to (mostly)
> get away with it.
>
> In this /specific/ situation, generator.py:291 can only be called
> when the main type is text, so I think it is clearly expecting a
> string, even though .get_payload() will return a bytes there.
>
> Short of redesigning the API, I can think of two options.  First, we
> can change .get_payload() to specifically return a string when the main
> type is text and decode=True.  This is ugly because the return type
> will depend on the content type of the message.  OTOH, get_payload()
> is already fairly ugly here because its return type differs based on
> its argument, although I'd like to split this into a
> separate .get_decoded_payload() method.
>
> The other option is to let .get_payload() return bytes in all cases,
> but in generator.py:291, explicitly convert it to a string, probably
> using raw-unicode-escape.  Because we know the main type is text
> here, we know that the payload must contain a string.  get_payload()
> will return the bytes of the decoded unicode string, so raw-unicode-
> escape should do the right thing.  That's ugly too for obvious reasons.
>
> The one thing that doesn't seem right is for decode=False to be used
> because should the payload be an encoded string, it won't get
> correctly decoded.  This is part of the DecodedGenerator, which
> honestly is probably not much used outside the test cases.  But the
> intent of that generator is clearly to print the decoded text parts
> with the non-text parts stripped and replaced by a placeholder.  So I
> think it definitely wants decoded text payloads, otherwise there's
> not much point in the class.
>
> I hope that explains the situation.  I'm open to any other idea -- it
> doesn't even have to be better. ;)  I see that you made the
> decode=False change in svn, but that's the one solution that doesn't
> seem right.
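
A minimal sketch of the first option, written as a standalone function
rather than a change to Message: get_decoded_payload() is the hypothetical
name from Barry's reply and is not an existing method of the email package.
The sketch assumes current Python 3 and decodes with the part's declared
charset (falling back to ASCII), not with the raw-unicode-escape codec
mentioned for the second option.

```python
from email.message import Message
from email.mime.image import MIMEImage
from email.mime.text import MIMEText


def get_decoded_payload(part: Message):
    """Hypothetical helper: str for text/* parts, bytes for everything else."""
    data = part.get_payload(decode=True)
    if data is None:
        # Multipart containers have no single payload to decode.
        return None
    if part.get_content_maintype() == "text":
        # Assumption: trust the declared charset, fall back to ASCII.
        charset = part.get_content_charset() or "ascii"
        return data.decode(charset, errors="replace")
    return data


# A text part comes back as the string that went in.
text_part = MIMEText("h\u00e9llo", "plain", "utf-8")
text_result = get_decoded_payload(text_part)

# A non-text part comes back as the original bytes (fake JPEG data here).
image_bytes = b"\xff\xd8\xff\xe0 not a real image"
image_part = MIMEImage(image_bytes, _subtype="jpeg")
image_result = get_decoded_payload(image_part)
```

The return type still depends on the content type, which is exactly the
ugliness Barry points out; the sketch only makes that trade-off concrete.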

At this point I (Guido) am really hoping someone will want to "own"
this issue and redesign the API properly...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

