[Email-SIG] email package status in 3.X
lutz at rmi.net
Thu Jun 10 15:21:52 CEST 2010
Thanks, David; that's great news. I'll update the book draft
accordingly.
For the record, despite the issues, I was able to complete a fairly
full-featured email client GUI with the email package as it currently
is. This includes parsing and generating arbitrary attachments, as
well as encoding on sends and decoding on fetches for both text payloads
and I18N mail headers. The package is still quite powerful as is. It
does take a bit of digging to figure out how to use its many tools,
but the book will probably help on this front, especially the
upcoming edition's more complete application.
In other words, some of my concern may have been a bit premature.
I hope that in the future we'll either strive for compatibility
or keep the current version around; it's a lot of very useful code.
In fact, I recommend that any new email package be named distinctly,
and that the current package be retained for a number of releases to
come. After all the breakages that 3.X introduced in general, doing
the same to any email-based code seems a bit too much, especially
given that the current package is largely functional as is. To me,
after having just used it extensively, fixing its few issues seems
a better approach than starting from scratch.
As far as other issues, the things I found are described below my
signature. I don't know what the utf-8 issue is that you refer
to; I'm able to parse and send with this encoding as is without
problems (both payloads and headers), but I'm probably not using the
interfaces you fixed, and this may be the same as one of the items listed.
Another thought: it might be useful to use the book's email client
as a sort of test case for the package; it's much more rigorous in
the new edition because it now has to be, given 3.X's Unicode model
(it's about 4,900 lines of code, though not all is email-related).
I'd be happy to donate the code as soon as I find out what the
copyright will be this time around; it will be at O'Reilly's site
this Fall in any event.
Thanks,
--Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)
Major issues I found...
------------------------------------------------------------------
1) Str required for parsing, but bytes returned from poplib
The initial decode from bytes to str of full mail text; in
retrospect, probably not a major issue, since original email
standards called for ASCII. An 8-bit encoding like Latin-1 is
probably sufficient for most conforming mails. For the book,
I try a set of different encodings, beginning with an optional
configuration module setting, then ascii, latin-1, and utf-8;
this is probably overkill, but a GUI has to be defensive.
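
A minimal sketch of the fallback decode described above, not the book's
actual code. The function name decode_mail_bytes and its configured
parameter are hypothetical; the order of attempts follows the text
(optional configured encoding, then ascii, latin-1, and utf-8).

```python
def decode_mail_bytes(raw, configured=None):
    """Decode raw mail bytes to str; return (text, encoding_used)."""
    candidates = [configured] if configured else []
    candidates += ['ascii', 'latin-1', 'utf-8']
    for name in candidates:
        try:
            return raw.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue          # wrong encoding or unknown codec name: try next
    # latin-1 maps every byte value, so with this order the utf-8 attempt
    # and this last resort are effectively never reached; they are kept
    # only to mirror the sequence described in the text.
    return raw.decode('ascii', errors='replace'), 'ascii'

# poplib returns a list of bytes lines, which would be joined before decoding
lines = [b'Subject: test', b'', b'caf\xe9']
text, used = decode_mail_bytes(b'\n'.join(lines))
print(used, repr(text))
```

Because latin-1 accepts any byte sequence, a utf-8 mail is decoded
correctly only if utf-8 is supplied as the configured encoding.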
----------------------------------------------------------------
2) Binary attachments encoding
The binary attachments byte-to-str issue that you've just
fixed. As I mentioned, I worked around this by passing in a
custom encoder that calls the original and runs an extra decode
step. Here's what my fix looked like in the book; your patch
may do better, and I will minimally add a note about the 3.1.3
and 3.2 fix for this:
def fix_encode_base64(msgobj):
    from email.encoders import encode_base64
    encode_base64(msgobj)                # what email does normally: leaves bytes
    bytes = msgobj.get_payload()         # bytes fails in email pkg on text gen
    text = bytes.decode('ascii')         # decode to unicode str so text gen works
    ...plus line splitting logic omitted...
    msgobj.set_payload('\n'.join(lines))
>>> from email.mime.image import MIMEImage
>>> from mailtools.mailSender import fix_encode_base64 # use custom workaround
>>> bytes = open('monkeys.jpg', 'rb').read()
>>> m = MIMEImage(bytes, _encoder=fix_encode_base64) # convert to ascii str
>>> print(m.as_string()[:500])
-------------------------------------------------------------------
3) Type-dependent text part encoding
There's a str/bytes confusion issue related to Unicode encodings
in text payload generation: some encodings require the payload to
be str, but others expect bytes. Unfortunately, this means that
clients need to know how the package will react to the encoding
that is used, and special-case based upon that.
For example, I needed to pass in str for ASCII and Latin-1 (the
former is unencoded and the latter gets QP MIME treatment), but
must pass a bytes for UTF-8 (which triggers Base64). That's less
than ideal for a client trying to attach arbitrary text parts
generically from filenames. Here's the obscure workaround I came
up with; the bodytext is str when fetched from an edit window,
but may also be loaded from an attachment file. This may or may
not have been reported, and it's entirely possible that there's
a better solution that I've missed.
def fix_text_required(encodingname):
    """
    4E: workaround for str/bytes combination errors in email package; MIMEText
    requires different types for different Unicode encodings in Python 3.1, due
    to the different ways it MIME-encodes some types of text; see Chapter 13;
    the only other alternative is using generic Message and repeating much code;
    """
    from email.charset import Charset, BASE64, QP
    charset = Charset(encodingname)      # how email knows what to do for encoding
    bodyenc = charset.body_encoding      # utf8, others require bytes input data
    return bodyenc in (None, QP)         # ascii, latin1, others require str
# on mail sends...
# email needs either str xor bytes specifically;
if fix_text_required(bodytextEncoding):
    if not isinstance(bodytext, str):
        bodytext = bodytext.decode(bodytextEncoding)
else:
    if not isinstance(bodytext, bytes):
        bodytext = bodytext.encode(bodytextEncoding)
# later
msg.set_payload(bodytext, charset=bodytextEncoding)
...or...
msg = MIMEText(bodytext, _charset=bodytextEncoding)
mainmsg.attach(msg)
# attachments
# build sub-Message of appropriate kind
maintype, subtype = contype.split('/', 1)
if maintype == 'text':                       # 4E: text needs encoding
    if fix_text_required(fileencode):        # requires str or bytes
        data = open(filename, 'r', encoding=fileencode)
    else:
        data = open(filename, 'rb')
    msg = MIMEText(data.read(), _subtype=subtype, _charset=fileencode)
    data.close()
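
To see which body encoding the package associates with each of the
three charsets mentioned above, the Charset object can be inspected
directly. This is a small illustrative sketch; the labels dictionary is
my own naming, not part of the email package.

```python
from email.charset import Charset, BASE64, QP

# Human-readable names for the body_encoding constants (illustrative only)
labels = {None: 'no MIME encoding', QP: 'quoted-printable', BASE64: 'base64'}

for name in ('ascii', 'latin-1', 'utf-8'):
    cs = Charset(name)
    # input_charset shows the canonical name the package maps the alias to
    print(name, '->', cs.input_charset, ':', labels[cs.body_encoding])
```

The body_encoding attribute is the same value the workaround tests to
decide between str and bytes.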
-------------------------------------------------------------------
There are some additional cases that now require decoding per mail
headers today due to the str/bytes split, but these are just a
normal artifact of supporting Unicode character sets in general,
and seem like issues for package clients to resolve (e.g., the bytes
returned for decoded payloads in 3.X didn't play well with existing
str-based text processing code written for 2.X).
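
As one illustration of the kind of client-side decoding meant here, a
text part's bytes payload can be converted to str using the charset
named in its Content-Type header. This is a sketch under assumptions,
not the book's code: the helper name decoded_text_payload and its
latin-1 default are hypothetical, and it is intended for non-multipart
text parts only.

```python
from email import message_from_string
from email.mime.text import MIMEText

def decoded_text_payload(part, default='latin-1'):
    """Return a text part's payload as str, per its declared charset."""
    raw = part.get_payload(decode=True)           # bytes, base64/QP undone
    charset = part.get_content_charset() or default
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # mislabeled or unknown charset: fall back rather than fail
        return raw.decode(default, errors='replace')

# Round trip: generate a utf-8 text part, parse it back, decode its payload
wire = MIMEText('caf\xe9', _charset='utf-8').as_string()
parsed = message_from_string(wire)
print(type(parsed.get_payload(decode=True)))      # bytes, not str
print(decoded_text_payload(parsed))
```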
-------------------------------------------------------------------
-----Original Message-----
>From: "R. David Murray" <rdmurray at bitdance.com>
>Sent: Jun 4, 2010 12:39 PM
>To: lutz at rmi.net
>Cc: email-sig at python.org
>Subject: email package status in 3.X
>
>On Mon May 10 20:02:46 CEST 2010 Mark Lutz wrote:
>> I'm probably going to have to go ahead and finish the book
>> with the email package as it is now, and include a lot of
>> caveats about the problems that a new version may fix in the
>> future. I can also post updated example code if/when possible.
>>
>> I realize everybody on this list probably knows this already,
>> but email in 3.X not only doesn't support the Unicode/bytes
>> dichotomy, it was also broken by it. Beyond the pre-parse
>> decode issue, its mail text generation really only works for
>> all-text mails. Generating text of an email with any sort of
>> binary part doesn't work at all now, because the base64 text
>> is still bytes, and the Generator expects str. I've coded a
>> custom encoder to pass to MIMEImage that works around this
>> by decoding to ASCII, but it's not a great story to have to
>> tell the tens of thousands of readers of this book, many of
>> whom will be evaluating 3.X in general.
>
>This bug should now be fixed in both the py3k branch and the 3.1
>maint branch. This means the fix will be in 3.1.3, as well as 3.2a1.
>Hopefully that will be in time for your book, since 3.2a1 is due June
>27th and I'm guessing the 3.1.3 release will be some time not too far
>off that time frame as well. FYI I also fixed a related bug that made
>using utf-8 as a charset problematic. Unfortunately I suspect there
>may be some other charset issues waiting to be discovered.
>
>If you have come across any other bugs that don't already have
>issues in the tracker please file bug reports. Anything that
>can be fixed in the current package I will endeavor to fix
>before the next release. Feel free also to indicate bugs which
>should be given priority.
>
>--
>R. David Murray www.bitdance.com