[Email-SIG] email package status in 3.X

Thu Jun 10 15:21:52 CEST 2010

Thanks, David; that's great news.  I'll update the book draft 
accordingly.

For the record, despite the issues, I was able to complete a fairly
full-featured email client GUI with the email package as it currently
is.  This includes parsing and generating arbitrary attachments, as
well as encoding on sends and decoding on fetches for both text payloads 
and I18N mail headers. The package is still quite powerful as is.  It
does take a bit of digging to figure out how to use its many tools,
but the book will probably help on this front, especially the 
upcoming edition's more complete application.

In other words, some of my concern may have been a bit premature.  
I hope that in the future we'll either strive for compatibility 
or keep the current version around; it's a lot of very useful code.

In fact, I recommend that any new email package be named distinctly, 
and that the current package be retained for a number of releases to
come.  After all the breakages that 3.X introduced in general, doing
the same to any email-based code seems a bit too much, especially 
given that the current package is largely functional as is.  To me,
after having just used it extensively, fixing its few issues seems 
a better approach than starting from scratch.

As for other issues, the things I found are described below my
signature.  I don't know what the utf-8 issue is that you refer 
to; I'm able to parse and send with this encoding as is without 
problems (both payloads and headers), but I'm probably not using the
interfaces you fixed, and this may be the same as one of the items
listed.

Another thought: it might be useful to use the book's email client 
as a sort of test case for the package; it's much more rigorous in 
the new edition because it now has to be, given 3.X's Unicode model 
(it's about 4,900 lines of code, though not all is email-related).
I'd be happy to donate the code as soon as I find out what the 
copyright will be this time around; it will be at O'Reilly's site
this Fall in any event.

Thanks,
--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)

Major issues I found...
------------------------------------------------------------------
1) Str required for parsing, but bytes returned from poplib

The initial decode from bytes to str of full mail text; in 
retrospect, probably not a major issue, since original email 
standards called for ASCII.  An 8-bit encoding like Latin-1 is
probably sufficient for most conforming mails.  For the book,
I try a set of different encodings, beginning with an optional
configuration module setting, then ascii, latin-1, and utf-8;
this is probably overkill, but a GUI has to be defensive.
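
A minimal sketch of the fallback sequence described above, not the
book's actual code.  The function name and its "preferred" argument
(standing in for the configuration module setting) are hypothetical.
Note that Latin-1 maps every byte value, so with this ordering the
utf-8 attempt is only reached if it is supplied as the preferred
encoding:

```python
def decode_mail_bytes(raw, preferred=None):
    """Decode raw mail bytes to str; return (text, encoding_used)."""
    # 'preferred' stands in for an optional user configuration setting
    candidates = [preferred] if preferred else []
    candidates += ['ascii', 'latin-1', 'utf-8']
    for name in candidates:
        try:
            return raw.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue          # bad bytes or unknown codec: try the next one
    # defensive only: latin-1 accepts every byte, so the loop returns first
    return raw.decode('latin-1', errors='replace'), 'latin-1'

text, used = decode_mail_bytes(b'Subject: test\r\n\r\nhello\r\n')
print(used)
```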

----------------------------------------------------------------

2) Binary attachments encoding

The binary attachments byte-to-str issue that you've just
fixed.  As I mentioned, I worked around this by passing in a 
custom encoder that calls the original and runs an extra decode
step.  Here's what my fix looked like in the book; your patch 
may do better, and at a minimum I will add a note about the 3.1.3
and 3.2 fix for this:

def fix_encode_base64(msgobj):
     from email.encoders import encode_base64
     encode_base64(msgobj)                # what email does normally: leaves bytes
     data = msgobj.get_payload()          # bytes fails in email pkg on text gen
     text = data.decode('ascii')          # decode to unicode str so text gen works
     ...plus line splitting logic omitted...
     msgobj.set_payload('\n'.join(lines))

>>> from email.mime.image import MIMEImage 
>>> from mailtools.mailSender import fix_encode_base64      # use custom workaround
>>> bytes = open('monkeys.jpg', 'rb').read()
>>> m = MIMEImage(bytes, _encoder=fix_encode_base64)        # convert to ascii str
>>> print(m.as_string()[:500])
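
For readers who want something self-contained to experiment with,
here is a hedged sketch of the same idea.  It is not the book's
code: the function name, the 76-column width, and the way lines are
re-split are assumptions standing in for the logic omitted above.
The isinstance check makes the decode step a no-op on releases where
encode_base64 already stores a str payload:

```python
from email.encoders import encode_base64
from email.mime.image import MIMEImage

def encode_base64_as_text(msgobj, width=76):
    """Base64-encode the payload, then make sure it is stored as str."""
    encode_base64(msgobj)                    # standard encoder, sets the CTE header
    payload = msgobj.get_payload()
    if isinstance(payload, bytes):           # the case described above
        payload = payload.decode('ascii')    # base64 output is pure ASCII
    compact = ''.join(payload.split())       # drop any existing line breaks
    lines = [compact[i:i + width] for i in range(0, len(compact), width)]
    msgobj.set_payload('\n'.join(lines))

data = bytes(range(256)) * 4                 # stand-in for real image bytes
part = MIMEImage(data, _subtype='jpeg', _encoder=encode_base64_as_text)
print(part.as_string()[:200])
```

The explicit _subtype avoids relying on image-type detection, since
the stand-in data is not a real image.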

-------------------------------------------------------------------

3) Type-dependent text part encoding

There's a str/bytes confusion issue related to Unicode encodings
in text payload generation: some encodings require the payload to
be str, but others expect bytes.  Unfortunately, this means that 
clients need to know how the package will react to the encoding 
that is used, and special-case based upon that.  

For example, I needed to pass in str for ASCII and Latin-1 (the 
former is unencoded and the latter gets QP MIME treatment), but 
must pass a bytes for UTF-8 (which triggers Base64).  That's less
than ideal for a client trying to attach arbitrary text parts 
generically from filenames.  Here's the obscure workaround I came
up with; the bodytext is str when fetched from an edit window, 
but may also be loaded from an attachment file.  This may or may
not have been reported, and it's entirely possible that there's
a better solution that I've missed.

def fix_text_required(encodingname):
    """
    4E: workaround for str/bytes combination errors in email package;  MIMEText 
    requires different types for different Unicode encodings in Python 3.1, due
    to the different ways it MIME-encodes some types of text;  see Chapter 13;
    the only other alternative is using generic Message and repeating much code; 
    """ 
    from email.charset import Charset, QP
    charset = Charset(encodingname)   # how email knows what to do for encoding
    bodyenc = charset.body_encoding   # BASE64 (utf8, others): needs bytes input
    return bodyenc in (None, QP)      # True (ascii, latin1, others): needs str

# on mail sends...
# email needs either str xor bytes specifically; 
if fix_text_required(bodytextEncoding): 
    if not isinstance(bodytext, str):
        bodytext = bodytext.decode(bodytextEncoding)
else:
    if not isinstance(bodytext, bytes):
        bodytext = bodytext.encode(bodytextEncoding)

# later
msg.set_payload(bodytext, charset=bodytextEncoding)
...or...
msg = MIMEText(bodytext, _charset=bodytextEncoding)
mainmsg.attach(msg)

# attachments
# build sub-Message of appropriate kind
maintype, subtype = contype.split('/', 1)
if maintype == 'text':                       # 4E: text needs encoding
    if fix_text_required(fileencode):        # requires str or bytes
        data = open(filename, 'r', encoding=fileencode)
    else:
        data = open(filename, 'rb')
    msg = MIMEText(data.read(), _subtype=subtype, _charset=fileencode)
    data.close()
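
To see which body encoding the package associates with a given
charset name, the Charset object can be inspected directly.  This is
a small illustrative sketch; the helper name is hypothetical, and
only email.charset's Charset, QP, and BASE64 are used:

```python
from email.charset import Charset, BASE64, QP

def describe_body_encoding(name):
    """Return a readable label for the body encoding email picks for a charset."""
    body = Charset(name).body_encoding
    labels = {None: 'none', QP: 'quoted-printable', BASE64: 'base64'}
    return labels.get(body, 'other')

for name in ('ascii', 'latin-1', 'utf-8'):
    print(name, '->', describe_body_encoding(name))
```

This is the distinction that fix_text_required above keys on: None
and QP on one side, BASE64 on the other.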

-------------------------------------------------------------------

There are some additional cases that now require decoding per mail 
headers today due to the str/bytes split, but these are just a 
normal artifact of supporting Unicode character sets in general,
and seem like issues for package clients to resolve (e.g., the bytes 
returned for decoded payloads in 3.X didn't play well with existing 
str-based text processing code written for 2.X).
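
As an illustration of that client-side step, here is a hedged sketch
that turns a decoded payload back into str using the charset named
in the part's own headers.  The function name and the Latin-1
fallback are assumptions, not part of the package, and the example
is written against current Python 3 behavior rather than 3.1
specifically:

```python
from email import message_from_string
from email.mime.text import MIMEText

def payload_as_text(part, fallback='latin-1'):
    """Return a part's decoded payload as str, per its declared charset."""
    raw = part.get_payload(decode=True)      # bytes, or None for multipart
    if raw is None:
        return ''
    charset = part.get_content_charset(fallback)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # unknown or wrong charset in the headers: degrade gracefully
        return raw.decode(fallback, errors='replace')

original = MIMEText('caf\u00e9 au lait', _charset='utf-8')
parsed = message_from_string(original.as_string())
text = payload_as_text(parsed)
print(type(text).__name__)
```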

-------------------------------------------------------------------

-----Original Message-----
>From: "R. David Murray" <rdmurray at bitdance.com>
>Sent: Jun 4, 2010 12:39 PM
>To: lutz at rmi.net
>Cc: email-sig at python.org
>Subject: email package status in 3.X
>
>On Mon May 10 20:02:46 CEST 2010 Mark Lutz wrote:
>> I'm probably going to have to go ahead and finish the book
>> with the email package as it is now, and include a lot of 
>> caveats about the problems that a new version may fix in the 
>> future.  I can also post updated example code if/when possible.
>> 
>> I realize everybody on this list probably knows this already,
>> but email in 3.X not only doesn't support the Unicode/bytes 
>> dichotomy, it was also broken by it.  Beyond the pre-parse 
>> decode issue, its mail text generation really only works for 
>> all-text mails.  Generating text of an email with any sort of
>> binary part doesn't work at all now, because the base64 text 
>> is still bytes, and the Generator expects str.  I've coded a 
>> custom encoder to pass to MIMEImage that works around this
>> by decoding to ASCII, but it's not a great story to have to 
>> tell the tens of thousands of readers of this book, many of
>> whom will be evaluating 3.X in general.
>
>This bug should now be fixed in both the py3k branch and the 3.1
>maint branch.  This means the fix will be in 3.1.3, as well as 3.2a1.
>Hopefully that will be in time for your book, since 3.2a1 is due June
>27th and I'm guessing the 3.1.3 release will be some time not too far
>off that time frame as well.  FYI I also fixed a related bug that made
>using utf-8 as a charset problematic.  Unfortunately I suspect there
>may be some other charset issues waiting to be discovered.
>
>If you have come across any other bugs that don't already have
>issues in the tracker please file bug reports.  Anything that
>can be fixed in the current package I will endeavor to fix
>before the next release.  Feel free also to indicate bugs which
>should be given priority.
>
>--
>R. David Murray                                      www.bitdance.com