[Python-3000] email libraries: use byte or unicode strings?

Barry Warsaw barry at python.org
Wed Nov 5 21:38:23 CET 2008


On Oct 30, 2008, at 6:17 PM, Andrew McNamara wrote:

> That's a trickier case, but I think it should use bytes internally. One of
> the early goals of email was that it be able to cope with malformed MIME -
> this includes incorrectly encoded messages. So I think it must keep a
> bytes representation internally.
>
> However - charset encoding is part of the MIME spec, so users have a
> reasonable expectation that the mime lib will present them with unicode.
> So the API needs to be unicode.
>
>> The latter doesn't though, and it needs a lot of work (we tried and failed
>> at pycon).
>
> Yes, it's hard. I think we're going to have to break the API.

I did make a start on a new API for email that works better with bytes
and unicode, but I didn't get very far before other work intruded.  My
current thinking is that you need separate APIs, where appropriate, to
access email content as unicode strings (or as decoded data in general).
For example, headers and their values would normally be bytes, but there
would be an API to retrieve the decoded values as unicode strings.
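
A rough sketch of that split, not a real design: the RawHeaders class
and its get_decoded() method are hypothetical names made up for
illustration.  Only email.header.decode_header() is an existing call.
Item access hands back the stored bytes untouched, and a separate
method does the RFC 2047 decoding.

```python
from email.header import decode_header


class RawHeaders:
    """Hypothetical container: header names and values are kept as bytes."""

    def __init__(self):
        self._headers = {}

    def __setitem__(self, name, value):
        # Header names are case-insensitive, so normalize the key.
        self._headers[name.lower()] = value

    def __getitem__(self, name):
        # Plain item access returns the raw bytes exactly as stored.
        return self._headers[name.lower()]

    def get_decoded(self, name):
        # Separate accessor (hypothetical) that returns a unicode string.
        raw = self._headers[name.lower()]
        text = raw.decode('ascii', 'replace')
        parts = []
        for chunk, charset in decode_header(text):
            # decode_header() yields str when nothing was encoded and
            # bytes (plus a charset, possibly None) otherwise.
            if isinstance(chunk, bytes):
                chunk = chunk.decode(charset or 'ascii', 'replace')
            parts.append(chunk)
        return ''.join(parts)


headers = RawHeaders()
headers[b'Subject'] = b'=?utf-8?q?Caf=C3=A9?='
raw_value = headers[b'Subject']               # still bytes
decoded_value = headers.get_decoded(b'Subject')  # unicode string
```

The malformed-input case is why the raw bytes stay the primary
representation here: decoding can be lossy, so it is opt-in.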

Similarly, where get_payload() now takes a 'decode' option, there
would be a separate API for retrieving the decoded payload.  This is a
bit trickier because, depending on the content-type, you might want a
unicode string, or an image, or a sound file, etc.
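
To make the content-type dependence concrete, here is a minimal sketch.
The decode_payload() function and its parameters are hypothetical, not
part of the email package; it only leans on the standard base64 and
quopri modules.  It assumes text/* parts should come back as unicode
strings and everything else (images, sound, etc.) as plain bytes.

```python
import base64
import quopri


def decode_payload(raw, content_type, charset='ascii', cte='7bit'):
    """Hypothetical helper: undo the transfer encoding, then decide
    what kind of object to return based on the content type."""
    cte = cte.lower()
    if cte == 'base64':
        data = base64.b64decode(raw)
    elif cte == 'quoted-printable':
        data = quopri.decodestring(raw)
    else:
        # 7bit, 8bit and binary need no transfer decoding.
        data = raw

    if content_type.lower().startswith('text/'):
        # Text parts become unicode; bad bytes are replaced, not fatal.
        return data.decode(charset, 'replace')
    # Non-text parts stay as bytes for the caller to interpret.
    return data


text = decode_payload(b'Caf=C3=A9', 'text/plain',
                      charset='utf-8', cte='quoted-printable')
blob = decode_payload(base64.b64encode(b'\x89PNG'), 'image/png',
                      cte='base64')
```

A real API would probably want something richer than bytes for images
or audio, which is exactly the tricky part.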

Another tricky issue is how to set these things.  We have to get in  
the habit of writing

     message[b'Subject'] = b'Hello'

but that's really gross, and of course message_from_string() would have
to become message_from_bytes().  Maybe the API accepts unicode strings
but only if they are ASCII?
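
One way the "unicode only if ASCII" rule might look, as a sketch.  The
to_header_bytes() helper is a hypothetical name, and raising ValueError
for non-ASCII text is an assumption about how such an API would
behave, not a decision anyone has made.

```python
def to_header_bytes(value):
    """Hypothetical helper: accept bytes as-is, and accept unicode
    strings only when they are pure ASCII."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        try:
            return value.encode('ascii')
        except UnicodeEncodeError:
            # Refuse to guess a charset on the caller's behalf.
            raise ValueError(
                'non-ASCII text must be encoded to bytes explicitly'
            ) from None
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


subject = to_header_bytes('Hello')     # ASCII text is accepted
same = to_header_bytes(b'Hello')       # bytes pass straight through
```

That would let message['Subject'] = 'Hello' keep working for the common
case while still forcing an explicit choice for anything else.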

There are lots of other problems with the email package, and while  
it's made my life much better on the whole, it is definitely in need  
of improvement.  Unfortunately, I don't see myself having much time to  
attack it in the near future.  Maybe we can make it a Pycon sprint  
(instead of spending all that time on the bzr experiment ;), or, if  
someone else wants to lead the dirty work, I would definitely pitch in  
with my thoughts on API and implementation.

-Barry


