[Python-3000] email libraries: use byte or unicode strings?
barry at python.org
Thu Nov 6 18:17:31 CET 2008
On Nov 6, 2008, at 7:22 AM, Nick Coghlan wrote:
> Glenn Linderman wrote:
>> Even 8-bit binary can be translated into a
>> sequence of Unicode codepoints with the same numeric value, for
> No, no, no, no. Using latin-1 to tunnel binary data through Unicode
> gets us straight back into the "is it text or bytes?" hell that is the
> 8-bit string in 2.x. It defeats the entire point of making the break
> between str and bytes in 3.0 in the first place.
And I'll note that this is essentially how the email package in 3.0
cheats its way into some modicum of usability. It is teh suck, but it
works (defined as "passes the tests" ;).
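To make the latin-1 trick concrete, here is a minimal sketch of what "tunnelling" means and why Nick objects to it. The variable names are illustrative only and are not taken from the email package; the behaviour shown is that of a current Python 3.

```python
# Under latin-1, every byte value 0..255 maps to the code point with the
# same number, so arbitrary binary data survives a round trip through str.
raw = bytes(range(256))
tunnelled = raw.decode("latin-1")
assert [ord(c) for c in tunnelled] == list(raw)
assert tunnelled.encode("latin-1") == raw

# The catch: the resulting str carries no hint that it is really bytes.
# UTF-8 encoded text pushed through the same tunnel comes out as mojibake
# that is indistinguishable, by type, from genuine text.
utf8_bytes = "caf\u00e9".encode("utf-8")
looks_like_text = utf8_bytes.decode("latin-1")
assert isinstance(looks_like_text, str)
assert looks_like_text != "caf\u00e9"
```

The round trip is lossless, which is why the tests pass, but nothing in the type tells a caller whether a given str holds text or disguised bytes.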
> If something is potentially arbitrary binary data, we need to treat it
> that way and use bytes. People are just going to have to get over
> aesthetic objections to the leading b on their bytes literals. Heck, be
> happy you don't have to write bytes(map(ord, 'literal')) as was the case
> in the early stages of 3.0 :)
> Providing a Unicode based text API over the top for the cases where
> handling malformed data isn't necessary may be convenient and a good
> idea, but it shouldn't be the only API (3.0 is already guilty of that in
> a few places - we shouldn't be adding more).
Right, and really it's a deeper issue. We're really only concerned
with bytes vs. unicodes in headers. When talking about payloads, we
get into a much richer type hierarchy, with images, audio, byte
streams, etc, etc. Message.get_payload(decode=True) doesn't know
anything about that stuff, but it could.
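As a rough illustration of the payload point, the sketch below builds a small base64 message and compares the two forms of get_payload(). It assumes the behaviour of current Python 3 releases, which may differ from 3.0 as it stood when this was written; the sample bytes are made up for the example.

```python
import base64
import email

# Hypothetical binary payload, base64-encoded for transport.
payload = bytes([0, 1, 2, 255])
b64 = base64.b64encode(payload).decode("ascii")

raw = (
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + b64 + "\n"
)
msg = email.message_from_string(raw)

# Without decode, the payload is the transfer-encoded text as a str.
encoded = msg.get_payload()

# With decode=True, only the Content-Transfer-Encoding is undone and the
# result is bytes. The content type is not consulted to build anything
# richer (an image object, an audio object, and so on).
decoded = msg.get_payload(decode=True)
```

Here decoded is plain bytes whatever the Content-Type says, so any richer interpretation is left to the caller.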