[Python-3000] email libraries: use byte or unicode strings?

Nick Coghlan ncoghlan at gmail.com
Thu Nov 6 13:22:42 CET 2008


Glenn Linderman wrote:
> Even 8-bit binary can be translated into a
> sequence of Unicode codepoints with the same numeric value, for example.

No, no, no, no. Using latin-1 to tunnel binary data through Unicode just
gets us straight back into the "is it text or bytes?" hell that is the
8-bit string in 2.x. It defeats the entire point of making the break
between str and bytes in 3.0 in the first place.
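
As a minimal sketch of the problem (the variable names are illustrative
and not from the thread): latin-1 round-trips any byte sequence without
loss, but the tunneled value is an ordinary str that nothing marks as
binary, and genuine text cannot always make the reverse trip.

```python
# Arbitrary binary payload: not valid UTF-8, not meaningful text.
raw = bytes([0x00, 0xFF, 0x89, 0x50, 0x4E, 0x47])

# latin-1 maps each byte 0-255 to the code point with the same value,
# so decoding never fails and the round trip is lossless.
tunneled = raw.decode("latin-1")
round_trip_ok = tunneled.encode("latin-1") == raw

# The result has the same type as real text, so code receiving it
# cannot tell whether it holds characters or disguised bytes.
genuine_text = "caf\u00e9"
same_type = type(tunneled) is type(genuine_text)

# Real text outside latin-1 cannot be pushed back through the tunnel.
try:
    "\u20ac".encode("latin-1")  # EURO SIGN
    euro_encodable = True
except UnicodeEncodeError:
    euro_encodable = False
```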

If something is potentially arbitrary binary data, we need to treat it
that way and use bytes. People are just going to have to get over their
aesthetic objections to the leading b on their bytes literals. Heck, be
happy you don't have to write bytes(map(ord, 'literal')) as was the case
in the early stages of 3.0 :)
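
For comparison, a small sketch of the two spellings in current 3.x
(names are illustrative): the literal and the map(ord, ...) form build
the same bytes object, and bytes refuses to mix silently with str.

```python
# The bytes literal and the older long-hand spelling are equivalent.
literal = b"literal"
spelled_out = bytes(map(ord, "literal"))
same_value = literal == spelled_out

# Indexing bytes yields integers, not one-character strings.
first_byte = literal[0]

# bytes and str do not combine implicitly; the mismatch is an error
# instead of a silent guess about encodings.
try:
    literal + "text"
    mixed_ok = True
except TypeError:
    mixed_ok = False
```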

Providing a Unicode-based text API over the top for the cases where
handling malformed data isn't necessary may be convenient and a good
idea, but it shouldn't be the only API (3.0 is already guilty of that in
a few places - we shouldn't be adding more).
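
One way such layering might look, as a rough sketch only: the function
names split_header_bytes and split_header_text are hypothetical and are
not part of the email package or any proposal in this thread. The bytes
function handles any input; the text function is a thin convenience
wrapper that fails loudly on data that does not decode.

```python
def split_header_bytes(line: bytes):
    """Bytes-level API (hypothetical): works on any input, malformed or not."""
    name, _, value = line.partition(b":")
    return name.strip(), value.strip()


def split_header_text(line: bytes, encoding: str = "ascii"):
    """Text convenience layer (hypothetical): for data known to decode cleanly."""
    name, value = split_header_bytes(line)
    return name.decode(encoding), value.decode(encoding)


good = b"Subject: hello"
bad = b"Subject: caf\xe9"  # stray 8-bit byte, not valid ASCII

text_result = split_header_text(good)

# The bytes API still gives access to the malformed value.
bytes_result = split_header_bytes(bad)

# The text API rejects it instead of guessing.
try:
    split_header_text(bad)
    text_api_accepted_bad = True
except UnicodeDecodeError:
    text_api_accepted_bad = False
```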

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

