[Python-Dev] PEP 393 Summer of Code Project

Stephen J. Turnbull stephen at xemacs.org
Mon Aug 29 05:43:24 CEST 2011


Raymond Hettinger writes:

 > The naming convention for codecs is that the UTF prefix is used for
 > lossless encodings that cover the entire range of Unicode.

Sure.  The operative word here is "codec", not "str", though.

 > "The first amendment to the original edition of the UCS defined
 > UTF-16, an extension of UCS-2, to represent code points outside the
 > BMP."

Since when can s[0] represent a code point outside the BMP, for s a
Unicode string in a narrow build?

Remember, the UCS-2/narrow vs. UCS-4/wide distinction is *not* about
what Python supports vs. the outside world.  It's about what the
str/unicode type is an array of: 16-bit code units in a narrow build,
32-bit code points in a wide build.
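
To make the indexing point concrete, here is a minimal sketch.  It
assumes a current Python 3 (3.3 or later), which no longer has narrow
builds, so it simulates a narrow build's storage by encoding to
UTF-16.  The helper name utf16_code_units is hypothetical, invented
for this illustration.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text, as a narrow build would store them.

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-16-le")
    # Each UTF-16 code unit is one little-endian unsigned 16-bit integer.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


s = "\U00010000x"  # U+10000 is the first code point outside the BMP

units = utf16_code_units(s)          # what a narrow build indexes over
code_points = [ord(ch) for ch in s]  # what a wide build indexes over

print([hex(u) for u in units])
print([hex(c) for c in code_points])
```

In the simulated narrow view the first element is the high surrogate
0xD800, not the code point U+10000, so s[0] on its own cannot represent
a character outside the BMP there.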



