[Python-Dev] PEP 393 Summer of Code Project

Raymond Hettinger raymond.hettinger at gmail.com
Sat Aug 27 07:58:10 CEST 2011


On Aug 26, 2011, at 8:51 PM, Terry Reedy wrote:

> 
> 
> On 8/26/2011 8:42 PM, Guido van Rossum wrote:
>> On Fri, Aug 26, 2011 at 3:57 PM, Terry Reedy<tjreedy at udel.edu>  wrote:
> 
>>> My impression is that a UTF-16 implementation, to be properly called such,
>>> must do len and [] in terms of code points, which is why Python's narrow
>>> builds are called UCS-2 and not UTF-16.
>> 
>> I don't think anyone else has that impression. Please cite chapter and
>> verse if you really think this is important. IIUC, UCS-2 does not
>> allow surrogate pairs, whereas Python (and Java, and .NET, and
>> Windows) 16-bit strings all do support surrogate pairs. And they all
> 
> For that reason, I think UTF-16 is a better term than UCS-2 for narrow builds (whether or not the above impression is true).

I agree.  It's weird to call something UCS-2 if code points above 65535 are representable.
The naming convention for codecs is that the UTF prefix is used for lossless encodings that cover the entire range of Unicode.

"The first amendment to the original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP."
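To make the distinction concrete, here is a minimal sketch (not part of the original message) of how UTF-16 represents a code point above 65535 as a surrogate pair, which plain UCS-2 cannot express. The helper names utf16_code_units and to_surrogate_pair are hypothetical, chosen for illustration; the example assumes Python 3 and uses only the standard "utf-16-be" codec and the struct module.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16 (big-endian)."""
    data = text.encode("utf-16-be")  # big-endian variant, so no BOM is emitted
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return high, low


if __name__ == "__main__":
    # U+10437 lies outside the BMP, so it needs two 16-bit code units.
    print([hex(u) for u in utf16_code_units("\U00010437")])
    print([hex(u) for u in to_surrogate_pair(0x10437)])
    # A BMP character such as "A" still fits in a single code unit.
    print([hex(u) for u in utf16_code_units("A")])
```

The codec output and the hand-computed pair agree, which is the sense in which a 16-bit string type that stores surrogate pairs is carrying UTF-16 rather than UCS-2.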

Raymond
