[Python-Dev] PEP 393 Summer of Code Project

Stephen J. Turnbull stephen at xemacs.org
Mon Aug 29 04:48:43 CEST 2011


Guido van Rossum writes:

 > I don't think anyone else has that impression. Please cite chapter and
 > verse if you really think this is important. IIUC, UCS-2 does not
 > allow surrogate pairs,

In the original definition of UCS-2 in draft ISO 10646 (1990),
everything in the BMP except for 0xFFFF and 0xFFFE was a character,
and there was no concept of "surrogate" at all.  Later, in ISO 10646
(1993)[1], the Surrogate Area was carved out of the Private Area, but
UCS-2 implementations simply treat surrogate code points as (single)
characters with special properties.  This was more or less backward
compatible, as all corporate uses of the private area used the lower
code points and didn't conflict with the surrogates.  Finally (in 2000
or 2003) the definition of UCS-2 in ISO 10646 was revised in a
backward-incompatible way to exclude surrogates entirely, i.e.,
nowadays it is a range-restricted version of UTF-16.
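
To make "range-restricted version of UTF-16" concrete, here is a rough
sketch in Python.  The helper names (is_modern_ucs2, utf16_code_units)
are mine, purely for illustration, and are not taken from any standard
or library.  I'm assuming a build where ord() sees non-BMP characters
as single code points (e.g. a PEP 393 build), and I ignore the
noncharacters 0xFFFE and 0xFFFF for simplicity.

```python
# Illustrative sketch only: these helpers are hypothetical, not a standard API.
# Assumes each str element is one full code point (wide or PEP 393 build).

SURROGATES = range(0xD800, 0xE000)  # U+D800..U+DFFF, the Surrogate Area


def is_modern_ucs2(text):
    """True if every code point is in the BMP and none is a surrogate.

    This mirrors the revised UCS-2 definition, which excludes surrogates.
    Noncharacters such as U+FFFE and U+FFFF are deliberately not checked.
    """
    return all(ord(ch) <= 0xFFFF and ord(ch) not in SURROGATES for ch in text)


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")  # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    bmp_text = "A\u20ac"        # BMP only: one code unit per character
    astral = "\U00010000"       # first non-BMP code point: needs a pair
    print(is_modern_ucs2(bmp_text), [hex(u) for u in utf16_code_units(bmp_text)])
    print(is_modern_ucs2(astral), [hex(u) for u in utf16_code_units(astral)])
```

For text that passes is_modern_ucs2(), the UTF-16 code units are just
the code points, which is the sense in which the two encodings agree on
that restricted range.  Anything outside the BMP needs a surrogate pair
in UTF-16 and has no representation at all in modern UCS-2.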

Footnotes: 
[1]  IIRC, strictly speaking this was done slightly later (1993 or
1994) in an official Amendment to ISO 10646; the Amendment was
incorporated into the standard in 2000.


