[Python-Dev] New Py_UNICODE doc
"Martin v. Löwis"
martin at v.loewis.de
Sat May 7 02:18:47 CEST 2005
Nicholas Bastin wrote:
> What I mean is pretty clear. UCS-2 does *NOT* support surrogate pairs.
> If it did, it would be called UTF-16. If Python really supported
> UCS-2, then surrogate pairs from UTF-16 inputs would either get turned
> into two garbage characters, or the "I couldn't transcode this" UCS-2
> code point (I don't remember which one that is off the top of my head).
OTOH, if Python really supported UTF-16, then unichr(0x10000) would
work, and len(u"\U00010000") would be 1.
It is primarily just the UTF-8 codec which supports UTF-16.
Regards,
Martin
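The UCS-2 vs. UTF-16 distinction in this exchange can be sketched in modern Python 3. Since PEP 393 (Python 3.3), `len("\U00010000")` is 1 on every build, so this sketch simulates a narrow build by encoding to UTF-16-LE and counting the 16-bit code units. The helper names (`utf16_code_units`, `decode_ucs2_strict`, `decode_utf16`) are hypothetical. Nicholas lists two possible behaviors for a strict UCS-2 decoder; the sketch assumes the second one and uses U+FFFD REPLACEMENT CHARACTER as the "couldn't transcode" code point.

```python
import struct

REPLACEMENT = 0xFFFD  # assumed "couldn't transcode" code point (U+FFFD)


def utf16_code_units(s):
    """Return the 16-bit code units a narrow (UTF-16 storage) build would hold for s."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_high(u):
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    return 0xDC00 <= u <= 0xDFFF


def decode_ucs2_strict(units):
    """Hypothetical strict UCS-2: surrogates are not characters, so each becomes U+FFFD."""
    return [REPLACEMENT if 0xD800 <= u <= 0xDFFF else u for u in units]


def decode_utf16(units):
    """UTF-16: a high surrogate followed by a low surrogate is one code point."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if is_high(u) and i + 1 < len(units) and is_low(units[i + 1]):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP character or unpaired surrogate, kept as-is
            i += 1
    return out


units = utf16_code_units("\U00010000")
print([hex(u) for u in units])                      # ['0xd800', '0xdc00']
print(len(units))                                   # 2, the length a narrow build reported
print([hex(c) for c in decode_ucs2_strict(units)])  # ['0xfffd', '0xfffd']
print([hex(c) for c in decode_utf16(units)])        # ['0x10000']
print(chr(decode_utf16(units)[0]).encode("utf-8"))  # b'\xf0\x90\x80\x80'
```

The last line mirrors Martin's point about the UTF-8 codec: once the surrogate pair is combined into a single code point, UTF-8 encodes it as one four-byte sequence rather than as two separately encoded surrogates.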