[Python-Dev] New Py_UNICODE doc

"Martin v. Löwis" martin at v.loewis.de
Sat May 7 02:15:27 CEST 2005


Shane Hathaway wrote:
> Ok.  Thanks for helping me understand where Python is WRT unicode.  I
> can work around the issues (or maybe try to help solve them) now that I
> know the current state of affairs.  If Python correctly handled UTF-16
> strings internally, we wouldn't need the UCS-4 configuration switch,
> would we?

Define "correctly". Python, in ucs2 mode, allows you to address individual
surrogate code units, e.g. by indexing. So you get

>>> u"\U00012345"[0]
u'\ud808'

This will never work "correctly", and never should, because an efficient
implementation isn't possible. If you want "safe" indexing and slicing,
you need ucs4.
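
To illustrate where the u'\ud808' above comes from, and one way to read the
efficiency concern, here is a minimal sketch in present-day Python. The helper
names utf16_code_units and code_point_at are hypothetical, made up for this
example, and are not part of Python's API. The sketch assumes a string is
stored as a flat list of 16-bit code units: finding the n-th code point then
requires scanning from the start, because each code point may occupy one or
two units.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Code points in the Basic Multilingual Plane fit in one unit.
            units.append(cp)
        else:
            # Everything above U+FFFF is split into a surrogate pair.
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (offset & 0x3FF))  # low (trail) surrogate
    return units


def code_point_at(units, index):
    """Return the index-th code point by scanning from the start (linear time)."""
    pos = 0  # number of code points seen so far
    i = 0    # position in the code unit list
    while i < len(units):
        unit = units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = unit
            step = 1
        if pos == index:
            return cp
        pos += 1
        i += step
    raise IndexError("code point index out of range")


units = utf16_code_units("a\U00012345b")
print([hex(u) for u in units])       # ['0x61', '0xd808', '0xdf45', '0x62']
print(hex(code_point_at(units, 1)))  # 0x12345
```

Indexing the unit list directly (units[1]) is constant time but yields the
lone surrogate 0xd808, which is what the ucs2 session above shows; the
"safe" lookup has to walk the list.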

Regards,
Martin

