[Python-Dev] New Py_UNICODE doc

James Y Knight foom at fuhm.net
Fri May 6 21:42:16 CEST 2005


On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote:
> If this is the case, then we're clearly misleading users.  If the
> configure script says UCS-2, then as a user I would assume that
> surrogate pairs would *not* be encoded, because I chose UCS-2, and it
> doesn't support that.  I would assume that any UTF-16 string I would
> read would be transcoded into the internal type (UCS-2), and
> information would be lost.  If this is not the case, then what does the
> configure option mean?

It means all the string operations treat strings as if they were UCS-2, 
but that in actuality, they are UTF-16. Same as the case in the windows 
APIs and Java. That is, all string operations are essentially broken, 
because they're operating on encoded bytes, not characters, but claim 
to be operating on characters.

James



More information about the Python-Dev mailing list