[Python-Dev] New Py_UNICODE doc

"Martin v. Löwis" martin at v.loewis.de
Sat May 7 01:43:15 CEST 2005


Nicholas Bastin wrote:
> If this is the case, then we're clearly misleading users.  If the
> configure script says UCS-2, then as a user I would assume that
> surrogate pairs would *not* be encoded, because I chose UCS-2, and it
> doesn't support that.

What do you mean by that? That the interpreter crashes if you try
to store a low surrogate into a Py_UNICODE?

> I would assume that any UTF-16 string I would
> read would be transcoded into the internal type (UCS-2), and information
> would be lost.  If this is not the case, then what does the configure
> option mean?

It tells you whether you have the two-octet form of the Universal
Character Set, or the four-octet form.
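
To illustrate the point: a two-octet code unit has no trouble holding a
surrogate value, since surrogates are ordinary 16-bit numbers in the range
0xD800-0xDFFF. The sketch below is not taken from the interpreter source.
It is a minimal illustration, with a hypothetical helper name
(to_utf16_units), of how a code point beyond U+FFFF splits into two 16-bit
units, each of which fits in a two-octet storage type.

```python
def to_utf16_units(code_point):
    """Split a Unicode code point into 16-bit code units.

    Hypothetical helper for illustration only; it is not part of
    CPython's API.
    """
    if code_point < 0x10000:
        # Code points in the Basic Multilingual Plane fit in one unit.
        return [code_point]
    # Supplementary code points are split into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


# U+10400 lies outside the BMP, so it needs two 16-bit units.
units = to_utf16_units(0x10400)
print([hex(u) for u in units])
```

Every value returned is at most 0xFFFF, so storing it in a two-octet unit
loses nothing. Whether code that consumes those units treats a pair as one
character is a separate question from how wide the storage type is.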

Regards,
Martin

