[Python-Dev] New Py_UNICODE doc
Shane Hathaway
shane at hathawaymix.org
Fri May 6 01:55:23 CEST 2005
Nicholas Bastin wrote:
>
> On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
>> On a related note, it would be helpful if the documentation provided a
>> little more background on unicode encoding. Specifically, that UCS-2 is
>> not the same as UTF-16, even though they're both two bytes wide and most
>> of the characters are the same. UTF-16 can encode 4 byte characters,
>> while UCS-2 can't. A Py_UNICODE is either UCS-2 or UCS-4. It took me
>
>
> I'm not sure the Python documentation is the place to teach someone
> about unicode. The ISO 10646 pretty clearly defines UCS-2 as only
> containing characters in the BMP (plane zero). On the other hand, I
> don't know why Python lets you choose UCS-2 anyhow, since it's almost
> always not what you want.
Then something in the Python docs ought to say why UCS-2 is not what you
want. I still don't know; I've heard differing opinions on the subject.
Some say you'll never need more than what UCS-2 provides. Is that
incorrect?
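For what it's worth, here is a minimal sketch of the UCS-2 vs. UTF-16
difference as I understand it. It assumes a modern Python 3 interpreter
rather than the builds discussed in this thread, and the sample characters
are my own choice for illustration:

```python
# A character outside the BMP (plane zero): U+1D11E MUSICAL SYMBOL G CLEF.
ch = "\U0001D11E"

# UTF-16 encodes it as a surrogate pair: two 16-bit code units (4 bytes).
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# UCS-2 has no surrogate mechanism, so a code point above 0xFFFF
# cannot be represented in its single 16-bit unit.
fits_in_ucs2 = ord(ch) <= 0xFFFF

# A BMP character, by contrast, is one 16-bit unit in both encodings.
bmp = "\u00e9"
bmp_units = len(bmp.encode("utf-16-be")) // 2

print([hex(u) for u in units], fits_in_ucs2, bmp_units)
```

So "never need more than UCS-2" seems to hold only as long as every
character you handle lives in the BMP.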
More generally, how should a non-unicode-expert writing Python extension
code find out the minimum they need to know about unicode to use the
Python unicode API? The API reference [1] ought to at least have a list
of background links. I had to hunt everywhere.
.. [1] http://docs.python.org/api/unicodeObjects.html
Shane