[Python-Dev] New Py_UNICODE doc
Nicholas Bastin
nbastin at opnet.com
Thu May 5 21:58:17 CEST 2005
On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
> Martin v. Löwis wrote:
>> Nicholas Bastin wrote:
>>
>>> "This type represents the storage type which is used by Python
>>> internally as the basis for holding Unicode ordinals. Extension module
>>> developers should make no assumptions about the size of this type on
>>> any given platform."
>>
>>
>> But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>> So the documentation should explicitly say "it depends".
>
> On a related note, it would be helpful if the documentation provided a
> little more background on Unicode encoding. Specifically, that UCS-2 is
> not the same as UTF-16, even though they're both two bytes wide and most
> of the characters are the same. UTF-16 can encode 4-byte characters,
> while UCS-2 can't. A Py_UNICODE is either UCS-2 or UCS-4. It took me
I'm not sure the Python documentation is the place to teach someone
about Unicode. ISO 10646 pretty clearly defines UCS-2 as containing
only characters in the BMP (plane zero). On the other hand, I don't
know why Python lets you choose UCS-2 anyhow, since it's almost never
what you want.
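
To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal
sketch. It assumes Python 3, which postdates this thread, and the helper
names utf16_units and fits_in_ucs2 are made up for illustration; they are
not part of Python or of the Py_UNICODE API.

```python
import struct


def utf16_units(ch):
    """Return the 16-bit code units UTF-16 uses for one character."""
    data = ch.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def fits_in_ucs2(ch):
    """UCS-2 has no surrogate mechanism, so only BMP code points fit."""
    return ord(ch) <= 0xFFFF


bmp_char = "\u00f6"         # LATIN SMALL LETTER O WITH DIAERESIS, in the BMP
astral_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

for ch in (bmp_char, astral_char):
    units = ["%04X" % u for u in utf16_units(ch)]
    print("U+%04X" % ord(ch), units, "fits in UCS-2:", fits_in_ucs2(ch))
```

The BMP character takes one 16-bit unit in both encodings. The G clef
takes two units in UTF-16 (a surrogate pair) and has no UCS-2
representation at all.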
--
Nick