[Python-Dev] New Py_UNICODE doc

Nicholas Bastin nbastin at opnet.com
Thu May 5 21:58:17 CEST 2005


On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:

> Martin v. Löwis wrote:
>> Nicholas Bastin wrote:
>>
>>> "This type represents the storage type which is used by Python
>>> internally as the basis for holding Unicode ordinals.  Extension
>>> module developers should make no assumptions about the size of this
>>> type on any given platform."
>>
>>
>> But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>> So the documentation should explicitly say "it depends".
>
> On a related note, it would be helpful if the documentation provided a
> little more background on Unicode encoding.  Specifically, that UCS-2
> is not the same as UTF-16, even though they're both two bytes wide and
> most of the characters are the same.  UTF-16 can encode 4 byte
> characters, while UCS-2 can't.  A Py_UNICODE is either UCS-2 or UCS-4.
> It took me

I'm not sure the Python documentation is the place to teach someone
about Unicode.  ISO 10646 pretty clearly defines UCS-2 as containing
only characters in the BMP (plane zero).  On the other hand, I don't
know why Python lets you choose UCS-2 anyhow, since it's almost never
what you want.
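
To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal
sketch in present-day Python 3.  It does not inspect Py_UNICODE itself;
the helper names utf16_code_units and fits_in_ucs2 are made up for this
illustration, and U+10400 is just an arbitrary character outside the BMP.

```python
import struct


def utf16_code_units(ch):
    """Return the 16-bit code units UTF-16 uses to encode one character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def fits_in_ucs2(ch):
    """True if the character lies in the BMP (plane zero), i.e. U+0000..U+FFFF."""
    return ord(ch) <= 0xFFFF


bmp_char = "A"               # U+0041, inside the BMP
astral_char = "\U00010400"   # U+10400, outside the BMP

print([hex(u) for u in utf16_code_units(bmp_char)])     # one code unit
print([hex(u) for u in utf16_code_units(astral_char)])  # a surrogate pair
print(fits_in_ucs2(bmp_char), fits_in_ucs2(astral_char))
```

A BMP character takes a single 16-bit unit in either scheme, while the
character outside the BMP needs two units (a surrogate pair) in UTF-16
and has no UCS-2 representation at all.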

--
Nick


