[Python-Dev] New Py_UNICODE doc

Shane Hathaway shane at hathawaymix.org
Thu May 5 00:20:45 CEST 2005


Martin v. Löwis wrote:
> Nicholas Bastin wrote:
> 
>>"This type represents the storage type which is used by Python 
>>internally as the basis for holding Unicode ordinals.  Extension module 
>>developers should make no assumptions about the size of this type on 
>>any given platform."
> 
> 
> But people want to know "Is Python's Unicode 16-bit or 32-bit?"
> So the documentation should explicitly say "it depends".

On a related note, it would be helpful if the documentation provided a
little more background on Unicode encodings.  Specifically, that UCS-2 is
not the same as UTF-16, even though both use 16-bit code units and they
agree on every character in the Basic Multilingual Plane.  UTF-16 can
encode characters beyond U+FFFF as 4-byte surrogate pairs, while UCS-2
can't represent them at all.  A Py_UNICODE holds either UCS-2 or UCS-4,
depending on how Python was built.  It took me quite some time to figure
that out so I could produce a patch [1]_ for PyXPCOM that fixes its
Unicode support.
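
To make the difference concrete, here is a minimal Python sketch.  It is
not taken from PyXPCOM or from the Python source; the helper names
utf16_code_units and fits_in_ucs2 are made up for this illustration.  It
shows how UTF-16 splits a code point above U+FFFF into a surrogate pair,
which is exactly the case UCS-2 has no way to express.

```python
def fits_in_ucs2(code_point):
    """True if the code point fits in a single 16-bit UCS-2 unit."""
    return 0 <= code_point <= 0xFFFF


def utf16_code_units(code_point):
    """Return the 16-bit UTF-16 code units for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Inside the Basic Multilingual Plane: one unit, same as UCS-2.
        return [code_point]
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


# U+0041 (LATIN CAPITAL LETTER A) is one unit in both encodings.
assert utf16_code_units(0x41) == [0x41]
assert fits_in_ucs2(0x41)

# U+1D11E (MUSICAL SYMBOL G CLEF) needs two units (4 bytes) in UTF-16
# and cannot be stored in UCS-2 at all.
assert utf16_code_units(0x1D11E) == [0xD834, 0xDD1E]
assert not fits_in_ucs2(0x1D11E)
```

Code that treats a 16-bit buffer as UCS-2 will see such a pair as two
separate (and invalid) characters, which is the kind of mismatch an
extension has to watch for when converting to or from Py_UNICODE.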

.. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=281156

Shane

