[Python-Dev] New Py_UNICODE doc

M.-A. Lemburg mal at egenix.com
Fri May 6 09:17:26 CEST 2005


Nicholas Bastin wrote:
> On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
> 
>>>Nicholas Bastin wrote:
>>>
>>>
>>>>"This type represents the storage type which is used by Python
>>>>internally as the basis for holding Unicode ordinals.  Extension
>>>>module developers should make no assumptions about the size of this
>>>>type on any given platform."
>>>
>>>
>>>But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>>>So the documentation should explicitly say "it depends".
>>
>>On a related note, it would be helpful if the documentation provided a
>>little more background on Unicode encoding.  Specifically, that UCS-2
>>is not the same as UTF-16, even though they're both two bytes wide and
>>most of the characters are the same.  UTF-16 can encode 4-byte
>>characters, while UCS-2 can't.  A Py_UNICODE is either UCS-2 or UCS-4.
>>It took me
> 
> I'm not sure the Python documentation is the place to teach someone 
> about unicode.  The ISO 10646 pretty clearly defines UCS-2 as only 
> containing characters in the BMP (plane zero).  On the other hand, I 
> don't know why Python lets you choose UCS-2 anyhow, since it's almost
> always not what you want.

You've got that wrong: Python lets you choose UCS-4;
UCS-2 is the default.

Note that Python's Unicode codecs UTF-8 and UTF-16
are surrogate aware and thus support non-BMP code points
regardless of the build type: a UCS-2 build of Python will
store a non-BMP code point as a UTF-16 surrogate pair in the
Py_UNICODE buffer, while a UCS-4 build will store it as a
single value. Decoding is surrogate aware too, so a UTF-16
surrogate pair in a UCS-2 build is treated as a single
Unicode code point.
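To make the surrogate handling concrete, here is a minimal sketch of the arithmetic involved, in plain Python. The helpers to_surrogate_pair and from_surrogate_pair are hypothetical illustrations, not part of any Python API. Since Python 3.3 (PEP 393) there are no separate UCS-2 and UCS-4 builds any more, so the sketch computes the pair by hand and only uses the stdlib "utf-16-be" codec to cross-check the result.

```python
from typing import Tuple

# Hypothetical helpers illustrating UTF-16 surrogate arithmetic; they mirror
# what a UCS-2 build stores for a non-BMP code point, not a real Python API.

def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a non-BMP code point (U+10000..U+10FFFF) into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only non-BMP code points are stored as surrogate pairs")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one code point, as a
    surrogate-aware UTF-16 decoder does."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1D11E  # MUSICAL SYMBOL G CLEF, outside the BMP
high, low = to_surrogate_pair(cp)
print(hex(high), hex(low))  # 0xd834 0xdd1e

# Cross-check against the stdlib UTF-16 codec (big-endian, no BOM).
encoded = chr(cp).encode("utf-16-be")
assert (int.from_bytes(encoded[:2], "big"), int.from_bytes(encoded[2:], "big")) == (high, low)
assert from_surrogate_pair(high, low) == cp
```

The round trip is what "surrogate aware" means in practice: two 16-bit units on the way in, one code point on the way out.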

Ideally, the Python programmer should not really need to
know all this, and I think we've achieved that up to a certain
point (Unicode can be complicated; there's nothing to hide there).
However, the C programmer using the Python C API to interface
to some other Unicode implementation will need to know these
details.
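For a C programmer, the practical consequence is that the same string occupies a different number of Py_UNICODE units depending on the build. The sketch below models this under a simplifying assumption: a UCS-2 build's buffer is taken to hold the string's UTF-16 code units, and a UCS-4 build's buffer its UTF-32 code units. The function py_unicode_units is hypothetical and only mimics the buffer contents; it does not touch the real C API.

```python
from typing import List

# Hypothetical model: a UCS-2 build's Py_UNICODE buffer is assumed to hold the
# UTF-16 code units of the string, a UCS-4 build's buffer its UTF-32 code units.

def py_unicode_units(s: str, build: str) -> List[int]:
    """Return the code units a 'ucs2' or 'ucs4' build would store for s."""
    if build == "ucs2":
        data, width = s.encode("utf-16-le"), 2   # non-BMP -> surrogate pair
    elif build == "ucs4":
        data, width = s.encode("utf-32-le"), 4   # one unit per code point
    else:
        raise ValueError("build must be 'ucs2' or 'ucs4'")
    return [int.from_bytes(data[i:i + width], "little")
            for i in range(0, len(data), width)]


s = "a\U0001D11E"  # 'a' followed by a non-BMP code point
print([hex(u) for u in py_unicode_units(s, "ucs2")])  # ['0x61', '0xd834', '0xdd1e']
print([hex(u) for u in py_unicode_units(s, "ucs4")])  # ['0x61', '0x1d11e']
```

C code that walks such a buffer therefore has to recognise surrogate pairs on a UCS-2 build, but can treat every unit as a full code point on a UCS-4 build.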

-- 
Marc-Andre Lemburg
eGenix.com


