[Python-Dev] New Py_UNICODE doc
M.-A. Lemburg
mal at egenix.com
Fri May 6 09:17:26 CEST 2005
Nicholas Bastin wrote:
> On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
>
>>>Nicholas Bastin wrote:
>>>
>>>
>>>>"This type represents the storage type which is used by Python
>>>>internally as the basis for holding Unicode ordinals. Extension
>>>>module developers should make no assumptions about the size of
>>>>this type on any given platform."
>>>
>>>
>>>But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>>>So the documentation should explicitly say "it depends".
>>
>>On a related note, it would be helpful if the documentation provided a
>>little more background on unicode encoding. Specifically, that UCS-2
>>is not the same as UTF-16, even though they're both two bytes wide and
>>most of the characters are the same. UTF-16 can encode 4 byte characters,
>>while UCS-2 can't. A Py_UNICODE is either UCS-2 or UCS-4. It took me
>
> I'm not sure the Python documentation is the place to teach someone
> about unicode. The ISO 10646 pretty clearly defines UCS-2 as only
> containing characters in the BMP (plane zero). On the other hand, I
> don't know why python lets you choose UCS-2 anyhow, since it's almost
> always not what you want.
You've got that wrong: Python lets you choose UCS-4;
UCS-2 is the default.
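A quick way to tell which build you are running is sys.maxunicode, which is 0xFFFF on a narrow (UCS-2) build and 0x10FFFF on a wide (UCS-4) build. The following is a minimal sketch; the helper name unicode_build_width is hypothetical. On Python 3.3 and later (PEP 393) the build-time choice no longer exists and sys.maxunicode is always 0x10FFFF, so the check only distinguishes builds on older interpreters.

```python
import sys

def unicode_build_width():
    """Return 2 for a narrow (UCS-2) build, 4 for a wide (UCS-4) build.

    Hypothetical helper: relies only on sys.maxunicode, which is 0xFFFF
    on narrow builds and 0x10FFFF on wide builds.
    """
    return 2 if sys.maxunicode == 0xFFFF else 4

print(unicode_build_width())
```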
Note that Python's UTF-8 and UTF-16 codecs
are surrogate-aware and thus support non-BMP code points
regardless of the build type: a UCS-2 build of Python will
store a non-BMP code point as a UTF-16 surrogate pair in the
Py_UNICODE buffer, while a UCS-4 build will store it as a
single value. Decoding is surrogate-aware too, so a UTF-16
surrogate pair in a UCS-2 build will be treated as a single
Unicode code point.
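The surrogate-pair storage described above follows the standard UTF-16 arithmetic: subtract 0x10000, put the top 10 bits of the result into a high surrogate (base 0xD800) and the bottom 10 bits into a low surrogate (base 0xDC00). A minimal sketch in pure Python; the helper names are hypothetical and this is not the codecs' actual implementation:

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point (U+10000..U+10FFFF) into a UTF-16
    high/low surrogate pair, the form a UCS-2 build stores."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %#x" % code_point)
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Combine a high/low surrogate pair back into one code point,
    as a surrogate-aware decoder does."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# U+1D11E MUSICAL SYMBOL G CLEF
high, low = to_surrogate_pair(0x1D11E)
print("%04X %04X" % (high, low))              # D834 DD1E
print("%X" % from_surrogate_pair(high, low))  # 1D11E
```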
Ideally, the Python programmer should not really need to
know all this, and I think we've achieved that up to a certain
point (Unicode can be complicated; there's nothing to hide there).
However, the C programmer using the Python C API to interface
to some other Unicode implementation will need to know these
details.
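For a C programmer handing a UCS-2 build's Py_UNICODE buffer to another Unicode implementation, the key detail is that the buffer holds 16-bit code units, not code points. The sketch below shows the walk in Python for readability; the list of integers stands in for the buffer, the function name is hypothetical, and passing unpaired surrogates through unchanged is a choice made here, not something stated in the message above.

```python
def code_points_from_utf16_units(units):
    """Yield code points from a sequence of 16-bit code units.

    A high surrogate immediately followed by a low surrogate is combined
    into one non-BMP code point; any other unit (including an unpaired
    surrogate) is yielded unchanged.
    """
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if (0xD800 <= unit <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1

# "a", U+1D11E (stored as a surrogate pair), "b": 4 code units
units = [0x0061, 0xD834, 0xDD1E, 0x0062]
print([hex(cp) for cp in code_points_from_utf16_units(units)])
# ['0x61', '0x1d11e', '0x62']
```

The four code units yield three code points; on a UCS-2 build, len() of such a string reports the code-unit count, not the code-point count.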
--
Marc-Andre Lemburg
eGenix.com