[Python-Dev] New Py_UNICODE doc

M.-A. Lemburg mal at egenix.com
Sat May 7 23:41:18 CEST 2005

Nicholas Bastin wrote:
> On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:
>>However, I don't understand all the excitement
>>about Py_UNICODE: if you don't like the way this Python
>>typedef works, you are free to interface to Python using
>>any of the supported encodings using PyUnicode_Encode()
>>and PyUnicode_Decode(). I'm sure you'll find one that
>>fits your needs and if not, you can even write your
>>own codec and register it with Python, e.g. UTF-32
>>which we currently don't support ;-)
> My concerns about Py_UNICODE are completely separate from my 
> frustration that the documentation is wrong about this type.  It is 
> much more important that the documentation be correct, first, and then 
> we can discuss the reasons why it can be one of two values, rather than 
> just a uniform value across all python implementations.  This makes 
> distributing binary extension modules hard.  It has become clear to me 
> that no one on this list gives a *%&^ about people attempting to 
> distribute binary extension modules, or they would have cared about 
> this problem, so I'll just drop that point.
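
To illustrate the point quoted above about going through a codec, or registering one of your own, here is a minimal sketch in present-day Python 3. It uses the Python-level str.encode() and bytes.decode() as stand-ins for the C-level PyUnicode_Encode() and PyUnicode_Decode(). The codec name "demoucs4" and the helper functions are hypothetical, made up for this example; it is a toy fixed-width codec, not a replacement for a real UTF-32 implementation.

```python
import codecs
import struct

CODEC_NAME = "demoucs4"  # hypothetical name, chosen only for this sketch


def _encode(text, errors="strict"):
    # Store every code point as 4 big-endian bytes (no BOM, no error handling).
    data = b"".join(struct.pack(">I", ord(ch)) for ch in text)
    return data, len(text)


def _decode(data, errors="strict"):
    data = bytes(data)
    if len(data) % 4:
        raise UnicodeDecodeError(CODEC_NAME, data, 0, len(data), "truncated input")
    text = "".join(chr(cp) for (cp,) in struct.iter_unpack(">I", data))
    return text, len(data)


def _search(name):
    # Codec search functions return a CodecInfo for names they know, else None.
    if name == CODEC_NAME:
        return codecs.CodecInfo(encode=_encode, decode=_decode, name=CODEC_NAME)
    return None


codecs.register(_search)

original = "A\U00010400"
encoded = original.encode(CODEC_NAME)
decoded = encoded.decode(CODEC_NAME)
```

An extension that exchanges data with Python through an encoding like this only sees bytes in a known layout, so it does not depend on the width of Py_UNICODE.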

Actually, many of us know about the problem of having to
ship UCS2 and UCS4 builds of binary extensions and the
trouble this causes for users.

It just adds one more dimension to the number of builds
you have to make: one for the Python version, another
for the platform, and in the case of Linux another one for
the Unicode width. Nowadays most Linux distros ship UCS4
builds (after Red Hat started this quest), so things are
starting to normalize again.
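
As a rough sketch of how the Unicode width dimension can be detected at run time: sys.maxunicode is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build. The function names and the tag format below are hypothetical, for illustration only. On Python 3.3 and later the distinction no longer exists and the value is always 0x10FFFF.

```python
import sys


def unicode_build_width(maxunicode=None):
    """Return "UCS2" for a narrow build value and "UCS4" otherwise."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


def build_tag(version=None, platform=None, maxunicode=None):
    """Combine the three build dimensions into one (hypothetical) tag."""
    if version is None:
        version = sys.version_info[:2]
    if platform is None:
        platform = sys.platform
    width = unicode_build_width(maxunicode).lower()
    return "py%d.%d-%s-%s" % (version[0], version[1], platform, width)


print(build_tag())
```

A binary extension built for one tag would then be shipped only to interpreters that report the same tag.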

> However, somehow, what keeps getting lost in the mix is that 
> --enable-unicode=ucs2 is a lie, and we should change what this 
> configure option says.  Martin seems to disagree with me, for reasons 
> that I don't understand.  I would be fine with calling the option 
> utf16, or just 2 and 4, but not ucs2, as that means things that Python 
> doesn't intend it to mean.

It's not a lie: the Unicode implementation does work with
UCS2 code points (surrogate values are Unicode code points as
well - they happen to live in a special zone of the BMP).
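
To make the "special zone" concrete, here is a small sketch that classifies code points by range. The surrogate ranges (U+D800..U+DBFF high, U+DC00..U+DFFF low) and the BMP range (U+0000..U+FFFF) come from the Unicode standard; the helper names are made up for this example.

```python
HIGH_SURROGATES = range(0xD800, 0xDC00)  # U+D800..U+DBFF
LOW_SURROGATES = range(0xDC00, 0xE000)   # U+DC00..U+DFFF
BMP = range(0x0000, 0x10000)             # U+0000..U+FFFF


def is_surrogate(cp):
    """True if the code point lies in the surrogate zone."""
    return cp in HIGH_SURROGATES or cp in LOW_SURROGATES


def in_bmp(cp):
    """True if the code point fits in a single 16-bit unit."""
    return cp in BMP


for cp in (0x0041, 0xD801, 0xDC00, 0x10400):
    print("U+%04X surrogate=%s bmp=%s" % (cp, is_surrogate(cp), in_bmp(cp)))
```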

Only the codecs add support for surrogates in a way that
allows round-trip safety regardless of whether you used UCS2
or UCS4 as the compile-time option.
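
The round trip can be sketched as follows, using the built-in UTF-16 and UTF-32 codecs of present-day Python 3 and one non-BMP character, U+10400, as the example. The helper to_surrogate_pair() is hypothetical and only spells out the arithmetic that a UTF-16 encoder performs.

```python
def to_surrogate_pair(cp):
    """Split a non-BMP code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


char = "\U00010400"
high, low = to_surrogate_pair(ord(char))

# The UTF-16 codec emits the surrogate pair; UTF-32 emits one 4-byte unit.
utf16 = char.encode("utf-16-be")
utf32 = char.encode("utf-32-be")

# Decoding either byte string gives back the same character.
roundtrip_ok = utf16.decode("utf-16-be") == char == utf32.decode("utf-32-be")
print("U+%04X U+%04X" % (high, low), utf16.hex(), utf32.hex(), roundtrip_ok)
```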

Marc-Andre Lemburg