[Python-Dev] please consider changing --enable-unicode default to ucs4

M.-A. Lemburg mal at egenix.com
Mon Sep 28 10:25:45 CEST 2009


Zooko O'Whielacronx wrote:
> Folks:
> 
> I'm sorry, I think I didn't make my concern clear.  My users, and lots
> of other users, are having a problem with incompatibility between
> Python binary extension modules.  One way to improve the situation
> would be if the Python devs would use their "bully pulpit" -- their
> unique position as a source respected by all Linux distributions --
> and say "We recommend that Linux distributions use UCS4 for
> compatibility with one another".  This would not abrogate anyone's
> ability to choose their preferred setting nor, as far as I can tell,
> would it interfere with the ongoing development of Python.

-1

Please note that we did not choose to ship Python as UCS4 binary
on Linux - the Linux distributions did.

The Python default is UCS2 for a good reason: it's a good trade-off
between memory consumption, functionality and performance.

As already mentioned, I also don't understand how changing
the Python default on Linux would help your users in any way -
if you let distutils compile your extensions, it's automatically
going to use the right Unicode setting for you (as well as your
users).

Unfortunately, this automatic support doesn't help you when
shipping e.g. setuptools eggs, but this is a tool problem,
not one of Python: setuptools completely ignores the fact
that there are two ways to build Python.

I'd suggest you ask the tool maintainers to adjust their tools
to support the Python Unicode option.
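
A minimal sketch of what such tool support could look like, assuming
the tool simply adds the Unicode build type to the name of the binary
it produces. The tag format and the function names below are made up
for illustration; they are not part of setuptools or distutils.

```python
import sys
import sysconfig


def unicode_build(maxunicode=None):
    """Return 'ucs2' or 'ucs4' for the given sys.maxunicode value."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # A narrow (UCS2) build reports 0xFFFF, a wide (UCS4) build 0x10FFFF.
    return "ucs2" if maxunicode <= 0xFFFF else "ucs4"


def binary_tag(maxunicode=None):
    """Build a hypothetical tag: Python version, platform, Unicode build."""
    version = "py%d.%d" % sys.version_info[:2]
    return "-".join([version, sysconfig.get_platform(), unicode_build(maxunicode)])


# Tag for the interpreter running this script.
print(binary_tag())
```

A tool that compared such a tag against the running interpreter before
installing could refuse a mismatched binary up front instead of letting
the import fail later.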

> Here are the details:
> 
> I'm the maintainer of several Python packages.  I work hard to make it
> easy for users, even users who don't know anything about Python, to
> use my software.  There have been many pain points in this process and
> I've spent a lot of time on it for about three years now working on
> packaging, including the tools such as setuptools and distutils and
> the new "distribute" tool.  Python packaging has been improving during
> these years -- things are looking up.
> 
> One of the remaining pain points is that I can distribute binaries of
> my Python extension modules for Windows or Mac, but if I distribute a
> binary Python extension module on Linux, then if the user has a
> different UCS2/UCS4 setting then they won't be able to use the
> extension module.  The current de facto standard for Linux is UCS4 --
> it is used by Debian, Ubuntu, Fedora, RHEL, OpenSuSE, etc. etc..  The
> vast majority of Linux users in practice have UCS4, and most binary
> Python modules are compiled for UCS4.
> 
> That means that a few folks will get left out.  Those folks, from my
> experience, are people who built their python executable themselves
> without specifying an override for the default, and the smaller Linux
> distributions who insist on doing whatever upstream Python devs
> recommend instead of doing whatever the other Linux distros are doing.
>  One of the data points that I reported was a Python interpreter that
> was built locally on an Ubuntu server.  Since the person building it
> didn't know to override the default setting of --enable-unicode, he
> ended up with a Python interpreter built for UCS2, even though all the
> Python extension modules shipped by Ubuntu were built with UCS4.

People building their own Python version will usually also build
their own extensions, so I don't really believe that the above
scenario is very common.

Also note that Python will complain loudly when you try to load
a UCS2 extension in a UCS4 build and vice-versa. We've made sure
that any extension using the Python Unicode C API has to be built
for the same UCS version of Python. This is done by using different
names for the C APIs at the C level.

> These are not isolated incidents.  The following google searches
> suggest that a number of people spend time trying to figure out why
> Python extension modules fail on their linux systems:
> 
> http://www.google.com/search?q=PyUnicodeUCS4_FromUnicode+undefined+symbol
> http://www.google.com/search?q=+PyUnicodeUCS2_FromUnicode+undefined+symbol
> http://www.google.com/search?q=_PyUnicodeUCS2_AsDefaultEncodedString+undefined+symbol

Perhaps we should add a FAQ entry for these linker errors
(which are caused by the mentioned C API name changes that
prevent mixing UCS versions)?
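
As a rough illustration of what such a FAQ entry could explain, here
is a sketch that reads the symbol name out of an error message and
compares it with the running interpreter. It assumes the message
contains a PyUnicodeUCS2_* or PyUnicodeUCS4_* name, as in the searches
quoted above; the helper names and the sample message are hypothetical.

```python
import re
import sys

# Matches names such as PyUnicodeUCS4_FromUnicode, also with a leading "_".
_SYMBOL = re.compile(r"PyUnicode(UCS[24])_\w+")


def expected_build(error_message):
    """Return 'UCS2' or 'UCS4' if the message names such a symbol, else None."""
    match = _SYMBOL.search(error_message)
    return match.group(1) if match else None


def running_build():
    """Return the Unicode build type of the running interpreter."""
    return "UCS2" if sys.maxunicode < 66000 else "UCS4"


def explain(error_message):
    """Turn an undefined-symbol message into a short diagnosis."""
    wanted = expected_build(error_message)
    if wanted is None:
        return "No PyUnicodeUCS2_/PyUnicodeUCS4_ symbol found in the message."
    have = running_build()
    if wanted == have:
        return "The symbol matches this %s interpreter." % have
    return "The extension was built for %s, this interpreter is %s." % (wanted, have)


# Hypothetical error message, for illustration only.
print(explain("example.so: undefined symbol: PyUnicodeUCS4_FromUnicode"))
```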

Here's a quick way to determine your Python Unicode build type:

python -c "import sys;print((sys.maxunicode<66000)and'UCS2'or'UCS4')"

Perhaps we should include this info as well as a 32/64-bit indicator
and the processor type in the Python startup line:

# python
Python 2.6 (r26:66714, Feb  3 2009, 20:49:49, UCS4, 64-bit, x86_64)
[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2
Type "help", "copyright", "credits" or "license" for more information.

This would help users find the right binaries to install as
extensions.
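
Until something like that exists, the same three pieces of information
can be collected from a script. This is only a sketch: the function
name build_info is made up, and the output format merely mimics the
startup line proposed above.

```python
import platform
import struct
import sys


def build_info():
    """Return a string such as 'UCS4, 64-bit, x86_64' for this interpreter."""
    unicode_build = "UCS2" if sys.maxunicode < 66000 else "UCS4"
    # The size of a C pointer tells us whether this is a 32- or 64-bit build.
    bits = "%d-bit" % (struct.calcsize("P") * 8)
    # platform.machine() may return an empty string if it cannot be determined.
    machine = platform.machine() or "unknown"
    return ", ".join([unicode_build, bits, machine])


print(build_info())
```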

> Another data point is the Mandriva Linux distribution.  It is probably
> much smaller than Debian, Ubuntu, or RedHat, but it is still one of
> the major, well-known distributions.  I requested of the Python
> maintainer for Mandriva, Michael Scherer, that they switch from UCS2
> to UCS4 in order to reduce compatibility problems like these.  His
> answer as I understood it was that it is best to follow the
> recommendations of the upstream Python devs by using the default
> setting instead of choosing a setting for himself.

Which is IMHO what all Linux distributions should have done.

Distributions should really not be put in charge of upstream
coding design decisions.

Regards,
-- 
Marc-Andre Lemburg
eGenix.com


