[Python-Dev] please consider changing --enable-unicode default to ucs4

M.-A. Lemburg mal at egenix.com
Wed Oct 7 20:05:27 CEST 2009

Zooko O'Whielacronx wrote:
> Dear MAL and python-dev:
> I failed to explain the problem that users are having.  I will try
> again, and this time I will omit my ideas about how to improve things
> and just focus on describing the problem.
> Some users are having trouble using Python packages containing binary
> extensions on Linux.  I want to provide such binary Python packages
> for Linux for the pycryptopp project
> (http://allmydata.org/trac/pycryptopp ) and the zfec project
> (http://allmydata.org/trac/zfec ).  I also want to make it possible
> for users to install the Tahoe-LAFS project (http://allmydata.org )
> without having a compiler or Python header files.  (You'd be surprised
> at how often Tahoe-LAFS users try to do this on Linux.  Linux is no
> longer only for people who have the knowledge and patience to compile
> software themselves.)  Tahoe-LAFS also depends on many packages that
> are maintained by other people and are not packaged or distributed by
> me -- pyOpenSSL, simplejson, etc.
> There have been several hurdles in the way that we've overcome, and no
> doubt there will be more, but the current hurdle is that there are two
> "formats" for Python extension modules that are used on Linux -- UCS2
> and UCS4.  If a user gets a Python package containing a compiled
> extension module which was built for the wrong UCS2/4 setting, he will
> get mysterious (to him) "undefined symbol" errors at import time.

Zooko, I really fail to see the reasoning here:

Why would people who know how to build their own Python interpreter
on Linux, and who expect it to work like the distribution-provided
one, have a problem looking up the configuration settings that the
distribution uses?
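
As a minimal sketch of how that lookup could be done: the interpreter
reports its own Unicode width through sys.maxunicode, so running the
snippet below with the distribution's Python shows which value to pass
to --enable-unicode when building your own. The helper name
unicode_build is made up for illustration.

```python
import sys


def unicode_build(maxunicode=None):
    """Return 'ucs2' for a narrow build and 'ucs4' for a wide build."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    # Python 3.3 and later always report 0x10FFFF.
    return "ucs2" if maxunicode == 0xFFFF else "ucs4"


if __name__ == "__main__":
    # Run this with the distribution-provided interpreter.
    print("--enable-unicode=" + unicode_build())
```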

This is like compiling your own Linux kernel without using
the same configuration as the distribution kernel and still
expecting the distribution kernel modules to load without problems.

Note that this has nothing to do with compiling your own
Python extensions. Python's distutils will automatically
use the right settings for compiling those, based on the
configuration of the Python interpreter used for running
the compilation - which will usually be the distribution-provided one.

Your argument doesn't really justify the consequences
of switching to UCS4.

Just as a data point: eGenix has been shipping binaries for
Python packages for several years, and while we do occasionally
get reports about UCS2/UCS4 mismatches, those are really
in the minority.

I'd also question using the UCS4 default only on Linux.

If we do go for a change, we should use sizeof(wchar_t)
as the basis for the new default - on all platforms that
provide a wchar_t type.
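
To illustrate what such a rule might look like, here is a rough sketch
that reads the platform's wchar_t size through ctypes and maps it to a
default. The function name default_from_wchar and the fallback of None
for unusual sizes are assumptions for illustration, not part of any
actual configure logic.

```python
import ctypes


def default_from_wchar(size=None):
    """Map sizeof(wchar_t) in bytes to a proposed Unicode default."""
    if size is None:
        # ctypes.c_wchar corresponds to the C wchar_t type.
        size = ctypes.sizeof(ctypes.c_wchar)
    if size == 2:
        return "ucs2"
    if size == 4:
        return "ucs4"
    # Any other size: no obvious default, leave the decision open.
    return None


if __name__ == "__main__":
    print(default_from_wchar())
```

On Linux with glibc, wchar_t is 4 bytes, and on Windows it is 2 bytes,
so this rule would pick ucs4 and ucs2 respectively.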

However, before we can make such a decision, we need more
data about the consequences. That is:

 * memory footprint changes

 * performance changes

For both Python 2.x and 3.x. After all, UCS4 uses twice
as much memory for all Unicode objects as UCS2.
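
The factor of two can be seen with a small approximation: encoding the
same text as UTF-16 and UTF-32 gives the size of the character data at
2 and 4 bytes per code point. This is only a sketch; it ignores the
per-object header, and the helper name storage_bytes is made up for
illustration.

```python
def storage_bytes(text):
    """Return (ucs2_bytes, ucs4_bytes) for the character data of text.

    Assumes the text contains only BMP characters, so that UTF-16
    needs exactly one 2-byte code unit per character.
    """
    ucs2 = len(text.encode("utf-16-le"))  # 2 bytes per character
    ucs4 = len(text.encode("utf-32-le"))  # 4 bytes per character
    return ucs2, ucs4


if __name__ == "__main__":
    narrow, wide = storage_bytes(u"please consider changing the default")
    print("UCS2: %d bytes, UCS4: %d bytes" % (narrow, wide))
```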

Since Python 3.x uses Unicode for all strings, I'd expect
such a change to have more impact there.

We'd also need to look into possible problems with different
compilers using different wchar_t sizes on the same platform
(I doubt that there are any).

On Windows, the default is fixed since Windows uses
UTF-16 for everything Unicode, so UCS2 will for a long
time be the only option on that platform.

That said, it'll take a while for distributions to
upgrade, so you're always better off getting the tools
you're using to deal with the problem for you and your
users, since those are easier to upgrade.

Marc-Andre Lemburg

