[Distutils] eggs and Python Unicode variants (UCS2, UCS4)
faassen at infrae.com
Mon Jul 31 20:15:57 CEST 2006
In the lxml project (http://codespeak.net/lxml), we've just noticed the
following problem with lxml eggs: you can easy_install an egg that won't
work for your Python.
This is because Python can be compiled with either 2 or 4 bytes unicode
as its internal representation. Any egg that contains compiled C code
that uses unicode such as lxml will run into trouble: if it's compiled
with a 4 bytes unicode Python, it won't work on a 2 bytes unicode
Python, and vice versa.
This problem is fairly common in Linux. Many distributions such as
Ubuntu and Fedora compile their python with 4 bytes unicode internal
representation. If you compile a Python interpreter by hand it defaults
to 2 bytes unicode, however. Hand-building a Python interpreter is done
fairly commonly by Linux sysadmins for various reasons.
It would therefore be very nice if it became possible to make eggs for
the different unicode compilation options of Python. This configuration
dimension is a real world issue for any binary Python module that does
anything with unicode text..
In an earlier mail to this list:
M.-A. Lemburg and Phillip Eby had the following discussion:
>>Please make sure that your eggs catch all possible
>>Python binary build dimensions:
>>* Python version
>>* Python Unicode variant (UCS2, UCS4)
>>* OS name
>>* OS version
>>* Platform architecture (e.g. 32-bit vs. 64-bit)
>As far as I know, all of this except the Unicode variant is captured in
>distutils' get_platform(). And if it's not, it should be, since it
>affects any other kind of bdist mechanism.
I'm not sure whether this means this needs to be escalated from
setuptools to the Python interpreter level itself. With this mail, I've
done the job escalating this lxml problem to what appears to be the
right place, though. :)
More information about the Distutils-SIG