[Distutils] eggs and Python Unicode variants (UCS2, UCS4)

Martijn Faassen faassen at infrae.com
Mon Jul 31 20:15:57 CEST 2006

Hi there,

In the lxml project (http://codespeak.net/lxml), we've just noticed the 
following problem with lxml eggs: you can easy_install an egg that won't 
work for your Python.

This is because Python can be compiled with either a 2-byte (UCS2) or a 
4-byte (UCS4) internal Unicode representation. Any egg containing 
compiled C code that uses Unicode, such as lxml, will run into trouble: 
if it's compiled against a 4-byte Unicode Python, it won't work on a 
2-byte Unicode Python, and vice versa.
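
To see which variant a given interpreter uses, something like the 
following should do. sys.maxunicode is a standard attribute; the 
function name unicode_variant is made up here for illustration only.

```python
import sys


def unicode_variant(maxunicode=None):
    """Return 'ucs2' or 'ucs4' for a given sys.maxunicode value.

    unicode_variant is a hypothetical helper, not part of any library.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # A 2-byte (narrow) build reports 0xFFFF as the largest code point;
    # a 4-byte (wide) build reports 0x10FFFF. Python 3.3 and later
    # always report 0x10FFFF.
    if maxunicode == 0xFFFF:
        return "ucs2"
    return "ucs4"


if __name__ == "__main__":
    print(unicode_variant())
```

Running this with both the distribution's Python and a hand-built one 
shows whether an egg built on one can be expected to load on the other.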

This problem is fairly common on Linux. Many distributions, such as 
Ubuntu and Fedora, compile their Python with the 4-byte Unicode internal 
representation. If you compile a Python interpreter by hand, however, it 
defaults to 2-byte Unicode. Linux sysadmins hand-build Python 
interpreters fairly commonly, for various reasons.

It would therefore be very nice if it became possible to make eggs for 
the different Unicode compilation options of Python. This configuration 
dimension is a real-world issue for any binary Python module that does 
anything with Unicode text.

In an earlier mail to this list:


M.-A. Lemburg and Phillip Eby had the following discussion:

 >>Please make sure that your eggs catch all possible
 >>Python binary build dimensions:
 >>* Python version
 >>* Python Unicode variant (UCS2, UCS4)
 >>* OS name
 >>* OS version
 >>* Platform architecture (e.g. 32-bit vs. 64-bit)

 >As far as I know, all of this except the Unicode variant is captured in
 >distutils' get_platform().  And if it's not, it should be, since it
 >affects any other kind of bdist mechanism.
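
As a rough sketch of what a build tag covering the Unicode dimension 
could look like: the build_tag function and the tag layout below are 
purely hypothetical, not what setuptools or distutils actually produce. 
Only sysconfig.get_platform(), sys.version_info and sys.maxunicode are 
real.

```python
import sys
import sysconfig


def build_tag(platform=None, version=None, maxunicode=None):
    """Combine platform, Python version and Unicode variant in one tag.

    Hypothetical helper; the tag format is an assumption for illustration.
    """
    if platform is None:
        # Covers OS name and architecture, e.g. 'linux-x86_64'.
        platform = sysconfig.get_platform()
    if version is None:
        version = sys.version_info[:2]
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # 0xFFFF means a 2-byte build; anything larger means a 4-byte build.
    variant = "ucs2" if maxunicode == 0xFFFF else "ucs4"
    return "%s-py%d.%d-%s" % (platform, version[0], version[1], variant)


if __name__ == "__main__":
    print(build_tag())
```

With a tag like this, two eggs built for the same platform and Python 
version but different Unicode widths would no longer be mistaken for 
each other.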

I'm not sure whether this means the issue needs to be escalated from 
setuptools to the Python interpreter level itself. With this mail, 
though, I've done the job of escalating this lxml problem to what 
appears to be the right place. :)


