eggs and Python Unicode variants (UCS2, UCS4)

Hi there,
In the lxml project (http://codespeak.net/lxml), we've just noticed the following problem with lxml eggs: you can easy_install an egg that won't work for your Python. This is because Python can be compiled with either 2-byte or 4-byte unicode as its internal representation. Any egg that contains compiled C code that uses unicode, such as lxml, will run into trouble: if it's compiled with a 4-byte unicode Python, it won't work on a 2-byte unicode Python, and vice versa.
This problem is fairly common on Linux. Many distributions, such as Ubuntu and Fedora, compile their Python with the 4-byte unicode internal representation. If you compile a Python interpreter by hand, however, it defaults to 2-byte unicode. Hand-building a Python interpreter is done fairly commonly by Linux sysadmins for various reasons.
It would therefore be very nice if it became possible to make eggs for the different unicode compilation options of Python. This configuration dimension is a real-world issue for any binary Python module that does anything with unicode text.
In an earlier mail to this list:
http://mail.python.org/pipermail/distutils-sig/2005-October/005222.html
M.-A. Lemburg and Phillip Eby had the following discussion:
[MAL]
Please make sure that your eggs catch all possible Python binary build dimensions:
* Python version
* Python Unicode variant (UCS2, UCS4)
* OS name
* OS version
* Platform architecture (e.g. 32-bit vs. 64-bit)
[PJE]
As far as I know, all of this except the Unicode variant is captured in distutils' get_platform(). And if it's not, it should be, since it affects any other kind of bdist mechanism.
I'm not sure whether this means this needs to be escalated from setuptools to the Python interpreter level itself. With this mail, I've done the job of escalating this lxml problem to what appears to be the right place, though. :)
Thanks,
Martijn
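
For readers who want to check which variant their own interpreter uses, here is a minimal sketch. It relies on `sys.maxunicode`, which reports the largest code point a single unicode character can hold. The function name `unicode_variant` and the "ucs2"/"ucs4" labels are illustrative choices, not part of distutils or setuptools.

```python
import sys


def unicode_variant(maxunicode=sys.maxunicode):
    """Return 'ucs2' for a narrow (2-byte) build, 'ucs4' for a wide (4-byte) build.

    The name and the returned labels are hypothetical; only sys.maxunicode
    is a real interpreter attribute.
    """
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    # On Python 3.3 and later the value is always 0x10FFFF.
    return "ucs2" if maxunicode <= 0xFFFF else "ucs4"


if __name__ == "__main__":
    print(unicode_variant())
```

Running this with both the distribution's Python and a hand-built one should show whether the two interpreters disagree, which is the situation in which a binary egg built for one fails on the other.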

Hey,
Any feedback on this? Nobody cares that Python eggs compiled with a linux distribution version of Python don't run on hand-compiled versions of Python, and vice versa? I added [setuptools] to the topic in case that's the convention to get people concerned with those to pay attention. :)
Regards,
Martijn

Martijn Faassen wrote:
Hey,
Any feedback on this? Nobody cares that Python eggs compiled with a linux distribution version of Python don't run on hand-compiled versions of Python, and vice versa? I added [setuptools] to the topic in case that's the convention to get people concerned with those to pay attention. :)
I care. :) The "Python ABI" of an egg *is* part of its "signature", I think. I don't know for what use cases the UCS4 stuff was designed, but it is *never* what I want (doubling space requirements for all unicode strings is a recipe for an unhappy long-running process).
I'd actually settle for having setuptools just refuse to install an incompatible binary egg, so that I would realize I needed to build it from source.
Tres.
Tres Seaver, Palladion Software (http://palladion.com)

On Aug 4, 2006, at 11:36 AM, Martijn Faassen wrote:
Hey,
Any feedback on this? Nobody cares that Python eggs compiled with a linux distribution version of Python don't run on hand-compiled versions of Python, and vice versa? I added [setuptools] to the topic in case that's the convention to get people concerned with those to pay attention. :)
My guess is that nobody cares enough to provide a patch ;-)
The size of unicode characters is IMO part of the ABI "description", just like the python version and should therefore be part of the egg name. We should end up with something like 'lxml-1.0.1-py2.4-ucs2-macosx-10.4-fat.egg'.
A minor problem is that the right place to fix this is in distutils, not setuptools. That way other bdist_* targets would also pick up the right ABI description.
Ronald
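
The naming scheme suggested in this message can be sketched as follows. This is only an illustration of where a unicode tag would sit in the file name; `egg_filename` is a hypothetical helper, not a distutils or setuptools function, and it ignores details a real tool would have to handle, such as escaping of project names.

```python
def egg_filename(name, version, py_version, unicode_tag=None, platform=None):
    """Build an egg file name with an optional unicode-width tag.

    Hypothetical helper for illustration only. The tag is placed between
    the Python version and the platform, as in the example from the thread.
    """
    parts = [name, version, "py" + py_version]
    if unicode_tag:
        # e.g. "ucs2" or "ucs4"; omitted for eggs built without the tag
        parts.append(unicode_tag)
    if platform:
        parts.append(platform)
    return "-".join(parts) + ".egg"


if __name__ == "__main__":
    print(egg_filename("lxml", "1.0.1", "2.4", "ucs2", "macosx-10.4-fat"))
```

Leaving `unicode_tag` unset yields the untagged name, which is how eggs built before such a change would look.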

At 09:05 PM 8/4/2006 +0200, Ronald Oussoren wrote:
On Aug 4, 2006, at 11:36 AM, Martijn Faassen wrote:
Hey,
Any feedback on this? Nobody cares that Python eggs compiled with a linux distribution version of Python don't run on hand-compiled versions of Python, and vice versa? I added [setuptools] to the topic in case that's the convention to get people concerned with those to pay attention. :)
My guess is that nobody cares enough to provide a patch ;-)
The size of unicode characters is IMO part of the ABI "description", just like the python version and should therefore be part of the egg name. We should end up with something like 'lxml-1.0.1-py2.4-ucs2-macosx-10.4-fat.egg'.
A minor problem is that the right place to fix this is in distutils, not setuptools. That way other bdist_* targets would also pick up the right ABI description.
Yes, it should be fixed in the distutils. Of course, for Python versions <2.6, it will actually have to be fixed by setuptools. It would be nice if someone could provide patches for both distutils and setuptools.
To implement the setuptools patch, you will need to modify the various "platform utilities" in pkg_resources:
http://peak.telecommunity.com/DevCenter/PkgResources#platform-utilities
bdist_egg and all of the pkg_resources internals rely on these functions.
Note that for this patch to be backward-compatible, it *must* change compatible_platforms to only compare unicode widths if they are present in *both* the 'required' and the 'provided' strings. If either one is missing a unicode width indicator, the unicode width must be ignored. This is a non-negotiable requirement, since otherwise it will be impossible for someone to use packages they've already built locally once they upgrade.
The patch will also have to go into the new 0.7 alpha line (rather than the 0.6c line), since it's a new feature.
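
The backward-compatibility rule described above can be sketched roughly as below. This is not the pkg_resources implementation and not a patch against it: the "ucs2"/"ucs4" component inside the platform string is an assumed tag format, both function names are hypothetical, and the final platform comparison is a simplified stand-in (exact match, with None matching anything) for whatever the existing check does.

```python
import re

# Assumed tag format: a "ucs2" or "ucs4" component in the platform string,
# e.g. "ucs4-linux-i686". This format is hypothetical.
_UCS_TAG = re.compile(r"(?:^|-)(ucs[24])(?:-|$)")


def split_unicode_width(platform):
    """Return (width_tag_or_None, platform_without_the_tag)."""
    if platform is None:
        return None, None
    match = _UCS_TAG.search(platform)
    if match is None:
        return None, platform
    tag = match.group(1)
    rest = "-".join(p for p in platform.split("-") if p != tag)
    return tag, rest


def compatible_with_unicode_width(provided, required):
    """Compare unicode widths only when BOTH strings carry a width tag."""
    prov_width, prov_rest = split_unicode_width(provided)
    req_width, req_rest = split_unicode_width(required)
    # If either side lacks a tag, the width is ignored, so eggs built
    # before the tag existed remain usable after an upgrade.
    if prov_width and req_width and prov_width != req_width:
        return False
    # Simplified stand-in for the existing platform comparison.
    return prov_rest is None or req_rest is None or prov_rest == req_rest
```

With this rule, an untagged egg such as one built for "linux-i686" is still accepted on a "ucs2-linux-i686" interpreter, while a "ucs4-linux-i686" egg is rejected there.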

This problem continues to bite: http://allmydata.org/trac/tahoe/ticket/704 Has any progress been made? Here's the original thread: http://markmail.org/message/bla5vrwlv3kn3n7e Thanks!
participants (5)
- David Abrahams
- Martijn Faassen
- Phillip J. Eby
- Ronald Oussoren
- Tres Seaver