Hi there,

I've been looking at Python eggs, easy install and setuptools with a lot of interest -- very impressive work. I've been thinking of packaging lxml with it and, on the larger scale, looking into packaging Zope 3 with it.

Concerning lxml, I've run into a few questions, however. lxml depends on large external C libraries (libxml2 and libxslt).

a) Is there a way to require versions of C libraries to be available in the Python egg's dependencies? I can't seem to find a reference to this scenario, but perhaps I didn't look carefully enough. The goal here would be to give users trying to install lxml (or something that depends on lxml) useful feedback about what in their system they're missing (or have the wrong version of).

b) Going further, it'd be nice for some scenarios to actually be able to include private versions of libxml2/libxslt in a Python egg. This is especially interesting for Windows deployments (where you'd include a binary of these libraries). Has something like this been considered? I saw references to Pyrex support, but in lxml's case, the Pyrex code depends on a large underlying library.

c) It's also interesting for deployment on Linux, though. It'd be nice to be able to include the source of specific versions of libxml2 and libxslt with lxml and to be able to build/install them such that they are only used for lxml. This way the system libraries (which may be out of date or otherwise the wrong version) would not be in play and wouldn't be affected.

If something like this were arranged, it'd be much easier to make lxml a requirement of a large package like, for instance, Zope 3 (which is being considered).

I realize that any or all of these might be out of scope for easy install -- in the Linux case, it might be deferred to a Linux package management system, for instance. Still, I imagine the case where a Python library has a dependency on a potentially large non-Python codebase could be fairly common, and it'd be nice if such libraries could be "first class" easy_install citizens so that other Python libraries can safely depend on them. What are people's thoughts about supporting such scenarios?

Regards,

Martijn
At 11:57 PM 9/23/2005 +0200, Martijn Faassen wrote:
I've been looking at Python eggs, easy install and setuptools with a lot of interest -- very impressive work. I've been thinking of packaging lxml with it, and, on the larger scale, look into packaging Zope 3 with it.
Concerning lxml I run into a few questions however.
lxml depends on large external C libraries (libxml2 and libxslt).
a) Is there a way to require versions of C libraries to be available in the Python eggs dependencies? I can't seem to find a reference to this scenario, but perhaps I didn't look carefully enough. The goal here would be to give users trying to install lxml (or something that depends on lxml) useful feedback about what in their system they're missing (or have the wrong version of).
Your options here are the same as with any distutils package, which is to say you have to figure it out yourself. ;) You can add code to look for the libraries, embed your own source, etc.
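As a rough illustration of "add code to look for the libraries", a setup script could ask the library's own config tool for its version and report a readable error. This is only a sketch: it assumes libxml2's `xml2-config` tool is on PATH when the development files are installed, and the helper names and the minimum version used in the usage note are made up for the example.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a dotted version string such as '2.6.20' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split(".") if part.isdigit())


def installed_version(config_tool="xml2-config"):
    """Return the version reported by `config_tool --version`, or None.

    None means the tool is not on PATH or did not run successfully.
    """
    if shutil.which(config_tool) is None:
        return None
    try:
        output = subprocess.check_output([config_tool, "--version"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_version(output.decode("ascii", "replace"))


def check_requirement(minimum, config_tool="xml2-config"):
    """Return a human-readable problem description, or None if all is well."""
    found = installed_version(config_tool)
    if found is None:
        return "%s not found; is the library's development package installed?" % config_tool
    if found < minimum:
        return "%s reports version %s, but at least %s is required" % (
            config_tool,
            ".".join(map(str, found)),
            ".".join(map(str, minimum)),
        )
    return None
```

A setup script might call `check_requirement((2, 6, 0))` (an arbitrary example minimum) before `setup()` and abort with the returned message if it is not None.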
b) Going further, it'd be nice for some scenarios to actually be able to include private versions of libxml2/libxslt in a Python egg. This is especially interesting for Windows deployments (where you'd include a binary of these libraries). Has something like this been considered? I saw references to Pyrex support, but in lxml's case, the Pyrex code depends on a large underlying library.
You can certainly do that; just list the appropriate .c files in your Extension. For PEAK on Python 2.3, I include an expat wrapping that adds the Python 2.4 pyexpat features this way, using something like:

    Extension("peak.util.pyexpat", [
            "src/peak/util/pyexpat.c",
            "src/expat/xmlparse.c",
            "src/expat/xmltok.c",
            "src/expat/xmlrole.c",
        ],
        include_dirs=["src/expat"],
        define_macros=[('XML_STATIC',1),('HAVE_MEMMOVE',1)]   # XXX
    ),
c) It's also interesting though for deployment on linux. It'd be nice to be able to include the source versions of specific versions of libxml2 and libxslt with lxml and to be able to build/install them such that they are only used for lxml. This way the system libraries (which may be out of date or have otherwise a wrong version) would not be in play and wouldn't be affected.
Yeah, just bake it in as shown above.
If something like this were arranged, it'd be much easier to make lxml a requirement of a large package like for instance Zope 3 (which is being considered).
I realize that any or all of these might be out of scope for easy install -- in the Linux case, it might be deferred to a Linux package management system, for instance. Still, I imagine the case where a Python library has a dependency on a potentially large non-Python codebase could be fairly common, and it'd be nice if such libraries could be "first class" easy_install citizens so that other Python libraries can safely depend on them. What are people's thoughts about supporting such scenarios?
Not all libraries can be bundled by source, of course. Sometimes you really need to use whatever the "system version" is, for one reason or another. Database clients, for example, are something you really really want to use the local version for. I'm thinking that the distutils could really use some sort of library-finding capabilities for that stuff, assuming they don't already have some I just haven't found yet.
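For the "library-finding" idea, one thing available in today's standard library is `ctypes.util.find_library`, which looks up shared libraries by short name. The sketch below is not a distutils feature and not what the thread proposes to build; it only locates runtime libraries (not headers), and the helper names are invented for the example.

```python
from ctypes.util import find_library


def locate_libraries(names):
    """Map each short library name to what ctypes finds, or None if missing."""
    return {name: find_library(name) for name in names}


def missing_libraries(names):
    """Return the names that could not be located, in sorted order."""
    found = locate_libraries(names)
    return sorted(name for name, path in found.items() if path is None)
```

For example, `missing_libraries(["xml2", "xslt"])` returns an empty list on a system where both shared libraries can be found.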
Phillip J. Eby wrote:
At 11:57 PM 9/23/2005 +0200, Martijn Faassen wrote:
I've been looking at Python eggs, easy install and setuptools with a lot of interest -- very impressive work. I've been thinking of packaging lxml with it, and, on the larger scale, look into packaging Zope 3 with it.
Concerning lxml I run into a few questions however.
lxml depends on large external C libraries (libxml2 and libxslt).
a) Is there a way to require versions of C libraries to be available in the Python eggs dependencies? I can't seem to find a reference to this scenario, but perhaps I didn't look carefully enough. The goal here would be to give users trying to install lxml (or something that depends on lxml) useful feedback about what in their system they're missing (or have the wrong version of).
Your options here are the same as with any distutils package, which is to say you have to figure it out yourself. ;) You can add code to look for the libraries, embed your own source, etc.
Right. I was hoping I didn't need to dive into the internals of distutils of course, but it's no surprise that I'd have to.
b) Going further, it'd be nice for some scenarios to actually be able to include private versions of libxml2/libxslt in a Python egg. This is especially interesting for Windows deployments (where you'd include a binary of these libraries). Has something like this been considered? I saw references to Pyrex support, but in lxml's case, the Pyrex code depends on a large underlying library.
You can certainly do that; just list the appropriate .c files in your Extension.
For PEAK on Python 2.3, I include an expat wrapping that adds the Python 2.4 pyexpat features this way, using something like:
    Extension("peak.util.pyexpat", [
            "src/peak/util/pyexpat.c",
            "src/expat/xmlparse.c",
            "src/expat/xmltok.c",
            "src/expat/xmlrole.c",
        ],
        include_dirs=["src/expat"],
        define_macros=[('XML_STATIC',1),('HAVE_MEMMOVE',1)]   # XXX
    ),
libxml2, however, is a huge C library with its own configure script (which it really uses, as it ports to a zillion platforms), so just listing C files to compile might very well not work, right? I guess for Windows, I'd have to make distutils run the configure script, then extract the DLLs it produces and stuff them in the egg somehow. Any direction you'd point me towards for this?
c) It's also interesting though for deployment on linux. It'd be nice to be able to include the source versions of specific versions of libxml2 and libxslt with lxml and to be able to build/install them such that they are only used for lxml. This way the system libraries (which may be out of date or have otherwise a wrong version) would not be in play and wouldn't be affected.
Yeah, just bake it in as shown above.
In this case, on Linux, I'd want to run the configure script when the egg is installed instead of when it's created, and stuff the .so files in the same place the egg is being installed to.
If something like this were arranged, it'd be much easier to make lxml a requirement of a large package like for instance Zope 3 (which is being considered).
I realize that any or all of these might be out of scope for easy install -- in the Linux case, it might be deferred to a Linux package management system, for instance. Still, I imagine the case where a Python library has a dependency on a potentially large non-Python codebase could be fairly common, and it'd be nice if such libraries could be "first class" easy_install citizens so that other Python libraries can safely depend on them. What are people's thoughts about supporting such scenarios?
Not all libraries can be bundled by source, of course. Sometimes you really need to use whatever the "system version" is, for one reason or another. Database clients, for example, are something you really really want to use the local version for.
Right, there are competing use cases here. What I'd like is an easy install for lxml that just works for people, without them having to worry about the right libxml2 versions being installed, etc. On Windows this means binaries, and on Linux this likely means it'll just compile upon install.

Some classes of people, like distributors and some sysadmins, care about using the platform version of libxml2, and I'd also want to create an egg that allows you to install against the platform libraries. Could this be the same egg, or would a different egg be needed? If a different egg, how does this work with the dependency system? I.e. these two eggs would be alternatives to each other dependency-wise.
I'm thinking that the distutils could really use some sort of library-finding capabilities for that stuff, assuming they don't already have some I just haven't found yet.
Yes, that would indeed be useful. Thanks for the feedback! Regards, Martijn
Martijn Faassen wrote:
Phillip J. Eby wrote:
At 11:57 PM 9/23/2005 +0200, Martijn Faassen wrote:
I've been looking at Python eggs, easy install and setuptools with a lot of interest -- very impressive work. I've been thinking of packaging lxml with it, and, on the larger scale, look into packaging Zope 3 with it.
Concerning lxml I run into a few questions however.
lxml depends on large external C libraries (libxml2 and libxslt).
a) Is there a way to require versions of C libraries to be available in the Python eggs dependencies? I can't seem to find a reference to this scenario, but perhaps I didn't look carefully enough. The goal here would be to give users trying to install lxml (or something that depends on lxml) useful feedback about what in their system they're missing (or have the wrong version of).
Your options here are the same as with any distutils package, which is to say you have to figure it out yourself. ;) You can add code to look for the libraries, embed your own source, etc.
Right. I was hoping I didn't need to dive into the internals of distutils of course, but it's no surprise that I'd have to.
Hi Martijn,

I'd suggest you take a look at mxSetup.py which is included in all recent egenix-mx-* packages. The egenix-mx-experimental package makes heavy use of its features to build and include external libs.

For the latest version, see:

http://www.egenix.com/files/python/egenix-mx-base-2.1.0-2005-05-01.zip

--
Marc-Andre Lemburg
eGenix.com, Sep 24 2005
At 11:37 AM 9/24/2005 +0200, M.-A. Lemburg wrote:
I'd suggest you take a look at mxSetup.py which is included in all recent egenix-mx-* packages. The egenix-mx-experimental package makes heavy use of its features to build and include external libs.
For the latest version, see:
http://www.egenix.com/files/python/egenix-mx-base-2.1.0-2005-05-01.zip
Wow. That's pretty impressive. I'm definitely going to steal some ideas from that for setuptools, especially the flag to indicate that an extension is optional; I could really use that for certain of my packages. I had already been thinking of adding a facility to build_ext to look for needed includes and libraries, but you've certainly put more thought into it than I have so far.
At 11:13 AM 9/24/2005 +0200, Martijn Faassen wrote:
I guess for Windows, I'd have to make distutils run the configure script, then extract the DLLs it produces and stuff them in the egg somehow. Any direction you'd point me towards for this?
You'll need to have your setup script determine programmatically that it's running on Windows, and then add data files and 'eager_resources' to the project accordingly. In other words, to include shared libraries that aren't distutils-built extensions, you have to treat them as data files, but you must also list them in the 'eager_resources' setup keyword, as described here:

http://peak.telecommunity.com/DevCenter/setuptools#automatic-resource-extrac...

so that the shared libraries will be automatically extracted to disk before they're linked to by any C extensions. You can do this on Unix platforms, too, of course, but the filenames will naturally be different.
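A minimal sketch of what that platform check might look like in a setup script. The library filenames and the `lxml` package name are placeholders, using `package_data` as the "data files" mechanism is one possible choice, and the sketch assumes 'eager_resources' entries are written as '/'-separated paths relative to the project root.

```python
import sys


def bundled_library_kwargs(platform=None):
    """Build extra setup() keywords for shipping prebuilt shared libraries."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        names = ["libxml2.dll", "libxslt.dll"]  # hypothetical filenames
    else:
        names = ["libxml2.so", "libxslt.so"]  # hypothetical filenames
    return {
        # Ship the shared libraries as data files inside the package...
        "package_data": {"lxml": names},
        # ...and ask for them to be extracted to disk together, so the
        # C extension can link against them when the egg is zipped.
        "eager_resources": ["lxml/" + name for name in names],
    }
```

The result would be passed along with the usual arguments, e.g. `setup(name="lxml", packages=["lxml"], **bundled_library_kwargs())`.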
In this case, on Linux, I'd want to run the configure script when the egg is installed instead of when it's created, and stuff the .so files in the same place the egg is being installed to.
Eggs are not source distributions - they're prebuilt binaries for a particular platform. EasyInstall will find and use source distributions if there's no binary for the system, so from a user point of view it certainly happens when the egg is "installed", but that's only because the actual egg is being *built* locally.

So, in any situation where you're not using the platform libraries, you'll need to follow the same embedding steps as for Windows - i.e., build the libraries and include them as data files. If you're linking with the platform libraries, you don't need to include them as data files.

It's best, however, if you don't think of this as install vs. create. The only installation steps eggs have are:

* adding them to sys.path
* creating local wrappers to run programs contained in them

Everything else occurs when the egg is built; it's just that on Unix-y platforms you're more likely to be building the egg locally from source, rather than downloading a pre-built egg, if it contains C extensions. For Python-only projects, of course, eggs are cross-platform and ready-to-use binaries.
Not all libraries can be bundled by source, of course. Sometimes you really need to use whatever the "system version" is, for one reason or another. Database clients, for example, are something you really really want to use the local version for.
Right, there are competing use cases here. What I'd like is an easy install for lxml that just works for people, without them having to worry about the right libxml2 versions being installed, etc. On Windows this means binaries, and on Linux this likely means it'll just compile upon install.
Or more precisely, it means compiling on egg build, regardless of platform. It's just that for some platforms, you'll build the egg and distribute it instead of the end user building their own. Probably the simplest way to do this is to subclass setuptools 'build_ext' command and extend run() to 'configure' and 'make' the libraries before proceeding normally.
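The build_ext suggestion might be sketched roughly as follows. The source directories, the install prefix and the helper names are assumptions for illustration; the `./configure --prefix=...`, `make`, `make install` sequence is the usual autoconf convention rather than anything specific from this thread.

```python
import os
import subprocess


def configure_and_make(source_dir, prefix, run=subprocess.check_call):
    """Run the bundled library's configure script, then make and make install."""
    commands = [
        ["./configure", "--prefix=" + os.path.abspath(prefix)],
        ["make"],
        ["make", "install"],
    ]
    for command in commands:
        run(command, cwd=source_dir)
    return commands


def make_build_ext(source_dirs, prefix):
    """Create a build_ext subclass that builds source_dirs before the extensions."""
    # Imported lazily so the helper above works even without setuptools.
    from setuptools.command.build_ext import build_ext

    class build_ext_with_libs(build_ext):
        def run(self):
            # Build the bundled C libraries first...
            for source_dir in source_dirs:
                configure_and_make(source_dir, prefix)
            # ...then proceed with the normal extension build.
            build_ext.run(self)

    return build_ext_with_libs
```

A setup script could then pass `cmdclass={"build_ext": make_build_ext(["src/libxml2", "src/libxslt"], "build/libs")}` to `setup()`, with the extensions' include and library directories pointing at the chosen prefix.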
Some classes of people, like distributors and some sysadmins, care about using the platform version of libxml2, and I'd also want to create an egg that allows you to install against the platform libraries. Would this be possible to be the same egg or would a different egg be needed? If a different egg, how does this work with the dependency system? I.e. these two eggs would be alternatives of each other dependency-wise.
An egg is not a project, or vice versa. What you mean is that you want your project to be able to be built against different targets. The eggs are what gets built. If you've only used EasyInstall as a user, not a developer, this line is more blurry because EasyInstall downloads projects and builds the eggs for you if the author did not provide any eggs usable on your platform.

But there is a distinction, and on the authoring side you should focus on the project and its ability to be built for different targets. Providing actual eggs is a convenience for your users, and does not enter into your development process. It's strictly a deployment/publishing step.

Anyway, since the audience that wants to do this consists of more advanced users, who are probably building your project individually rather than as part of a multi-project EasyInstall, I think it's reasonable to just have some value in a configuration file or environment variable or the standard distutils setup.cfg that allows skipping the configure+make. For that matter, just documenting how to patch the setup script to use the platform libraries might be sufficient for that audience in the case of this kind of embedding.
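One way the "environment variable" option could look, as a sketch only: the variable name `LXML_USE_SYSTEM_LIBS`, the accepted values and the `build/libs` prefix are all invented for this example, and a real project might read the same switch from setup.cfg instead.

```python
import os


def use_platform_libraries(environ=None):
    """Return True if the builder asked to link against the system libraries."""
    environ = os.environ if environ is None else environ
    value = environ.get("LXML_USE_SYSTEM_LIBS", "")  # hypothetical variable name
    return value.strip().lower() in ("1", "yes", "true", "on")


def extension_link_options(environ=None, bundled_prefix="build/libs"):
    """Choose Extension keyword arguments for system or bundled libraries."""
    if use_platform_libraries(environ):
        # Let the compiler and linker find the platform's libxml2/libxslt.
        return {"libraries": ["xml2", "xslt"]}
    # Otherwise point at the privately built copies under bundled_prefix.
    return {
        "libraries": ["xml2", "xslt"],
        "include_dirs": [os.path.join(bundled_prefix, "include")],
        "library_dirs": [os.path.join(bundled_prefix, "lib")],
    }
```

The setup script would skip the configure+make step whenever `use_platform_libraries()` is true and pass `**extension_link_options()` to its `Extension(...)` call.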
participants (3): M.-A. Lemburg, Martijn Faassen, Phillip J. Eby