setuptools in a cross-compilation packaging environment
Hi all,
I am the package maintainer of python and a couple of python packages for the nslu2 optware platform, see http://www.nslu2-linux.org and http://ipkg.nslu2-linux.org/feeds/unslung/cross/
There were some distutils features I really appreciate:
1) setup.py install --prefix
This allows us to install into a separate staging area.
2) setup.cfg [build_scripts] executable option
3) setup.cfg [build_ext] include-dirs, library-dirs and rpath options
The new setuptools is all nice and easy for end user, but as a package maintainer, I'd like to have the option of building a binary package without all the dependencies.
Is it possible? How about the above distutils features, are there any equivalent?
Thanks in advance for any advice,
-Brian Zhou
At 07:42 PM 10/4/2005 -0700, Brian Zhou wrote:
Hi all,
I am the package maintainer of python and a couple of python packages for the nslu2 optware platform, see http://www.nslu2-linux.org and http://ipkg.nslu2-linux.org/feeds/unslung/cross/
There were some distutils features I really appreciate:
1) setup.py install --prefix This allows us to install into a separate staging area.
2) setup.cfg [build_scripts] executable option
3) setup.cfg [build_ext] include-dirs, library-dirs and rpath options
All of these options should work just fine; if they don't, it's almost certainly a setuptools bug and you should report it here.
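For reference, the two setup.cfg settings in Brian's list (items 2 and 3) map onto sections like the ones generated below. This is a minimal sketch: the helper name cross_setup_cfg and the /opt paths are hypothetical, chosen only to resemble an optware layout. distutils splits multi-directory values for these options on os.pathsep.

```python
import configparser
import io
import os


def cross_setup_cfg(target_python, include_dirs, library_dirs, rpath):
    """Return setup.cfg text for a cross build (hypothetical helper)."""
    cfg = configparser.ConfigParser()
    # [build_scripts] executable: interpreter written into the "#!" line of scripts.
    cfg["build_scripts"] = {"executable": target_python}
    # [build_ext]: distutils splits these string values on os.pathsep.
    cfg["build_ext"] = {
        "include-dirs": os.pathsep.join(include_dirs),
        "library-dirs": os.pathsep.join(library_dirs),
        "rpath": os.pathsep.join(rpath),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(cross_setup_cfg("/opt/bin/python2.4", ["/opt/include"], ["/opt/lib"], ["/opt/lib"]))
```

With that file next to setup.py, "setup.py build" followed by "setup.py install --prefix=<staging dir>" picks the values up without further command-line flags.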
The new setuptools is all nice and easy for end user, but as a package maintainer, I'd like to have the option of building a binary package without all the dependencies.
In the long run, this should be done by packaging the result of bdist_egg, and by default doing bdist_rpm will do this now. In the short term, unless you're switching to an all-egg distribution, you'll probably want to use legacy/unmanaged mode.
Is it possible? How about the above distutils features, are there any equivalent?
All of the old 'install' options are available, if you use the appropriate command line flags. Try "setup.py install --help" and look for the rather long option name about doing legacy "unmanaged" installation. With that option, a "classic" installation will be done, without an egg being built. However, even if you don't specify that option, all of the options you listed above should work anyway.
Please note, by the way, that some packages simply *cannot* work using the "unmanaged" mode, and never will. Such packages *must* be installed as eggs, period. Among these are setuptools itself, and any packages that are plugins for Trac or Python Paste, or any other extensible system using setuptools to manage plugins.
What's more, packages that explicitly use pkg_resources to request their dependencies will not recognize unmanaged packages as fulfilling the dependency, which means that over time there will be increasing demand for packages to be installed as eggs to start with.
The simplest way to deal with this is to install vendor-packaged eggs to wherever the distribution normally installs packaged Python packages, and install a corresponding '.pth' file that points to the installed egg. This will ensure that the package is available and can be imported as if it were installed the old way, so it doesn't break any non-setuptools-based packages. It also does not require pkg_resources or setuptools to be installed unless a given package actually uses them. (If the package contains C extensions, the .egg can be expanded as a directory, so that it isn't necessary to extract the .so files at runtime.)
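The "one '.pth' per vendor-packaged egg" approach can be sketched as follows. The helper name install_egg_pth and the egg name are hypothetical. The mechanism it relies on is the standard site module, which appends each existing path listed in a site directory's .pth files to sys.path at startup.

```python
import os


def install_egg_pth(site_dir, egg_path):
    """Write <egg name>.pth into site_dir, pointing at an installed egg (hypothetical helper)."""
    egg_name = os.path.splitext(os.path.basename(egg_path))[0]
    pth_path = os.path.join(site_dir, egg_name + ".pth")
    with open(pth_path, "w") as f:
        # site.py adds each line that names an existing file or directory to sys.path.
        f.write(os.path.abspath(egg_path) + "\n")
    return pth_path
```

Each such file is one more thing site.py reads at interpreter startup, which is the cost Marc-Andre raises later in the thread.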
[Some comments on your strategy...]
Phillip J. Eby wrote:
The new setuptools is all nice and easy for end user, but as a package maintainer, I'd like to have the option of building a binary package without all the dependencies.
In the long run, this should be done by packaging the result of bdist_egg, and by default doing bdist_rpm will do this now. In the short term, unless you're switching to an all-egg distribution, you'll probably want to use legacy/unmanaged mode.
I think you are missing his point here: As package maintainer you *have* to be able to build a distribution package without all the dependency checks being applied - how else would you be able to bootstrap the package in case you have circular dependencies ?
Is it possible? How about the above distutils features, are there any equivalent?
All of the old 'install' options are available, if you use the appropriate command line flags. Try "setup.py install --help" and look for the rather long option name about doing legacy "unmanaged" installation. With that option, a "classic" installation will be done, without an egg being built. However, even if you don't specify that option, all of the options you listed above should work anyway.
Please note, by the way, that some packages simply *cannot* work using the "unmanaged" mode, and never will. Such packages *must* be installed as eggs, period. Among these are setuptools itself, and any packages that are plugins for Trac or Python Paste, or any other extensible system using setuptools to manage plugins.
What's more, packages that explicitly use pkg_resources to request their dependencies will not recognize unmanaged packages as fulfilling the dependency, which means that over time there will be increasing demand for packages to be installed as eggs to start with.
I don't think that eggs are the solution to everything, so you should at least extend the dependency checking code to have it detect already installed packages (by trying import and looking at __version__ strings) or having an option to tell the system: "this dependency is satisfied, trust me".
BTW, I haven't looked at your bdist_egg, but since we're currently fighting through binary releases for our eGenix packages, I thought I'd drop in a note:
Please make sure that your eggs catch all possible Python binary build dimensions:
* Python version
* Python Unicode variant (UCS2, UCS4)
* OS name
* OS version
* Platform architecture (e.g. 32-bit vs. 64-bit)
and please also make this scheme extendable, so that it is easy to add more dimensions should they become necessary in the future.
To make things easier for the user, the install system should be capable of detecting all these dimensions and use appropriate defaults when looking for an egg.
BTW, have you had a look at the ActiveState ppm system for add-on packages ? It looks a lot like your egg system.
The simplest way to deal with this is to install vendor-packaged eggs to wherever the distribution normally installs packaged Python packages, and install a corresponding '.pth' file that points to the installed egg. This will ensure that the package is available and can be imported as if it were installed the old way, so it doesn't break any non-setuptools-based packages. It also does not require pkg_resources or setuptools to be installed unless a given package actually uses them. (If the package contains C extensions, the .egg can be expanded as a directory, so that it isn't necessary to extract the .so files at runtime.)
Please reconsider your use of .pth files - these cause the Python interpreter startup time to increase significantly. If you just have one of those files pointing to your managed installation path used for eggs, that should be fine (although adding that path to PYTHONPATH still beats having a .pth to parse every time the interpreter fires up). If you however install a .pth file for every egg, you'll soon end up with an unreasonable startup time which slows down your whole Python installation - including applications that don't use setuptools or any of the eggs.
--
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Oct 05 2005)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
At 10:27 AM 10/5/2005 +0200, M.-A. Lemburg wrote:
[Some comments on your strategy...]
Phillip J. Eby wrote:
The new setuptools is all nice and easy for end user, but as a package maintainer, I'd like to have the option of building a binary package without all the dependencies.
In the long run, this should be done by packaging the result of bdist_egg, and by default doing bdist_rpm will do this now. In the short term, unless you're switching to an all-egg distribution, you'll probably want to use legacy/unmanaged mode.
I think you are missing his point here:
As package maintainer you *have* to be able to build a distribution package without all the dependency checks being applied - how else would you be able to bootstrap the package in case you have circular dependencies ?
In legacy/unmanaged mode, setuptools' "install" command behaves the way the standard distutils "install" does today, without creating an egg or searching for dependencies.
I don't think that eggs are the solution to everything, so you should at least extend the dependency checking code to have it detect already installed packages (by trying import and looking at __version__ strings) or having an option to tell the system: "this dependency is satisfied, trust me".
There are plans to have a feature like that, and in fact setuptools already has code to hunt down __version__ strings and the like, without even needing to import the packages. It isn't integrated with the rest of the system yet, though.
One reason for that is that early feedback suggests that package developers and users would rather have the assurance of having the exact version required by something, as long as the installation process doesn't impose any additional burden on them. Local detection hacks have been primarily requested by packagers, who (quite reasonably) do not want to have to repackage everything as eggs.
There is a simple trick that packagers can use to make their legacy packages work as eggs: build .egg-info directories for them in the sys.path directory where the package resides, so that the necessary metadata is present. This does not require the use of .pth files, but it does slow down the process of package discovery for things that do use pkg_resources to locate their dependencies. It also still requires them to repackage existing packages, but doesn't require changing the layout.
Also, such packages will currently cause easy_install to warn about conflicting packages if you try to install a different version of the same package, but this will be alleviated soon, as I'm working on a better conflict management mechanism that will allow egg directories on PYTHONPATH to override things in the standard directories. (Currently, eggs are only ever added to the end of sys.path, so if the local packaging system puts .egg-info directories in site-packages, there would be no way to locally override that for an individual user's packages. A future version of setuptools will resolve that issue soon, hopefully in the next few weeks.)
As for eggs being the "solution to everything", I would like to point out that what precisely constitutes an egg is an extensible concept.
See e.g.: http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html
which shows that there are actually three formats that are "eggs" at the moment:
1. .egg zipfiles
2. .egg directories
3. .egg-info marker directories
The key requirements for a format to be a pluggable distribution or "egg" are:
* Adding it to sys.path must make it importable
* It must be possible to discover its PyPI project name (and preferably version and platform) from the filename
* It must allow arbitrary data files and directories to be included within packages, and allow arbitrary metadata files and directories to be included for the project as a whole
* It must include the standard PKG-INFO metadata
These are the absolute minimums, but there are additional specific metadata files and directories that easy_install requires in order to detect possible conflicts, create scripts, etc.
Anyway, the point is that what constitutes an "egg" is flexible, but the "add to sys.path and make it importable" requirement certainly limits what formats are practically meaningful. Nonetheless, further extensibility is certainly possible if there's need.
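The second requirement (project name, and preferably version and platform, recoverable from the filename) can be illustrated with a small parser. It assumes the Project-Version-pyX.Y[-platform].egg naming used for .egg files, with hyphens inside names and versions already escaped to underscores. The regular expression is a sketch, not the one pkg_resources uses internally.

```python
import re

# Hypothetical pattern for "Project-Version-pyX.Y[-platform].egg".
EGG_FILENAME = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<pyver>[^-]+)"
    r"(?:-(?P<platform>.+))?\.egg$"
)


def parse_egg_filename(filename):
    """Split an egg filename into name, version, pyver and platform (None if pure Python)."""
    match = EGG_FILENAME.match(filename)
    if match is None:
        raise ValueError("not an egg filename: %r" % filename)
    return match.groupdict()
```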
Please make sure that your eggs catch all possible Python binary build dimensions:
* Python version
* Python Unicode variant (UCS2, UCS4)
* OS name
* OS version
* Platform architecture (e.g. 32-bit vs. 64-bit)
As far as I know, all of this except the Unicode variant is captured in distutils' get_platform(). And if it's not, it should be, since it affects any other kind of bdist mechanism.
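Since get_platform() leaves out the Unicode variant, one option is to append it as a suffix, much like the py_version() helper Marc-Andre posts later in the thread. This sketch targets Python 3, where sysconfig.get_platform() replaces distutils.util.get_platform(). The tag format is an assumption. On Python 3.3 and later the variant always reads "ucs4", because narrow builds no longer exist (PEP 393).

```python
import sys
import sysconfig


def unicode_variant():
    """'ucs2' for a narrow (UCS2) build, 'ucs4' for a wide one."""
    # Narrow builds had sys.maxunicode == 0xFFFF; since Python 3.3 (PEP 393)
    # it is always 0x10FFFF, so modern interpreters report "ucs4".
    return "ucs2" if sys.maxunicode == 0xFFFF else "ucs4"


def build_tag():
    """Hypothetical tag: <get_platform()>-py<major>.<minor>_<unicode variant>."""
    major, minor = sys.version_info[:2]
    return "%s-py%d.%d_%s" % (sysconfig.get_platform(), major, minor, unicode_variant())
```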
and please also make this scheme extendable, so that it is easy to add more dimensions should they become necessary in the future.
It's extensible by changing the get_platform() and compatible_platforms() functions in pkg_resources. By the way, I've issued requests on this list at least twice over the past year for people to provide input about how the platform strings should work; I got no response to either call, so I gave up. Later, when an OS X upgrade created a compatibility problem, somebody finally chipped in with info about what good OS X platform strings might be. I suspect that basically we'll get good platform strings once there are enough people encountering problems with the current ones to suggest a better scheme. :( If you have suggestions, please make them known, and let's get them into the distutils in general, not just our own offshoots thereof.
To make things easier for the user, the install system should be capable of detecting all these dimensions and use appropriate defaults when looking for an egg.
That's done for those dimensions currently handled by get_platform(), and can be changed by changes to get_platform() and compatible_platforms() in pkg_resources.
Please reconsider your use of .pth files - these cause the Python interpreter startup time to increase significantly. If you just have one of those files pointing to your managed installation path used for eggs, that should be fine (although adding that path to PYTHONPATH still beats having a .pth to parse every time the interpreter fires up).
EasyInstall uses at most one .pth file, to allow packages to be on the path at runtime without needing an explicit 'require()'. However, a vendor creating packages probably doesn't want to have to edit that .pth file, so a trivial alternative is to install a .pth for each package. The tradeoff is startup time versus packager convenience in that case. Having a tool to edit a single .pth file would be good, but not all packaging systems have the ability to run a program at install or uninstall time. If they do, then editing easy-install.pth to add or remove eggs is a better option. Eggs can of course be installed in multi-version mode, in which case no .pth is necessary, but then an explicit require() or a dependency declaration in a setup script is necessary in order to use the package.
If you however install a .pth file for every egg, you'll soon end up with an unreasonable startup time which slows down your whole Python installation - including applications that don't use setuptools or any of the eggs.
A single .pth file is certainly an option, and it's what easy_install itself uses.
On Wednesday 05 October 2005 at 18:37, Phillip J. Eby wrote:
The new setuptools is all nice and easy for end user, but as a package maintainer, I'd like to have the option of building a binary package without all the dependencies.
In the long run, this should be done by packaging the result of bdist_egg, and by default doing bdist_rpm will do this now. In the short term, unless you're switching to an all-egg distribution, you'll probably want to use legacy/unmanaged mode.
I think you are missing his point here:
As package maintainer you *have* to be able to build a distribution package without all the dependency checks being applied - how else would you be able to bootstrap the package in case you have circular dependencies ?
In legacy/unmanaged mode, setuptools' "install" command behaves the way the standard distutils "install" does today, without creating an egg or searching for dependencies.
I think that eggs are a good addition to python distutils as long as they can be created by packagers too. If legacy/unmanaged mode is mandatory to build a package, we will never see Fedora/Debian/Mandriva and the other big distros shipping eggs. The ability to install a package without installing its dependencies is crucial for setuptools to be widely used IMHO.
As I previously wrote here, another feature would be very good for packagers and setuptools users too: installing extra files alongside the egg from setup.py. If a setuptools setup.py needs to install an .h file or some /etc/config or some /usr/share/heavy.tar.gz by hand... everything more complex than a simple python module will need its distributor to reinvent the wheel every time!
I think setuptools is a really good package as is, but I always try to add my 0.0001€ when I can.
Regards
Vincenzo
Phillip J. Eby wrote:
At 10:27 AM 10/5/2005 +0200, M.-A. Lemburg wrote:
[Some comments on your strategy...]
Phillip J. Eby wrote:
The new setuptools is all nice and easy for end user, but as a package maintainer, I'd like to have the option of building a binary package without all the dependencies.
In the long run, this should be done by packaging the result of bdist_egg, and by default doing bdist_rpm will do this now. In the short term, unless you're switching to an all-egg distribution, you'll probably want to use legacy/unmanaged mode.
I think you are missing his point here:
As package maintainer you *have* to be able to build a distribution package without all the dependency checks being applied - how else would you be able to bootstrap the package in case you have circular dependencies ?
In legacy/unmanaged mode, setuptools' "install" command behaves the way the standard distutils "install" does today, without creating an egg or searching for dependencies.
Sorry, maybe I wasn't clear: a package builder needs to *build* a package (rpm, egg, .tar.gz drop in place archive, etc.) without the dependency checks. For the user to be able to turn off the dependency checks when installing an egg using an option is also an often needed feature. rpm often requires this when you want to install packages in different order, in automated installs or due to conflicts in the way different packages name the dependencies. I guess eggs will exhibit the same problems over time.
I don't think that eggs are the solution to everything, so you should at least extend the dependency checking code to have it detect already installed packages (by trying import and looking at __version__ strings) or having an option to tell the system: "this dependency is satisfied, trust me".
There are plans to have a feature like that, and in fact setuptools already has code to hunt down __version__ strings and the like, without even needing to import the packages. It isn't integrated with the rest of the system yet, though.
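The idea of finding __version__ without importing the package can be illustrated with Python 3's ast module. This is a sketch of the general technique, not setuptools' own implementation. It only recognizes a plain top-level assignment of a literal value.

```python
import ast


def find_version_constant(source, name="__version__"):
    """Return the literal assigned to `name` at module level, or None, without importing."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    try:
                        return ast.literal_eval(node.value)
                    except ValueError:
                        return None  # computed at run time, so unknown without importing
    return None
```

Reading a module's source text and passing it in never executes any of the module's code, so side effects at import time are avoided.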
One reason for that is that early feedback suggests that package developers and users would rather have the assurance of having the exact version required by something, as long as the installation process doesn't impose any additional burden on them. Local detection hacks have been primarily requested by packagers, who (quite reasonably) do not want to have to repackage everything as eggs.
There is a simple trick that packagers can use to make their legacy packages work as eggs: build .egg-info directories for them in the sys.path directory where the package resides, so that the necessary metadata is present. This does not require the use of .pth files, but it does slow down the process of package discovery for things that do use pkg_resources to locate their dependencies. It also still requires them to repackage existing packages, but doesn't require changing the layout.
Where would you have to put these directories and what do they contain ?
Also, such packages will currently cause easy_install to warn about conflicting packages if you try to install a different version of the same package, but this will be alleviated soon, as I'm working on a better conflict management mechanism that will allow egg directories on PYTHONPATH to override things in the standard directories. (Currently, eggs are only ever added to the end of sys.path, so if the local packaging system puts .egg-info directories in site-packages, there would be no way to locally override that for an individual user's packages. A future version of setuptools will resolve that issue soon, hopefully in the next few weeks.)
I must admit that I haven't followed the discussions about these .egg-info directories. Is there a good reason not to use the already existing PKG-INFO files that distutils builds and which are used by PyPI (aka cheeseshop) ?
As for eggs being the "solution to everything", I would like to point out that what precisely constitutes an egg is an extensible concept. See e.g.:
http://mail.python.org/pipermail/distutils-sig/2005-June/004652.html
which shows that there are actually three formats that are "eggs" at the moment:
1. .egg zipfiles
2. .egg directories
3. .egg-info marker directories
The key requirements for a format to be a pluggable distribution or "egg" are:
* Adding it to sys.path must make it importable
* It must be possible to discover its PyPI project name (and preferably version and platform) from the filename
* It must allow arbitrary data files and directories to be included within packages, and allow arbitrary metadata files and directories to be included for the project as a whole
* It must include the standard PKG-INFO metadata
These are the absolute minimums, but there are additional specific metadata files and directories that easy_install requires in order to detect possible conflicts, create scripts, etc.
Anyway, the point is that what constitutes an "egg" is flexible, but the "add to sys.path and make it importable" requirement certainly limits what formats are practically meaningful. Nonetheless, further extensibility is certainly possible if there's need.
Hmm, you seem to be making things unnecessarily complicated. Why not just rely on the import mechanism and put all eggs into a common package, e.g. pythoneggs ?! Your EasyInstall script could then modify a file in that package called e.g. database.py which includes all the necessary information about all the installed packages in form of a dictionary. This would have the great advantage of allowing introspection without too much fuss and would reduce the need to search paths, directories and so on, which causes a lot of I/O overhead and slows down startup times for applications needing to check dependency requirements a lot.
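The pythoneggs/database.py proposal boils down to a generated module holding one dictionary. Here is a minimal sketch. The INSTALLED name, the per-entry fields, and both helper functions are hypothetical. In the proposal the file would simply be imported; here it is compiled from a string so the example stays self-contained.

```python
import pprint


def render_database(installed):
    """Return the source text of a generated database.py holding an INSTALLED dict."""
    return "# Generated by the installer; do not edit.\nINSTALLED = %s\n" % pprint.pformat(installed)


def load_database(source):
    """Execute generated database.py source and return its INSTALLED dict."""
    namespace = {}
    exec(compile(source, "database.py", "exec"), namespace)
    return namespace["INSTALLED"]
```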
Please make sure that your eggs catch all possible Python binary build dimensions:
* Python version
* Python Unicode variant (UCS2, UCS4)
* OS name
* OS version
* Platform architecture (e.g. 32-bit vs. 64-bit)
As far as I know, all of this except the Unicode variant is captured in distutils' get_platform(). And if it's not, it should be, since it affects any other kind of bdist mechanism.
Agreed. So you use get_platform() for the egg names ?
and please also make this scheme extendable, so that it is easy to add more dimensions should they become necessary in the future.
It's extensible by changing the get_platform() and compatible_platforms() functions in pkg_resources.
Ah, that's monkey patching. Isn't there some better way ?
By the way, I've issued requests on this list at least twice over the past year for people to provide input about how the platform strings should work; I got no response to either call, so I gave up. Later, when an OS X upgrade created a compatibility problem, somebody finally chipped in with info about what good OS X platform strings might be. I suspect that basically we'll get good platform strings once there are enough people encountering problems with the current ones to suggest a better scheme. :(
If you have suggestions, please make them known, and let's get them into the distutils in general, not just our own offshoots thereof.
This is what we use:
    # Imports the excerpt relies on
    import os
    import sys
    from distutils.util import get_platform
    from distutils.command.build import build
    from distutils.command.bdist import bdist
    def py_version(unicode_aware=1, include_patchlevel=0):
        """ Return the Python version as short string.
            If unicode_aware is true (default), the function also tests
            whether a UCS2 or UCS4 build is running and modifies the
            version accordingly.
            If include_patchlevel is true (default is false), the patch
            level is also included in the version string.
        """
        if include_patchlevel:
            version = sys.version[:5]
        else:
            version = sys.version[:3]
        if unicode_aware and version > '2.0':
            # UCS4 builds were introduced in Python 2.1; Note: RPM doesn't
            # like hyphens to be used in the Python version string which is
            # why we append the UCS information using an underscore.
            try:
                # unichr() raises ValueError above 0xFFFF on UCS2 builds
                unichr(100000)
            except ValueError:
                # UCS2 build (standard)
                version = version + '_ucs2'
            else:
                # UCS4 build (most recent Linux distros)
                version = version + '_ucs4'
        return version
and then patch the various commands in distutils, e.g.:
    class mx_build(build):
        """ build command which knows about our distutils extensions.
            This build command builds extensions in properly separated
            directories (which includes building different Unicode
            variants in different directories).
        """
        ...
        def finalize_options(self):
            # Make sure different Python versions are built in separate
            # directories
            python_platform = '.%s-%s' % (get_platform(), py_version())
            if self.build_platlib is None:
                self.build_platlib = os.path.join(self.build_base,
                                                  'lib' + python_platform)
            if self.build_temp is None:
                self.build_temp = os.path.join(self.build_base,
                                               'temp' + python_platform)
            # Call the base method
            build.finalize_options(self)
    class mx_bdist(bdist):
        """ Generic binary distribution command.
        """
        def finalize_options(self):
            # Default to <platform>-<pyversion> on all platforms
            if self.plat_name is None:
                self.plat_name = '%s-py%s' % (get_platform(), py_version())
            bdist.finalize_options(self)
The result is a build system that can be used to build all binaries for a single platform without getting conflicts and binaries that include a proper platform string, e.g.
    egenix-mxodbc-zopeda-1.0.9.darwin-8.2.0-Power_Macintosh-py2.3_ucs2.zip
    egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs2.zip
    egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs4.zip
To make things easier for the user, the install system should be capable of detecting all these dimensions and use appropriate defaults when looking for an egg.
That's done for those dimensions currently handled by get_platform(), and can be changed by changes to get_platform() and compatible_platforms() in pkg_resources.
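The example filenames Marc-Andre lists carry every build dimension in a fixed order. An installer can therefore pick the right archive for the running interpreter by matching on them, which is what the "appropriate defaults" request amounts to. This is a sketch, assuming the name-version.platform-pyX.Y_ucsN.zip layout of those three examples. The regular expression and pick_archive() are hypothetical and fitted only to those names.

```python
import re

# Fitted to names like "egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs4.zip" (hypothetical pattern).
ARCHIVE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)\.(?P<platform>[A-Za-z].*)"
    r"-py(?P<pyver>\d+\.\d+)(?:_(?P<unicode>ucs[24]))?\.zip$"
)


def pick_archive(filenames, platform, pyver, unicode_variant):
    """Return the first filename built for exactly this platform/Python/Unicode combination."""
    wanted = (platform, pyver, unicode_variant)
    for filename in filenames:
        match = ARCHIVE.match(filename)
        if match and (match.group("platform"), match.group("pyver"), match.group("unicode")) == wanted:
            return filename
    return None
```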
Please reconsider your use of .pth files - these cause the Python interpreter startup time to increase significantly. If you just have one of those files pointing to your managed installation path used for eggs, that should be fine (although adding that path to PYTHONPATH still beats having a .pth to parse every time the interpreter fires up).
EasyInstall uses at most one .pth file, to allow packages to be on the path at runtime without needing an explicit 'require()'. However, a vendor creating packages probably doesn't want to have to edit that .pth file, so a trivial alternative is to install a .pth for each package. The tradeoff is startup time versus packager convenience in that case. Having a tool to edit a single .pth file would be good, but not all packaging systems have the ability to run a program at install or uninstall time. If they do, then editing easy-install.pth to add or remove eggs is a better option.
Eggs can of course be installed in multi-version mode, in which case no .pth is necessary, but then an explicit require() or a dependency declaration in a setup script is necessary in order to use the package.
If you however install a .pth file for every egg, you'll soon end up with an unreasonable startup time which slows down your whole Python installation - including applications that don't use setuptools or any of the eggs.
A single .pth file is certainly an option, and it's what easy_install itself uses.
Fair enough. Could this be enforced and maybe also removed completely by telling people to add the egg directory to PYTHONPATH ? Note that the pythoneggs package approach would pretty much remove the need for these .pth files.
--
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Oct 07 2005)
At 02:01 PM 10/7/2005 +0200, M.-A. Lemburg wrote:
Sorry, maybe I wasn't clear: a package builder needs to *build* a package (rpm, egg, .tar.gz drop in place archive, etc.) without the dependency checks.
bdist_egg simply builds an egg. Dependency checking is a function of *installing* the egg, not building it.
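To illustrate the split between building and installing: declared dependencies travel inside the built egg as plain metadata, and only an installer that chooses to read them acts on them. This sketch assumes the EGG-INFO/requires.txt file that setuptools writes for projects declaring install_requires, where optional extras appear in [section] blocks that are ignored here. egg_requirements() is a hypothetical name.

```python
import zipfile


def egg_requirements(egg_path):
    """Return the base requirements in a zipped egg's EGG-INFO/requires.txt (empty if none)."""
    with zipfile.ZipFile(egg_path) as egg:
        try:
            data = egg.read("EGG-INFO/requires.txt").decode("utf-8")
        except KeyError:
            return []  # no requires.txt: the egg declares no dependencies
    requirements = []
    for line in data.splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # later sections list optional "extras"; skipped in this sketch
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements
```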
For the user to be able to turn off the dependency checks when installing an egg using an option is also an often needed feature.
Yes, and it has been on my to-do list for some time. However, the majority of packages in eggs today don't have any dependencies declared anyway, because they're not packages that use setuptools. So the option, if it existed, wouldn't have been very useful until quite recently. In any case, the main refactoring I needed to do before that option could be added is done, so I'll probably add it in the next non-bugfix release.
rpm often requires this when you want to install packages in different order, in automated installs or due to conflicts in the way different packages name the dependencies. I guess, eggs will exhibit the same problems over time.
I'm not sure I follow you here, but in any case there's nothing stopping people from installing eggs by just dropping them in a directory on sys.path without doing any installation steps at all. It's only if you want the egg to be on sys.path at startup without manually munging PYTHONPATH or a .pth file or calling require(), or if you want to install any scripts that you need to run easy_install on the egg.
There is a simple trick that packagers can use to make their legacy packages work as eggs: build .egg-info directories for them in the sys.path directory where the package resides, so that the necessary metadata is present. This does not require the use of .pth files, but it does slow down the process of package discovery for things that do use pkg_resources to locate their dependencies. It also still requires them to repackage existing packages, but doesn't require changing the layout.
Where would you have to put these directories and what do they contain ?
You put them in the directory where the unmanaged packages are installed. At minimum, they contain a PKG-INFO file, and if the package ordinarily uses setuptools, they should also contain whatever else the egg's EGG-INFO directory contained. The directory name is ProjectName.egg-info, where ProjectName is the project's name on PyPI, with non-alphanumerics condensed by the pkg_resources.safe_name() function.
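Following that description, marking an unmanaged install with metadata amounts to creating ProjectName.egg-info/PKG-INFO next to the package. This is a minimal sketch. safe_name() is re-implemented here with the same substitution rule that pkg_resources.safe_name() applies. As an extra assumption beyond the text, the result also goes through the "-" to "_" replacement of pkg_resources.to_filename(), so a hyphen in the directory name cannot be mistaken for a name/version separator. write_egg_info() and the project name are hypothetical, and only the minimum PKG-INFO fields are written.

```python
import os
import re


def safe_name(name):
    # Same substitution rule as pkg_resources.safe_name(): each run of characters
    # other than letters, digits and "." becomes a single "-".
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def to_filename(name):
    # Same rule as pkg_resources.to_filename(): "-" becomes "_".
    return name.replace("-", "_")


def write_egg_info(site_dir, project, version):
    """Create <site_dir>/<ProjectName>.egg-info/PKG-INFO for an unmanaged package (hypothetical)."""
    egg_info = os.path.join(site_dir, to_filename(safe_name(project)) + ".egg-info")
    os.makedirs(egg_info, exist_ok=True)
    with open(os.path.join(egg_info, "PKG-INFO"), "w") as f:
        f.write("Metadata-Version: 1.0\n")
        f.write("Name: %s\n" % project)
        f.write("Version: %s\n" % version)
    return egg_info
```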
I must admit that I haven't followed the discussions about these .egg-info directories. Is there a good reason not to use the already existing PKG-INFO files that distutils builds and which are used by PyPI (aka cheeseshop) ?
I don't know if there's such a reason or not, but in any case that's what we use as part of the egg-info directories. However, we *also* allow for unlimited metadata resources to be provided in egg-info, as this is what allows us to carry things like plugin metadata and scripts in the egg. There are other metadata files listing the C extensions in the package, the "namespace packages" that the egg participates in, and so on.
Hmm, you seem to be making things unnecessarily complicated.
That probably just means you're not familiar with the requirements. My first post here about the issues was about this time last year, discussing application plugins and their packaging. The use of eggs for general Python libraries as well as plugins only came into play this January, at Bob Ippolito's urging. So, while there may potentially exist solutions that might be somewhat simpler for certain kinds of Python library packaging, they don't even begin to address the issues for application plugin packaging, which is the raison d'etre of eggs. Trac, for example, lets you simply drop eggs into a plugin directory in order to use them. At some point, Chandler should be allowing this as well, and maybe someday Zope will support it too. It's primarily for these use cases that eggs exist; it just so happens that they make a fine way to manage installed Python packages as well.
Why not just rely on the import mechanism and put all eggs into a common package, e.g. pythoneggs?! Your EasyInstall script could then modify a file in that package called e.g. database.py which includes all the necessary information about all the installed packages in the form of a dictionary.
You completely lost me. A major feature of eggs is that for an application needing plugins, it can simply scan a directory of downloaded eggs and plug them into itself. Having a required installation mechanism other than "download the egg and put it here" breaks that. What's more, putting them in a single package makes it impossible to have eggs installed in more than one directory, since packages can't span directories, at least not without using setuptools' namespace package facility. And using that facility would mean the runtime would have to always get imported whenever you used an egg - which is *not* required right now unless you're using a zipped egg with a C extension in it. And even then the runtime only gets imported if you actually try to import the C extension. So, it seems to me your approach creates more I/O overhead for using installed packages. Finally, don't forget that eggs allow simultaneous installation of multiple versions of a package. So, you'd *still* have to have sys.path manipulation.
This would have the great advantage of allowing introspection without too much fuss, and would reduce the need to search paths, directories and so on, which causes a lot of I/O overhead and slows down startup times for applications that need to check dependency requirements a lot.
And the disadvantage of absolutely requiring install/uninstall steps, which is anathema. Note that with the exception of .egg-info markers (which aren't really intended for production use anyway; they're a feature for deploying packages under development without needing to build a "real" egg), eggs can be fully introspected from their *filename* for dependency processing purposes. So, if the needed eggs are all on sys.path already, no additional I/O gets done. Identifying all the eggs available in a given directory is one listdir() operation, but it only happens if a suitable package isn't already on sys.path, and the listdir()s happen at most once during a given instance of dependency processing.
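Here is a concrete version of the "introspected from their filename" point. This sketch finds every egg in a directory with a single os.listdir() call. It then splits each name into project, version, Python version and optional platform. The regex is a simplified assumption based on the Name-Version-pyX.Y[-platform].egg layout, not the exact pattern pkg_resources uses. The sample filenames are made up.

```python
import os
import re
import tempfile

# Simplified pattern for Name-Version-pyX.Y[-platform].egg (an assumption,
# not the exact regex used by pkg_resources).
EGG_NAME = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<pyver>[^-]+)(?:-(?P<plat>.+))?\.egg$"
)


def scan_eggs(directory):
    """Return metadata dicts for every egg in *directory*, using one os.listdir()."""
    found = []
    for entry in sorted(os.listdir(directory)):
        match = EGG_NAME.match(entry)
        if match:  # anything that isn't an egg (README.txt, ...) is skipped
            found.append(match.groupdict())
    return found


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    for n in ["FooBar-1.2-py2.4.egg", "Spam-0.5-py2.4-linux-i686.egg", "README.txt"]:
        open(os.path.join(d, n), "w").close()
    for egg in scan_eggs(d):
        print(egg)
```

No index file is read. The directory listing *is* the index, so it cannot drift out of sync with what is installed.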
Please make sure that your eggs catch all possible Python binary build dimensions:
* Python version
* Python Unicode variant (UCS2, UCS4)
* OS name
* OS version
* Platform architecture (e.g. 32-bit vs. 64-bit)
As far as I know, all of this except the Unicode variant is captured in distutils' get_platform(). And if it's not, it should be, since it affects any other kind of bdist mechanism.
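This sketch assembles the dimensions listed above into a single tag. It uses sysconfig.get_platform(), the stdlib counterpart of distutils.util.get_platform(). Because get_platform() does not include the Unicode variant, that part is derived separately from sys.maxunicode. build_tag() and its "platform-pyX.Y_ucsN" layout are assumptions based on the eGenix filenames quoted in this thread.

```python
import sys
import sysconfig


def build_tag():
    """Combine platform string, Python version and Unicode width into one tag."""
    platform = sysconfig.get_platform()  # e.g. "linux-x86_64"
    pyver = "py%d.%d" % sys.version_info[:2]
    # Narrow (UCS2) builds cap sys.maxunicode at 0xFFFF; wide (UCS4) builds
    # report 0x10FFFF. Since Python 3.3 every build reports 0x10FFFF.
    ucs = "ucs4" if sys.maxunicode > 0xFFFF else "ucs2"
    return "%s-%s_%s" % (platform, pyver, ucs)


if __name__ == "__main__":
    print(build_tag())
```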
Agreed.
So you use get_platform() for the egg names ?
Yes - except on Mac OS X, which has a changed platform string.
and please also make this scheme extendable, so that it is easy to add more dimensions should they become necessary in the future.
It's extensible by changing the get_platform() and compatible_platform() functions in pkg_resources.
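This is a simplified sketch of the kind of check such a compatibility function performs. An egg with no platform string (a "pure" egg) is usable anywhere. Otherwise the strings must match exactly. platform_compatible() is a hypothetical name. The real pkg_resources function also special-cases Mac OS X, which this sketch does not.

```python
def platform_compatible(provided, required):
    """Return True if an egg built for *provided* may be loaded on *required*.

    *provided* is None for a "pure" egg with no C extensions; such eggs run
    anywhere. Otherwise this sketch demands an exact match.
    """
    if provided is None or required is None:
        return True
    return provided == required


# Hypothetical egg listing: filename -> platform string (None for pure eggs).
available = {
    "Pure-1.0-py2.4.egg": None,
    "Spam-0.5-py2.4-linux-i686.egg": "linux-i686",
    "Spam-0.5-py2.4-linux-x86_64.egg": "linux-x86_64",
}
usable = sorted(name for name, plat in available.items()
                if platform_compatible(plat, "linux-i686"))
print(usable)  # ['Pure-1.0-py2.4.egg', 'Spam-0.5-py2.4-linux-i686.egg']
```

To extend the scheme, you would change this one predicate and the function that produces platform strings.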
Ah, that's monkey patching. Isn't there some better way?
Well, my presumption here is that we're going to get the scheme right for Python at large, and make it standard. Are you saying that some packages should have their own scheme? That's not really workable since in order to import the package and use its scheme, we would have to first know that the package was compatible!
If you have suggestions, please make them known, and let's get them into the distutils in general, not just our own offshoots thereof.
This is what we use:
def py_version(unicode_aware=1, include_patchlevel=0):
[snip]
The result is a build system that can be used to build all binaries for a single platform without getting conflicts and binaries that include a proper platform string, e.g.
egenix-mxodbc-zopeda-1.0.9.darwin-8.2.0-Power_Macintosh-py2.3_ucs2.zip
egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs2.zip
egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs4.zip
eggs put the Python version before the platform, because "pure" eggs that don't contain any C code don't include the platform string. We also don't have a UCS flag, but if we did it should be part of the platform string rather than the Python version, since "pure" eggs don't care about the UCS mode, and even if they did, that'd be a requirement of the package rather than the egg itself being platform specific.
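This sketch contrasts the two layouts. egg_filename() follows the Name-Version-pyX.Y[-platform].egg pattern described above. Like pkg_resources.to_filename(), it replaces '-' with '_' inside the name and version. mal_style_filename() only reproduces the eGenix example names quoted earlier; it is not the snipped py_version() code. Both helpers are hypothetical.

```python
def egg_filename(name, version, pyver, platform=None):
    """Egg layout: Name-Version-pyX.Y[-platform].egg; pure eggs omit the platform."""
    parts = [name.replace("-", "_"), version.replace("-", "_"), "py" + pyver]
    if platform is not None:
        parts.append(platform)  # platform comes last, so it is easy to drop
    return "-".join(parts) + ".egg"


def mal_style_filename(name, version, platform, pyver, ucs):
    """Layout of the eGenix examples: name-version.platform-pyX.Y_ucsN.zip."""
    return "%s-%s.%s-py%s_%s.zip" % (name, version, platform, pyver, ucs)


if __name__ == "__main__":
    print(egg_filename("FooBar", "1.2", "2.4"))
    print(egg_filename("egenix-mxodbc-zopeda", "1.0.9", "2.3", "linux-i686"))
    print(mal_style_filename("egenix-mxodbc-zopeda", "1.0.9", "linux-i686", "2.3", "ucs4"))
```

The egg layout puts the platform in the final field, so a pure egg just omits that field. The eGenix layout always carries a platform.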
A single .pth file is certainly an option, and it's what easy_install itself uses.
Fair enough.
Could this be enforced and maybe also removed completely by telling people to add the egg directory to PYTHONPATH?
If by "egg directory" you mean a single .egg directory (or zipfile) for a particular package, then yes, for that particular package you could do that. If you mean, can you just put the directory *containing* eggs on PYTHONPATH, then the answer is no, if you want the package to be on sys.path without any special action taken (like calling pkg_resources.require()).
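For the single-egg case, a .pth file gets the egg onto sys.path without touching PYTHONPATH. This sketch uses the stdlib site.addsitedir(). That function adds a directory to sys.path and then processes the .pth files in it, appending each listed path only if it exists. add_eggs_via_pth() and the egg names are hypothetical.

```python
import os
import site
import sys
import tempfile


def add_eggs_via_pth(site_dir, egg_paths, pth_name="eggs.pth"):
    """Write a .pth file listing *egg_paths*, then let site.addsitedir() process it."""
    with open(os.path.join(site_dir, pth_name), "w") as f:
        for path in egg_paths:
            f.write(path + "\n")
    # addsitedir() puts site_dir itself on sys.path and appends every existing
    # path listed in the .pth files found there; missing paths are ignored.
    site.addsitedir(site_dir)


if __name__ == "__main__":
    site_dir = tempfile.mkdtemp()
    egg = os.path.join(site_dir, "FooBar-1.2-py2.4.egg")
    os.mkdir(egg)  # an unzipped egg directory
    add_eggs_via_pth(site_dir, [egg])
    print(egg in sys.path)
```

Pointing PYTHONPATH at the directory *containing* the egg only makes that directory importable, not the egg inside it. The .pth line names the egg itself, which is why it works.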
Note that the pythonegg package approach would pretty much remove the need for these .pth files.
Only in the sense that it would require reinventing them in a different form. :)
Phillip J. Eby wrote:
I must admit that I haven't followed the discussions about these .egg-info directories. Is there a good reason not to use the already existing PKG-INFO files that distutils builds and which are used by PyPI (aka cheeseshop)?
I don't know if there's such a reason or not, but in any case that's what we use as part of the egg-info directories. However, we *also* allow for unlimited metadata resources to be provided in egg-info, as this is what allows us to carry things like plugin metadata and scripts in the egg. There are other metadata files listing the C extensions in the package, the "namespace packages" that the egg participates in, and so on.
Hmm, you seem to be making things unnecessarily complicated.
That probably just means you're not familiar with the requirements.
I did read your posting, but still don't understand why you need a multitude of meta-data files in a special directory. PKG-INFO is general and extensible enough to hold all that information, IMHO.
Why not just rely on the import mechanism and put all eggs into a common package, e.g. pythoneggs?! Your EasyInstall script could then modify a file in that package called e.g. database.py which includes all the necessary information about all the installed packages in the form of a dictionary.
You completely lost me. A major feature of eggs is that for an application needing plugins, it can simply scan a directory of downloaded eggs and plug them into itself. Having a required installation mechanism other than "download the egg and put it here" breaks that.
While I don't find a non-managed Python installation mechanism a particularly useful goal to have, you could still have the same thing by using and scanning a sub-directory of the pythoneggs package directory or directories listed in an environment variable PYTHONEGGS as a fallback solution (if the egg was not found in the database.py module).
What's more, putting them in a single package makes it impossible to have eggs installed in more than one directory, since packages can't span directories, at least not without using setuptools' namespace package facility. And using that facility would mean the runtime would have to always get imported whenever you used an egg - which is *not* required right now unless you're using a zipped egg with a C extension in it. And even then the runtime only gets imported if you actually try to import the C extension. So, it seems to me your approach creates more I/O overhead for using installed packages.
If your application wants to support drop-in eggs for plugins, I don't see the need to call some startup code in that application as a problem. If the application does not need drop-in eggs, the package approach would be more effective.
Finally, don't forget that eggs allow simultaneous installation of multiple versions of a package. So, you'd *still* have to have sys.path manipulation.
Nope - the .__path__ attribute of Python packages makes this easy: http://www.python.org/doc/essays/packages.html
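This small sketch shows the mechanism MAL points to. A regular package's __path__ is a list of directories searched for its submodules, so inserting another directory changes which file a submodule import picks up. The package name pythoneggs_demo and the version directories are hypothetical. The example only demonstrates the import mechanism, not a complete multi-version scheme.

```python
import importlib
import os
import sys
import tempfile


def write(path, text=""):
    """Create *path* (and its parent directories) containing *text*."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


# A package on sys.path whose bundled "spam" module is version 1.
root = tempfile.mkdtemp()
write(os.path.join(root, "pythoneggs_demo", "__init__.py"))
write(os.path.join(root, "pythoneggs_demo", "spam.py"), "VERSION = 1\n")

# Version 2 of "spam" lives in a separate directory that is NOT on sys.path.
v2_dir = os.path.join(tempfile.mkdtemp(), "spam-2.0")
write(os.path.join(v2_dir, "spam.py"), "VERSION = 2\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
import pythoneggs_demo

# Submodule lookups search the package's __path__ in order, so putting the
# version-2 directory first selects it without changing sys.path again.
pythoneggs_demo.__path__.insert(0, v2_dir)
spam = importlib.import_module("pythoneggs_demo.spam")
print(spam.VERSION)  # 2
```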
This would have the great advantage of allowing introspection without too much fuss, and would reduce the need to search paths, directories and so on, which causes a lot of I/O overhead and slows down startup times for applications that need to check dependency requirements a lot.
And the disadvantage of absolutely requiring install/uninstall steps, which is anathema.
Oops. I disagree on that one. Not only does install/uninstall make system administration a whole lot easier, it also prevents accidental misconfigurations, permission problems, problems with finding the right paths and locations, etc. etc. Also note that the pythoneggs package approach would still allow you to use unmanaged eggs - albeit as a fallback solution or specifically for plugins.
Please make sure that your eggs catch all possible Python binary build dimensions:
* Python version
* Python Unicode variant (UCS2, UCS4)
* OS name
* OS version
* Platform architecture (e.g. 32-bit vs. 64-bit)
Well, my presumption here is that we're going to get the scheme right for Python at large, and make it standard. Are you saying that some packages should have their own scheme? That's not really workable since in order to import the package and use its scheme, we would have to first know that the package was compatible!
We're talking about filenames here - they are intended to be read and understood by humans, not machines (these can use the PKG-INFO data inside the archives or from PyPI). That said, yes, the way platforms are set up, it does sometimes make it necessary to add extra information to such a filename. E.g. say you write a plugin for Zope that only works in Zope3 and not Zope2. Such a plugin would use the "zope3" distinguisher in its archive name.
If you have suggestions, please make them known, and let's get them into the distutils in general, not just our own offshoots thereof.
This is what we use:
def py_version(unicode_aware=1, include_patchlevel=0):
[snip]
The result is a build system that can be used to build all binaries for a single platform without getting conflicts and binaries that include a proper platform string, e.g.
egenix-mxodbc-zopeda-1.0.9.darwin-8.2.0-Power_Macintosh-py2.3_ucs2.zip
egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs2.zip
egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs4.zip
eggs put the Python version before the platform, because "pure" eggs that don't contain any C code don't include the platform string.
They should really use "noarch" like everybody else does :-)
We also don't have a UCS flag, but if we did it should be part of the platform string rather than the Python version, since "pure" eggs don't care about the UCS mode, and even if they did, that'd be a requirement of the package rather than the egg itself being platform specific.
This is not correct: unichr(100000) won't work in UCS2 builds - it will in UCS4 builds, so even though the .pyc files run on both builds unchanged, the application may very well require the used Python version to be a UCS4 build in order to be able to use UCS4 features.
A single .pth file is certainly an option, and it's what easy_install itself uses.
Fair enough.
Could this be enforced and maybe also removed completely by telling people to add the egg directory to PYTHONPATH?
If by "egg directory" you mean a single .egg directory (or zipfile) for a particular package, then yes, for that particular package you could do that. If you mean, can you just put the directory *containing* eggs on PYTHONPATH, then the answer is no, if you want the package to be on sys.path without any special action taken (like calling pkg_resources.require()).
Calling such an API is OK for applications supporting eggs. I don't see that as a problem.
Note that the pythonegg package approach would pretty much remove the need for these .pth files.
Only in the sense that it would require reinventing them in a different form. :)
Not really - but we seem to have different views on whether installers are a good thing or not, so there's little point in arguing over this.
-- Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Oct 11 2005)
On 10/11/05, M.-A. Lemburg
Only in the sense that it would require reinventing them in a different form. :)
Not really - but we seem to have different views on whether installers are a good thing or not, so there's little point in arguing over this.
I have to say that although I like the egg concept in general, I too am uncomfortable with the "just drop the file in" approach to package management. I'm completely fine with it for plugins, but as you say, I'd expect applications that support plugins to set things up to make drop-in plugins work - standardising that mechanism's fine. But for general packages (PIL, pywin32, cx_Oracle, whatever), I actively want a managed installation.
I have made noises about building management tools for eggs, but I think I've been hitting the "absolutely requiring install/uninstall steps, which is anathema" philosophy, which makes such tools awkward to define. So I'd like to see this discussion continue, because I suspect MAL has some important points - which may clash with PJE's goals somewhat, but at least if we get the clashes out in the open, it may help people like me, who can't quite understand why we're having difficulty understanding the concepts, to see where the mental blocks lie :-)
Paul.
At 04:46 PM 10/11/2005 +0200, M.-A. Lemburg wrote:
Phillip J. Eby wrote:
I must admit that I haven't followed the discussions about these .egg-info directories. Is there a good reason not to use the already existing PKG-INFO files that distutils builds and which are used by PyPI (aka cheeseshop)?
I don't know if there's such a reason or not, but in any case that's what we use as part of the egg-info directories. However, we *also* allow for unlimited metadata resources to be provided in egg-info, as this is what allows us to carry things like plugin metadata and scripts in the egg. There are other metadata files listing the C extensions in the package, the "namespace packages" that the egg participates in, and so on.
Hmm, you seem to be making things unnecessarily complicated.
That probably just means you're not familiar with the requirements.
I did read your posting, but still don't understand why you need a multitude of meta-data files in a special directory.
PKG-INFO is general and extensible enough to hold all that information, IMHO.
And I suppose you have a plan for embedding site.zcml files in PKG-INFO too? How about Trac plugin specifiers? Paste template definitions? Eggs are a format to support applications and their plugins. They support arbitrary Python projects as well, because that makes it easy for applications and their plugins to depend on them. They are not merely a distribution format for installing systemwide Python packages. We have plenty of those formats already.
You completely lost me. A major feature of eggs is that for an application needing plugins, it can simply scan a directory of downloaded eggs and plug them into itself. Having a required installation mechanism other than "download the egg and put it here" breaks that.
While I don't find a non-managed Python installation mechanism a particularly useful goal to have,
It's incredibly useful for application distributors, especially extensible applications and app servers like Zope, Trac, Chandler, etc. They simply cannot afford to rely on the system Python or native packaging system to meet their requirements or provide a quality user experience.
you could still have the same thing by using and scanning a sub-directory of the pythoneggs package directory or directories listed in an environment variable PYTHONEGGS as a fallback solution (if the egg was not found in the database.py module).
This approach doesn't allow any eggs to be on sys.path by default, nor does it allow simply importing and using target packages. The current system allows us to create eggs for packages that know nothing about eggs, without making any changes to their code. (We even automatically detect potential __file__ manipulation code, and mark such eggs as needing to be installed in unzipped form.)
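Here is a rough sketch of how that kind of detection can work. It compiles the module source and walks every nested code object, looking for the name __file__. uses_file_attribute() is hypothetical. setuptools does its own analysis when deciding whether an egg is zip-safe and checks for more than this one name.

```python
import types


def uses_file_attribute(source, filename="<module>"):
    """Return True if compiled *source* refers to the name __file__ anywhere.

    Functions, classes and comprehensions each have their own code object with
    separate name tables, so nested code objects are walked recursively.
    """
    def walk(code):
        if "__file__" in code.co_names or "__file__" in code.co_varnames:
            return True
        return any(walk(const) for const in code.co_consts
                   if isinstance(const, types.CodeType))

    return walk(compile(source, filename, "exec"))


if __name__ == "__main__":
    src = "import os\nHERE = os.path.dirname(__file__)\n"
    print(uses_file_attribute(src))  # True: install this egg unzipped
```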
And the disadvantage of absolutely requiring install/uninstall steps, which is anathema.
Oops. I disagree on that one. Not only does install/uninstall make system administration a whole lot easier,
Eggs are not a system administration tool. They're for people making software and people using it. If a vendor wants to package eggs for the convenience of their users, great. If not, that's okay, because eggs are specifically intended to not require system administrator support. System administrator involvement in this process is a *bug*, not a feature. Users having to beg the sysadmin to get Python packages installed is a Bad Thing. Applications having to rely on what's in site-packages is a Bad Thing. Eggs allow users and applications to manage their own needs, independent of what the site or vendor does or doesn't provide.
Please make sure that your eggs catch all possible Python binary build dimensions:
* Python version
* Python Unicode variant (UCS2, UCS4)
* OS name
* OS version
* Platform architecture (e.g. 32-bit vs. 64-bit)
Well, my presumption here is that we're going to get the scheme right for Python at large, and make it standard. Are you saying that some packages should have their own scheme? That's not really workable since in order to import the package and use its scheme, we would have to first know that the package was compatible!
We're talking about filenames here - they are intended to be read and understood by humans, not machines (these can use the PKG-INFO data inside the archives or from PyPI).
If you read the specification, you'll see that this is not the case. Eggs require machine-parseable filenames, as this allows them to be rapidly discovered at runtime for dynamic dependency resolution, with a simple listdir(). Unlike your database.py concept or PEP 262, it is impossible for the "index" to become out-of-sync with the actual state of the installation, because it *is* the current state of the installation. That said, let's do what we can to get the distutils platform strings to be more useful indicators of whether the contained native code can be linked and run by a given Python installation.
That said, yes, the way platforms are set up, it does sometimes make it necessary to add extra information to such a filename.
E.g. say you write a plugin for Zope that only works in Zope3 and not Zope2. Such a plugin would use the "zope3" distinguisher in its archive name.
The purpose of including platform information in an egg's filename is to avoid attempting to link or run "foreign" native code that might cause a hard crash of the Python process. A Zope 2 vs. 3 distinction would not be required as an external designation, since the version dependencies declared by the package will either be resolvable or not.
We also don't have a UCS flag, but if we did it should be part of the platform string rather than the Python version, since "pure" eggs don't care about the UCS mode, and even if they did, that'd be a requirement of the package rather than the egg itself being platform specific.
This is not correct: unichr(100000) won't work in UCS2 builds - it will in UCS4 builds, so even though the .pyc files run on both builds unchanged, the application may very well require the used Python version to be a UCS4 build in order to be able to use UCS4 features.
As I said, that would be a requirement of the package, rather than the egg itself being platform-specific. Again, the platform string is just a filter to avoid trying to import things that could crash the interpreter (as opposed to merely raising an exception).
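This sketch treats UCS4 as a package requirement rather than a filename tag: the package simply refuses to import on a narrow build. require_wide_unicode() and "mypackage" are hypothetical. On Python 3.3 and later sys.maxunicode is always 0x10FFFF, so the check only matters on older builds.

```python
import sys


def require_wide_unicode(project):
    """Raise ImportError when running on a narrow (UCS2) build.

    On Python 2 narrow builds unichr() rejects code points above 0xFFFF,
    which is the unichr(100000) situation described above.
    """
    if sys.maxunicode < 0x10FFFF:
        raise ImportError("%s requires a wide (UCS4) Python build" % project)


# Typically placed near the top of the package's __init__.py:
require_wide_unicode("mypackage")
```

On a narrow build this fails with a clear ImportError instead of a crash, which fits the point that the platform string only needs to guard against native code that could take down the interpreter.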
A single .pth file is certainly an option, and it's what easy_install itself uses.
Fair enough.
Could this be enforced and maybe also removed completely by telling people to add the egg directory to PYTHONPATH?
If by "egg directory" you mean a single .egg directory (or zipfile) for a particular package, then yes, for that particular package you could do that. If you mean, can you just put the directory *containing* eggs on PYTHONPATH, then the answer is no, if you want the package to be on sys.path without any special action taken (like calling pkg_resources.require()).
Calling such an API is OK for applications supporting eggs. I don't see that as a problem.
"applications supporting eggs" is not the same thing as "people using eggs". People using eggs would like, in the general case, to be able to just fire up the Python interpreter and use the packages they've installed, without any special steps. This is especially important for users who are simply using the easy_install toolchain to install arbitrary distutils-based packages.
Note that the pythonegg package approach would pretty much remove the need for these .pth files.
Only in the sense that it would require reinventing them in a different form. :)
Not really - but we seem to have different views on whether installers are a good thing or not, so there's little point in arguing over this.
We disagree on whether *requiring* an install step is a good thing. Good installer support is important, which is why EasyInstall can search PyPI, and supports download/extract/build/install for most distutils-based packages, and handles dependency resolution for setuptools-based packages. Being able to provide good installer support is actually an important feature of eggs! *Requiring* installation, however, is a no-no. It should be possible to ship a Python application by just dumping the application script and a bunch of eggs in a single directory. Having to then "install" those eggs somewhere on the target system is a nuisance. It's also nice that there is no way to "corrupt" your index of eggs except by tampering with the eggs themselves. It's hard to mess it up in some unrecoverable way, and everything is simple enough to inspect by hand with common tools like 'ls' and 'less' and 'unzip -v'.
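Here is a sketch of that deployment style. An application directory holds one zipped egg, and the startup code adds each *.egg entry to sys.path so the stdlib zipimport machinery can import from it. add_local_eggs() and hello_demo are hypothetical. Unlike pkg_resources, this does no version selection or dependency resolution.

```python
import os
import sys
import tempfile
import zipfile


def add_local_eggs(app_dir):
    """Append every *.egg file or directory found in *app_dir* to sys.path."""
    added = []
    for entry in sorted(os.listdir(app_dir)):
        if entry.endswith(".egg"):
            path = os.path.join(app_dir, entry)
            if path not in sys.path:
                sys.path.append(path)
                added.append(path)
    return added


# Simulate "an application script plus a bunch of eggs in one directory".
app_dir = tempfile.mkdtemp()
with zipfile.ZipFile(os.path.join(app_dir, "hello_demo-1.0-py2.4.egg"), "w") as z:
    z.writestr("hello_demo.py", "def greet():\n    return 'hello from a zipped egg'\n")

add_local_eggs(app_dir)
import hello_demo  # loaded from inside the zip file by zipimport

print(hello_demo.greet())
```

In a real application script, app_dir would typically be os.path.dirname(os.path.abspath(__file__)).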
participants (5)
- Brian Zhou
- M.-A. Lemburg
- Paul Moore
- Phillip J. Eby
- Vincenzo Di Massa